While researching an article requiring detailed employment and hiring data in the software industry, I ran across Indeed.com. For those not yet familiar with Indeed, it is a Google-style search engine for job postings. Type your skills in the search box and get back a list of job postings along with breakdowns by location, job title and estimated salary. Then click over to the “salaries” or “trends” tabs to see a history of the salaries or number of jobs for your skill set over the past couple of years.
Fantastic, I thought. A few quick searches and I’ll have just the breakdowns I need to see which technologies dominate, which ones are gaining market share, and which ones are slipping away. In particular, I’m interested in the balance between Java and .Net and how the different web and Rich Internet Platforms compare to each other.
Indeed provided just the data I needed along the left-hand side of any search: breakdowns showing exactly how many listings fell into each salary range, location, and so on. I got straight to work tuning my searches, copying data into a spreadsheet and running some quick calculations against the data.
Then the house of cards came tumbling down. I clicked on one of the job titles. The statistics on the side showed 20 job listings for an “Enterprise/SOA Architect” position, but when I viewed the search, only one listing appeared. The other 19 had been removed as duplicates. I clicked another job title and another, eventually compiling the counts for the category I was looking at. Out of 173 listings, only 19 were unique. The rest were all duplicates: roughly nine listings for every real one, eight of them duplicates.
I started doing other searches and doing spot checks. Ratios of 8-10 duplicate listings for every 1 real listing were not uncommon, with one listing having 22 duplicate listings.
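The arithmetic is easy to check. Here is a minimal Python sketch using the 173 total and 19 unique listings from the category above; the function name is my own, not anything Indeed provides. It separates two figures that are easy to blur: duplicates per real listing, and the share of all listings that are duplicates.

```python
from typing import Tuple


def duplicate_stats(total_listings: int, unique_listings: int) -> Tuple[float, float]:
    """Return (duplicates per unique listing, share of listings that are duplicates)."""
    duplicates = total_listings - unique_listings
    return duplicates / unique_listings, duplicates / total_listings


# Counts from the category described above.
ratio, share = duplicate_stats(173, 19)
print(f"{ratio:.1f} duplicates per real listing")   # 8.1
print(f"{share:.0%} of listings are duplicates")    # 89%
```

Those figures cover only the duplicates Indeed itself flagged, so they are a lower bound.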
Then there were the listings Indeed couldn’t recognize as duplicates. This drove the ratio of duplicate to real listings even higher.
In the end, I’ve abandoned any idea of using Indeed for job or salary trends. With over 90% of the listings being duplicates, the data Indeed presents is too dirty to be useful for any trending or analysis whatsoever. Not only do duplicate listings skew the data, but the variances in the ratios of duplicate listings to real listings make it impossible to normalize the data.
For its numbers to truly reflect the current job market, Indeed must remove the duplicates everywhere on the site, not just in the final search results. Until that happens, steer clear of the trending features of Indeed, whether that means the Salaries or Trends pages or the summary statistics on the search results pages.
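To make the point concrete, here is a rough sketch of what "deduplicate before counting" could look like. Everything in it is an assumption for illustration: the sample listings are made up, and matching on an exact title, company and location key is far cruder than whatever Indeed actually does to collapse its search results.

```python
from collections import Counter


def dedupe(listings):
    """Keep the first listing for each normalized (title, company, location) key.

    The key is a hypothetical choice; real duplicates often differ in wording
    and would need fuzzier matching than this.
    """
    seen = set()
    unique = []
    for listing in listings:
        key = tuple(
            listing[field].strip().lower()
            for field in ("title", "company", "location")
        )
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique


# Made-up sample data, not taken from Indeed.
listings = [
    {"title": "SOA Architect", "company": "Acme", "location": "Austin, TX"},
    {"title": "SOA Architect ", "company": "ACME", "location": "Austin, TX"},
    {"title": "SOA Architect", "company": "Acme", "location": "austin, tx"},
    {"title": "Java Developer", "company": "Globex", "location": "Denver, CO"},
]

# Summary statistics computed before and after removing duplicates.
raw_counts = Counter(item["title"].strip() for item in listings)
clean_counts = Counter(item["title"].strip() for item in dedupe(listings))
print(raw_counts["SOA Architect"], clean_counts["SOA Architect"])
```

The side-panel counts behave like `raw_counts`, while the search results behave like `clean_counts`. Trend and salary pages would need to be built on the latter.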
Sidebar: Improving Indeed
All web sites have an inherent conflict of interest in accurately representing their numbers. In marketing, the bigger the numbers the better. It’s a little-known fact that the real numbers are often an order of magnitude smaller than what is publicly stated. A web site with 5 million members usually has only 250,000 to 500,000 active members (i.e., those who have logged in within the past 30 or 90 days).
Yet those who rise to the challenge and clean up their numbers can gain competitive advantage, if they work it right. For Indeed, cleaning up their numbers would enable them to develop new revenue streams and greater competitive advantage by providing real-time intelligence on the current job market.
Imagine you’re an HR manager at a small company. The kind that uses Salary.com to determine the salary range for a new position. Unfortunately, Salary.com uses broad-based categories, like Programmer Level III or Staff Nurse, which require broad salary ranges and don’t map accurately onto the job requirements and the market value of the individual skills needed. So you, as HR manager, are left guessing what the real value of the position is.
Now imagine you can saunter over to Indeed, provide the list of job requirements and have Indeed tell you the appropriate current market salary for the set of skills you are looking at. Indeed charges you a small monthly fee for this service.
Or suppose you just got laid off and are wondering which skills would be the most valuable for you to learn. Indeed could provide a page that lists all the skills in your field, ordered by the impact each would have on your salary. Using more sophisticated analysis, it could even tell you how many years of experience in each skill are required for the most impact.
Finally, Indeed could provide a map of the entire job market showing which areas are hot and which are in decline, or create a customized map based on your skills that shows exactly which jobs you are most qualified for. On the HR side, skill maps could help proactively identify shortages in the skills with the highest strategic impact on a business, enabling companies to hire employees before the market dries up.
Indeed could provide these services by applying well-known techniques: component and regression analysis on the job listings, combined with human-guided automated learning and weighted moving averages that smooth out results and deal with anomalies. Since Indeed already appears to be using some variation of market basket analysis for salary estimation, this shouldn’t be a large leap.
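As one small illustration of the smoothing step, here is a sketch of a trailing weighted moving average in Python. The function, the weights and the monthly counts are all made up for the example; I have no knowledge of how Indeed would actually implement this.

```python
def weighted_moving_average(values, weights):
    """Smooth a series with a trailing weighted window.

    weights[-1] applies to the most recent value in each window, so larger
    trailing weights favor recent data. Returns one value per full window.
    """
    n = len(weights)
    total = sum(weights)
    if n == 0 or total == 0:
        raise ValueError("weights must be non-empty and sum to a non-zero value")
    return [
        sum(v * w for v, w in zip(values[i - n + 1 : i + 1], weights)) / total
        for i in range(n - 1, len(values))
    ]


# Hypothetical monthly listing counts with one anomalous spike.
monthly_counts = [100, 110, 300, 120, 130]
smoothed = weighted_moving_average(monthly_counts, [1, 2, 3])
print([round(x, 1) for x in smoothed])  # [203.3, 178.3, 155.0]
```

The spike still shows up, but its influence fades as it moves out of the heavily weighted end of the window. Smoothing like this only helps once the underlying counts are clean; it does nothing about duplicates.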
The market imperative for Indeed, or a future competitor, to move beyond job search into hiring and career analysis will only grow over time. The sooner Indeed starts, the sooner it can iron out the wrinkles and get a leg up on the competition.