Data scientists are in high demand and hard to find. This guide helps you find, assess and hire them.
Data scientists are increasingly sought after by companies. Tech companies in particular employ data scientists to collect the right data and make sense of it. Some companies employ data scientists to improve the product they offer, while others need them to fuel the organization’s data-driven decision making.
The role of data scientist is one where many different qualities come together. Every data scientist has to understand what the data should lead to (the desired business outcome), apply statistical concepts to produce valid insights, and use the right tools and code to extract, scrub and analyse data.
The core skills of a data scientist find their origins in domain knowledge, statistics and programming.
Domain knowledge
The ideal data scientist has a good understanding of the desired business outcomes. The data they work with differs from company to company. A data scientist should understand the features of the data at the company in question in order to handle it correctly and deploy the right algorithms.
Statistics
Data scientists need a good understanding of statistics. The models they deploy are built on mathematics, and on statistical concepts in particular. Concepts and approaches like linear regression, the normal distribution (the bell curve), central tendency, variability, variance and standard deviation should be a piece of cake for them.
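To make a few of these concepts concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values and the `summarize` helper are made up for illustration and are not part of any particular assessment.

```python
import statistics

# Hypothetical sample (illustrative values only)
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def summarize(values):
    """Return basic measures of central tendency and variability."""
    return {
        "mean": statistics.mean(values),            # central tendency
        "median": statistics.median(values),        # central tendency
        "variance": statistics.pvariance(values),   # population variance
        "std_dev": statistics.pstdev(values),       # population standard deviation
    }


summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A candidate should be able to explain, for example, why the sample variance (`statistics.variance`) differs from the population variance used here.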
Programming
Getting data, scrubbing it and analysing it require the right tools, Application Programming Interfaces (APIs) and in many cases custom code. Data science code is typically written in a language such as Python or R and often has to integrate with back-end code written in other languages. Other tech skills include:
Data science is a broad domain and there are many things to master within the field. In many organisations, especially larger ones, a data science team therefore consists of specialised roles that collectively work towards a shared goal.
These are the basic types of data scientists that companies need:
Data engineers are focussed on getting and preparing data. The goal of the data engineer is to prepare data so it can be used for further analysis and decision making. The most important activities to achieve this are data extraction, data consolidation and data cleansing (data scrubbing).
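As a rough illustration of what extraction, consolidation and cleansing can look like in practice, here is a small sketch using only Python's standard library. The two CSV exports, the column names and the helper functions are all hypothetical; a real pipeline would read from actual systems and apply far more rules.

```python
import csv
import io

# Hypothetical raw exports from two systems (made-up data)
CRM_CSV = """email,country
 Alice@Example.com ,NL
bob@example.com,DE
"""
SHOP_CSV = """email,country
alice@example.com,NL
,FR
carol@example.com,
"""


def extract(raw):
    """Extraction: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))


def cleanse(rows):
    """Cleansing: normalise values and drop rows without an email."""
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not email:
            continue  # unusable without a key
        country = (row.get("country") or "").strip().upper() or None
        cleaned.append({"email": email, "country": country})
    return cleaned


def consolidate(*sources):
    """Consolidation: merge sources, keeping the first row seen per email."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row["email"], row)
    return list(merged.values())


records = consolidate(cleanse(extract(CRM_CSV)), cleanse(extract(SHOP_CSV)))
for record in records:
    print(record)
```

Keeping the three steps as separate functions mirrors the way the activities are described above and makes each step easy to test on its own.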
Data researchers are focussed on finding patterns in data, providing insights from the data to their team or customers and building analytics solutions so data rookies can use the insights derived from data.
Machine Learning experts are specialised in learning models and algorithms. They research, build and test learning algorithms that are deployed in self-learning products or for organisational purposes.
In addition to the roles mentioned above, there can be specialisations such as data quality engineer, database administrator, data modeler, BI engineer or data architect.
Data talent can be found across a variety of sources. Many recruiters start their search on LinkedIn, but there are niche platforms that are a much better match for the sought-after data talent pool.
Kaggle is an online community for data scientists, machine learning experts and enthusiasts. On the platform, users (Kagglers) share data sets, collaborate on code and compete in data science challenges. Companies can post their own challenges to Kaggle, giving users the opportunity to compete and win prize money.
With 5 million data scientists and machine learning experts, Kaggle is the go-to source for finding data talent.
Kaggler profiles are rich in relevant information about skills and activity, including the particular technologies, libraries and frameworks used.
Here’s how to source Kaggle.
Stack Overflow is a question and answer website for engineers. Users can earn reputation points and "badges" by providing valuable answers. Besides the reputation of individual engineers, you can also find a lot of information about the technologies they have been working with most recently.
Most of the information, such as top technology tags, reputation, badges and scores, is based on actual activity rather than self-reported input, which makes it very reliable from a sourcing perspective.
With 14 million engineers and rich technology skills information based on Q&As, Stack Overflow is a must for sourcing data talent.
Here’s how to source Stack Overflow.
GitHub is a code hosting and version control platform built on Git, with additional features on top. GitHub accounts are free and are frequently used to host open-source projects.
The benefit of sourcing on GitHub is that the information on talent is very up to date and relevant to their technical skills. If you are willing to take some time to research candidates on the platform, you will start to see which users are actively developing and sharing relevant code.
With 65 million users GitHub is the platform with the most active engineering users, beating LinkedIn and any other platform.
Here’s how to source GitHub.
LinkedIn is the most actively used professional platform in the world. Many recruiters rely on LinkedIn as their single source of candidates. Even though it has a lot of users, recruiters might not find the data talent they are looking for here, because few data science candidates keep complete and up-to-date profiles. In addition, competition is fierce on LinkedIn. That said, LinkedIn can still be a good source to include in your sourcing channels.
If you don’t have a LinkedIn premium account or LinkedIn Recruiter seat, you can learn here how to source LinkedIn without premium features.
Talent networks are platforms with vetted talent that primarily provide candidates for contract positions. Many talent networks provide data scientists, AI experts and machine learning engineers. Talent networks typically charge a surcharge of 15 to 25% for any contracted candidate.
Some examples of talent networks:
Medium is an online publishing platform where bloggers share their views and knowledge. The bloggers on Medium cover a great variety of topics, and you can also find data talent writing about technical or more abstract subjects.
The information on Medium is rich because relevant keywords appear in the content of the articles themselves.
Here’s how to source Medium.
Reddit is an overlooked platform for sourcing talent, but with 330 million users it is one of the biggest online communities today. Redditors engage in all kinds of comical discussions, but data talent can also be found discussing data in subreddits like these:
Here’s how to source Reddit.
Every specialization and seniority level requires different knowledge and skills. The assessment should be aligned with what candidates will do in their daily job.
There are several ways to assess the knowledge and skills of a candidate, but the most frequently used are interview questions and data skills assessments.
It helps to ask the candidate questions targeted at their knowledge of, and especially their experience with, fundamental data science concepts and principles. Ask how they dealt with certain problems before, or how they would deal with a problem they haven’t encountered yet. This lets you assess their ability to understand what is asked, the reasoning they use to come up with a solution, and how clearly they communicate a proposed solution.
Examples of questions to ask a data scientist interviewee:
In addition to a free-format assessment, you can use standardized assessments of core skills. These data skills assessments are usually quite general, in the sense that they assess skills that any data scientist uses.
Examples of standard data skills tests:
You can combine these standard tests with your own assessments that are more targeted at the specific skills the candidate needs in the context of the given role and company. You can give candidates home assignments or invite them to the office to work together for part of the day.
Be very clear about the prospective responsibilities of the data scientist. Will their focus be on data engineering, analysis or a specific domain like Machine Learning?
The average salary of a data scientist in the United States is $138,596 per year (Glassdoor data).
Keep in mind that this is an average. The actual salary to offer a candidate depends on many variables, such as the seniority and experience of the candidate, the location of the job, the workload and the other benefits offered.
Data scientists are hard to find because the demand for this talent is a lot higher than the supply. Determine the unique offer you can make to the candidate. Be clear about the work that the candidate would do and the value of data for the company.