Specialist Guide to Hiring a Top Data Engineer

With precision hiring intelligence™ at the core and humans at the helm, we enable founders, CEOs, boards, investors and technology leaders to build world-class teams.

Maxime Beauchemin is a name that needs no introduction in the world of data engineering. He is currently the CEO and co-founder of Preset, an AI-enabled data visualization platform for modern companies. He was one of the first data engineers at both Facebook and Airbnb, and was responsible for writing and open-sourcing Apache Airflow.

Maxime has been influential in the development of many of the most impactful data engineering technologies of the last decade. In his seminal 2017 blog post “The Rise of the Data Engineer”, he argued for the need for a specialized engineer to manage ETL, build pipelines and scale data infrastructure.

According to Maxime, data engineers were once responsible for managing compute and storage. However, their role is changing. They are increasingly responsible for ensuring data performance and reliability, and for educating data teams on best practices, including efficiency, data modeling and coding standards.

It is now widely held that there are three types of data engineer:

Type 1 >> The Database/Data Warehouse/Dashboard engineer – This category tends to be where most engineers start their journey toward becoming a well-rounded data engineer. They have experience with relational databases and data warehousing. With knowledge of database structure, constraints, indexing, query tuning and performance, data quality, data management and data warehousing, these engineers are tasked with producing dashboards, talking with end users and providing solutions to the business by analysing data.

Type 2 >> Type 1 + Python + Airflow/dbt (etc.) – A data engineer of this type is a highly skilled professional who creates pipelines and systems that help organizations collect, transform and analyze data at massive scale. They use Python together with libraries such as pandas, NumPy and PySpark. Not only can these engineers write well-designed OOP classes, they are also adept at working with structured and unstructured data from any source. They build up DevOps skills and focus on building automated end-to-end systems that move and transform data with the help of Airflow or dbt.

Type 3 >> Type 2 + Scala/Java + Distributed Systems + Architecture + Big Data + ML – These data engineers have to be very good with distributed systems such as Kubernetes, Spark and Pulsar. They have to understand the size and complexity of the data being used. It is not just about knowing Python; sometimes they have to debug code written in languages such as Java or Scala. They are considered to have senior-level architecture skills. They know how to run processing on big data clusters and are aware of the many challenges involved. For them, it is not just about how to orchestrate a complicated pipeline, but also how to stitch together and work with petabyte-scale data in their everyday work. They also know MLOps.

The kinds of data within an organization determine which type of data engineer you need. A data engineering team with people who excel in each of these categories is the dream team any company would want to start with and expand.

Five reasons why leveraging precision hiring intelligence™ to recruit a top data engineer can give you an unfair advantage

According to scientific research, when companies leverage precision hiring intelligence, they are able to achieve the following:

  1. Precision Hiring – By leveraging machine learning and deep learning, the chances of precisely matching quality data engineering candidates are greatly enhanced. With a human overlay on top of this matching engine, precision increases further still, and you are able to access 12.5X as many high-quality candidates.
  2. Faster – It is estimated that the top 20% of data engineering talent is typically on the job market for as little as 22 days. This is compounded by the data engineering talent pool in New Zealand and Australia being split 50:50 between permanent and fixed-term employment. By using technology rather than manual labour to search, you are able to recruit in days rather than weeks or months.
  3. Reduced Cost – When you can hire faster, with less effort and pain, you significantly drive down cost per hire. Not only is the cost of acquiring a new data engineer lower, so is the cost of late placement on important projects, which could hand a competitive edge to your competitors (who are also likely to be the ones hiring the candidate you just missed out on!). Worse still, if you get your hiring and assessment process wrong and make a bad hire, the cost to you can often be 2X to 3X their first annual salary.
  4. Fewer sourcing hours per role filled – It is estimated that the cost of the traditional do-it-yourself approach to hiring data engineers can start at $16,000+ and trend upwards from there.
  5. 59% less turnover – It is not just about attracting top data engineers; you also need to retain them. Top performers are more likely to stay longer with a company, and in a role, that develops and invests in their talents.
How do you make targeting the top candidates work for you?

There are some key things you need to consider when trying to attract top data engineers to your current recruitment process:

  • Your targeting strategy is important – Key questions include: Which of the three types of data engineer are we targeting? Which sectors do we need to target? Which pool of companies and candidates should we tap into? Which lateral sectors, companies or candidate types might we not have considered?
  • Your data sources are important – The volume of candidate data being produced is growing faster than ever. At the moment, only 5% of all data is ever used. Ensuring that you are accessing the right data sources is critical to separating the signal from the noise.
  • Increasing the sample size of potential candidates for your roles – Once the right data sources are locked in, you need to build your dataset for each role so that it represents the full universe of potential on-market (active) and off-market (passive) candidates. If you only have 30 to 50% of the total talent pool mapped, how can you know whether you are tapping into the top performers?
  • The right messaging channels are your superior access vehicle – Insight into the full universe of potential candidates is not enough. The right messaging channels are critical and provide a superior access vehicle, much like a Trojan horse, so it is best not to rely on a single LinkedIn ‘Recruiter’ InMail, which is likely to get very low single-digit response rates at this level. In 2023 and beyond, optimal engagement comes from the right messaging across a mix of Platform/Email/Mobile and, of equal importance for response rates, the sequencing and timing of those messages. As a marker, if you are doing this right you should be seeing a target candidate open rate of 80%+.
  • Integrate a pre-interview skills proficiency assessment into your process – Peter Cappelli, Professor of Management at The Wharton School and Director of Wharton’s Center for Human Resources, recommends integrating a pre-interview skills proficiency assessment into your recruitment process in his article “Your Approach to Hiring Is All Wrong”.
How do I get started?

There are three options:

  1. Do it yourself – You can do it yourself the traditional, old-fashioned way, which still requires a high degree of labour-intensive manual work. The standard opportunity cost per hire of this approach is usually $16,000+ [opportunity cost, i.e. daily revenue per employee X average days position unfilled = total revenue cost per unfilled job], with 5 to 7 data sources and an average time-to-hire of 63 days.
  2. Use a traditional recruitment agency – You can outsource this to a recruitment agency, where the average cost per hire for a $150,000 role is $27,000+ [opportunity cost, i.e. daily revenue per employee X average days position unfilled = total revenue cost per unfilled job], with 5 to 7 data sources and an average time-to-hire of 31 days.
  3. Use scaleXT – Alternatively, you may want to consider scaleXT, which draws on 67 data sources and datasets with 95.7% of data engineering talent mapped. This has helped many of New Zealand and Australia’s leading tech companies access the top 20% of potential candidates, with an average time-to-hire of 18 days.
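To make the bracketed opportunity-cost formula concrete, here is a minimal Python sketch. It covers only the opportunity-cost term (daily revenue per employee multiplied by the average days a position is unfilled) and does not model agency fees or sourcing hours. The function name `opportunity_cost` and the $400 daily revenue per employee figure are hypothetical assumptions for illustration; only the time-to-hire figures (63, 31 and 18 days) come from the three options above.

```python
# Opportunity cost of an unfilled role, following the bracketed formula:
#   daily revenue per employee x average days position unfilled
# NOTE: the daily revenue figure below is a hypothetical placeholder, not a benchmark.


def opportunity_cost(daily_revenue_per_employee: float, days_unfilled: float) -> float:
    """Return the revenue forgone while a position remains unfilled."""
    if daily_revenue_per_employee < 0 or days_unfilled < 0:
        raise ValueError("inputs must be non-negative")
    return daily_revenue_per_employee * days_unfilled


# Average time-to-hire figures quoted for each option (in days).
TIME_TO_HIRE_DAYS = {
    "do it yourself": 63,
    "recruitment agency": 31,
    "scaleXT": 18,
}

# Hypothetical assumption: each employee generates $400 of revenue per day.
ASSUMED_DAILY_REVENUE = 400.0

if __name__ == "__main__":
    for option, days in TIME_TO_HIRE_DAYS.items():
        cost = opportunity_cost(ASSUMED_DAILY_REVENUE, days)
        print(f"{option}: {days} days -> ${cost:,.0f}")
```

Under the assumed $400 per day, the script prints $25,200, $12,400 and $7,200 for the three options. Substitute your own revenue-per-employee figure to get an estimate that is meaningful for your business.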
Next steps …

You now have a clearer understanding of the data engineering landscape across New Zealand, Australia and further afield, and have reflected on the target you want your arrow to hit, including the role your new hire will play in shaping your future.

You’re now in a position to develop a plan, either by yourself, with an executive search professional or with scaleXT.

We’d like your NEXT STEP to be with scaleXT. If you would like to set up a deep-dive 60-minute strategy session to explore how we can help you get what you want quickly and effectively, please let us know.


How does scaleXT work? Launched with precision hiring intelligence™ at the core and humans at the helm, our AIR Platform™ enables founders, CEOs, boards, investors and technology leaders to build world-class teams.

Why should I care? Now you can tap into best-in-class technology to do much of the heavy lifting in sourcing, so you can focus on higher-ROI activities such as candidate experience, screening and onboarding. Recruitment is on the cusp of becoming a high-tech industry, much as we have seen in areas such as sales and marketing automation with Salesforce’s AI product, Einstein. Early adopters will gain an unfair advantage over their competition, plus you will most likely be able to leave the office at 5:00pm most days and enjoy a better work-life balance!

How fast is scaleXT? Typically, you will get a high-quality Candidate Data MAP™ within 24 hours and a shortlist of Interested CVs to your INBOX™ or ATS within days rather than weeks or months.

What types of roles work best? The roles where we make the biggest impact are: Chief Technology Officer, Chief Technology & Product Officer, Chief Product & Technology Officer, Interim CTO, Chief Transformation Officer, Chief Digital Officer and Chief Data Officer; and then building out the organisation chart for business owners, private equity firms and technology leaders across developer, architect, product, design, data analytics, data engineering, data science, programme management and project management hires, amongst others.

Is scaleXT able to help us unlock the potential of Generative AI (GenAI)? For companies looking to unlock the potential of GenAI, scaleXT can help you assemble a team of world-class strategists, data scientists, ML engineers and generative AI experts to rapidly evaluate high-ROI generative use cases and bring them to life. Alternatively, we can connect you with GPT strategists, researchers and ML product managers who can help you identify other companies to partner with to get the desired results faster.

How do you get Interested CVs to your INBOX™ or ATS within days rather than weeks or months? We use a proprietary algorithm called Enigma, which processes natural language using deep learning neural networks. Having processed millions of candidate profiles across millions of real-world job scenarios, it is able to replicate the way the human brain works to improve decision-making at a scale previously not possible. Once the matching engine identifies precise matches, our platform has personalised interactions with the right candidate through the right channel at the right time. From there, our dedicated team of humans at the helm at scaleXT takes over to increase the precision of matching further still!

What channels does scaleXT use to interact with candidates? scaleXT interacts with candidates through three channels (Platform/Email/Mobile) and uses the channel(s) on which candidates are most likely to engage, as well as using data science to ensure that the right sequence of interactions and the right follow-up prompts occur at the right time.

Are humans involved? Yes, and this is critical. We have a high-touch Customer Success Manager overseeing each recruitment process, so there is a blend of best-in-class technology and machine learning at the core with humans at the helm. This allows us to automate much of the heavy lifting in sourcing so that you can put more human touch into the areas that really matter!

Is there a guarantee for your Candidate Data MAP™ and Interested CVs to your INBOX™? Yes. If you are not fully satisfied, there is a full 100% money-back guarantee.

If I join scaleXT, what kind of time commitment am I looking at? We recommend that you only come to scaleXT for urgent, critical hires that need to be made at high speed. You will still need to time-box appropriate slots to move candidates quickly through the recruitment process.

How do I join the scaleXT Launch programme? Joining is typically by invitation only or through a referral from one of our members. The next intake for our Launch programme will be in August 2023. Once you have applied to the Launch programme and been accepted, you will be able to activate job requirements immediately.
