Four reasons why the data science interview is broken

wunengzi
4 min read · Dec 7, 2020

Scratch any data scientist and they’ll probably tell you they dislike the interviewing process. The field is still in its relative infancy, so there is huge variation in how interviews go. Here are four reasons that can turn one into a horror show.

1. The long take-home assignment

‘We don’t want to set hard guidelines on how long a candidate should spend on the assignment, but history shows that it will typically take 20 hours to produce a solution we deem satisfactory’. Paraphrased from a Lead Data Scientist position with a ride sharing firm.

‘Your solution should include exploratory data analysis (including visualizations), preprocessing, a model and analysis of the results. Explain what you did, why you did it and how you would communicate your results to business’. For a Senior Data Scientist position with an insurance firm.

Who do these firms think is going to spend that much time on these positions? Maybe current students or someone underutilized at work. I didn’t see the need to commit so much time, so I copy-pasted code that I’d used before, changed the parameters to suit the use case and put forward a basic solution. Nothing fancy there, which of course led to not getting a call back.

It does make me wonder who progressed, and whether they really put the pedal to the metal and ground out a higher AUC. Or did they use some sophisticated, fancy technique that was sexy enough for the interviewers? That leads me to my next point.

2. Data Science is too vast

Having interviewed candidates before, I noticed that how much depth we can go into depends on the intersection of our expertise. I’m heavy on maths and stats, so I can’t go into as much detail for CS things. Similarly I’ve had CS people talk math with me and it generally doesn’t go so well.

The problem arises when interviewers or interviewees aren’t aware of this and forcefully push people into their own sphere. I once had an interview with a highly credentialed head of DS who was fixated on Random Forests and kept throwing out questions on them. What does this parameter do? What is random about it? I prefer XGB and said so, and I was happy to go into depth on it, but the questions kept coming. Finally I asked, ‘In your experience has there ever been a time Random Forest was meaningfully better than XGB?’ Radio silence ensued, and I didn’t get a second round.

Data science is vast and most people have a specific niche they like to dive into. Any practitioner worth her salt will be studying on her own too, taking snippets here and there to deliver value. Is there any DS who doesn’t use Stack Exchange? Supposedly so….

3. Online assessment, but we record your screen and you need to write code organically!

One company asked candidates to take an online assessment but required that the code be written organically, with no googling around. Another interviewer asked for a program to automatically analyze data, coded in real time while he watched. This doesn’t mimic the workplace at all, but I think it’s fair for simple things like basic manipulation with pandas or basic programming, e.g. for loops.

4. Hiring managers that have no DS experience

Many heads of data science have not worked a single day as a data scientist. So when they’re assigned this role, their first order of business is to actually get a data scientist… but how do you do that? Well, either by using credentials (he has a Masters in data science so he must know his stuff) or by setting questions that have nothing to do with the typical day-to-day, like HackerRank-style CS algorithm questions. Since FAANG is doing it, it only makes sense that we copycat them.

It is ironic that experienced data scientists tend not to have data-science-specific credentials, since the pioneers self-studied their way into the field. And how relevant is a credential when most of the things you learned are no longer in use after a few years? This field changes extremely quickly. TensorFlow is 5 years old. Transformers just 3.

By the way, have there ever been any studies on the correlation between being able to solve algorithm questions and on-the-job performance? I suppose that when you have so many applicants, filtering this way doesn’t really hurt you, but it does reveal a strong preference for CS people that leads to less diversity. Who said data science was the intersection of maths, stats and CS?

It makes one appreciate how subjective and luck-based an interview can be. Even for what I consider fair, e.g. for loops, I have a colleague who disagrees. Indeed, wherever we go, we always need a healthy dose of luck… or a lot of time and hard work to game the interview. I won’t take anything away from those who study and put in the effort to beat it. Hate the game, not the player.
