How to catch a shiny Pokémon, or Interviewing Junior Data Scientists - rising stars are easy to overlook
I’ve worked for quite a few companies.
Now I’m interviewing to assemble a dream team.
Disclaimer
These are my personal experience and views; I hire a specific type of people I find easy to work with. We excel at what we do - but we are not the only ones who excel. I honestly believe that these rules work - but I agree that they are not for everyone. Big companies are built on the work of regular ‘soldiers’, not ‘superstars’.
I interviewed a person while my little brother ‘shadowed’ the interview (to see how it looks and to prepare mentally). I was preparing for a bloodbath: the interviewee had raised a lot of red flags. I asked a bunch of basic questions (a good example: ‘what’s so gradient in gradient boosting models?’), and he gave me precise and very good answers.
I dug deeper.
The interviewee shared his experience: a story about a project he had worked on. He included minute and interesting details, which I appreciated. He mentioned a few technologies/methods I was not aware of; I wrote them down.
It was like a bunch of friends meeting after a long break.
Attitude
If something did not work on the first try, and you spent a ton of time making it work - it’s a good story! Even if it was because you forgot something, or even if you abandoned the idea altogether. I can see that you learned from this experience, and I may even learn something myself.
Interviewing is a game for two (the company picks the employee, BUT vice versa is true too).
Both must be honest and upfront to get the best from the interview.
The interviewer must have an ‘I wish the best for the interviewee’ attitude. Not ‘I want to hire the best person for the money given.’ Not ‘I want to check if this person fits the requirements.’
You should hire if you think it’s best for the interviewee to work there (and for the company to have him).
If it’s not a good company - why are you working for it? You are 1) compromising yourself; and 2)
If he lacks solid foundational skills - say that upfront.
If he is enthusiastic but overqualified - say that upfront (he won’t stay). Maybe (if company/hiring policies allow) recommend him for another position/company. Skillful and passionate people are hard to come by.
If he is qualified and ‘fits the bill’, but something does not click - say that upfront. Take your time - hire slow, fire fast.
What’s up with juniors?
- They are hard to interview
- They can lift immense weights
- They are ambitious, so they can carry long-term projects
Questions
1. Tell me about yourself!
Describe yourself in a couple of sentences or so.
Juniors of all disciplines are bad at writing resumes (surprise-surprise!). Because of that, the interviewee’s ‘origin story’ may .
Why this question is important: it may be your tenth interview just this week, but it may be their first. They are anxious and nervous, which can impede a good interview and skew the results, which in turn may make you miss a good hire.
Beginning with a series of simple, personal questions sets a friendly tone. Please don’t open with “HOW TO CALCULATE CONVOLUTION RECEPTIVE FIELD” as a first question.
2. Show me what you are proud of!
Pick a cool (data-science related) project you did. It may be a project from your previous job, an assignment in the university or even your own side project.
This question also belongs to the ‘ice-breaker’ category, but it additionally measures the level of passion towards the craft. The answer hints at the candidate’s motivation and general attitude towards the field (of data science).
People attracted to data science (or any overcrowded domain, really) come for one of two reasons:
- money and clout, or
- curiosity and chance to solve some difficult problems
You are looking for the latter, and this category of people usually has no trouble remembering their participation in a competition, hackathon, or some other event. They participate a lot.
By the way: for that reason, I personally value a “self-initiated” project more than, for example, a Kaggle bronze. Doing “your own” project involves choosing a problem to solve, properly framing it, picking the right tools and working with the available (often limited) resources (such as datasets).
Kaggle provides semi-ready solutions (community notebooks), a clean and rich dataset, and, worst of all, a well-posed problem. What an egregious misrepresentation of reality!
3. What was so cool / hard about this project?
What did you struggle with? What would you do differently now? Do you know of any alternative solutions? What are the pros and cons?
If the candidate has difficulty answering the last two questions (*), try steering the dialogue in the “hardest/largest project you’ve done” direction. A short story about struggling with an assignment or homework can also tell a lot about the interviewee’s autonomy and problem-solving skills.
(*) That’s totally okay! Not everyone knows about competitions, or has enough time to participate. Students are busy too.
4. Which models did you use, and why?
Most likely, the reasoning behind the model choice will be “because my boss told me so”, “it was like that before I joined” or “everyone else does it like that”. All these reasons are perfectly fine, but they don’t tell you much.
To be honest, this question is just a bridge to the second, more technical part. By this point, the candidate should have calmed down and become comfortable.
Before they get too comfortable, let’s hit them with this:
5. What’s your favourite Machine Learning Model?
Which one do you know best? Pick the one model you are most comfortable with!
Common choices are linear models, decision trees, and (rarely) gradient boosted trees. Choices like “neural networks”, “BERT”s and GPTs indicate the interviewee’s low standards of “knowing” stuff. So, I prefer not to nudge or hint towards any specific models.
No bullshit like “there was another guy responsible for it” can cover lackluster knowledge of “your favourite model”. If it’s your favourite model, you are supposed to know the most about it! The interviewee picked it themselves, so no excuses.
Because of that, I prefer this question over asking about technical details of past projects.
(I sometimes see smiles here: they realize what the next question is.)
6. Tell me everything you know about it
Every formula, equation, intuition, best practice, “under the hood” implementation detail - everything.
Just assess the precision of the language, the quantity of details and the overall completeness of the answer.
I’m almost certain that hiring decisions can be based on just how long the candidate can keep talking non-stop.
7. (Guide a bit)
Where is the gradient in gradient boosted trees? When can’t we use a linear model? Do tree-building algorithms produce globally optimal trees?
…But listing all the details off the top of your head is hard. And it’s okay.
Throw in some ‘key phrases’ and ‘keywords’ which may trigger associations with lectures or papers. For instance, mentioning the word ‘regularization’ elicits new bits of knowledge about linear regression - for some reason, everyone always forgets this topic.
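To make the ‘regularization’ keyword concrete, here is a minimal sketch of the kind of thing a candidate might recall once the hint lands: one-feature ridge regression without an intercept, where the L2 penalty shrinks the fitted weight towards zero. The function name `ridge_weight`, the toy data and the penalty values are made up for illustration.

```python
def ridge_weight(xs, ys, penalty):
    """Closed-form weight for 1-D ridge regression without an intercept.

    Minimises sum((y - w * x) ** 2) + penalty * w ** 2 over w.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Toy data (made up): roughly y = 2 * x with a little noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A larger penalty pulls the weight further towards zero.
for penalty in (0.0, 1.0, 10.0, 100.0):
    print(penalty, ridge_weight(xs, ys, penalty))
```

With `penalty=0.0` this is ordinary least squares; a follow-up question could be why anyone would accept a ‘worse’ fit on the training data on purpose.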
Also: IMO it’s best to issue some explanations/answers if a ‘hint’ does not land. These questions have a ‘correct answer’, unlike the ones before; so just tell it. The interviewee will know a bit more, instead of leaving with a sour taste in the mouth. We are all learners here.
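As an example of such an explanation, the ‘where is the gradient?’ question can be answered with a few lines of code. The sketch below assumes squared-error loss, a single numeric feature with at least two distinct values, and one-split ‘stumps’ as weak learners; the names `fit_stump` and `boost` and the toy data are hypothetical, and real libraries do considerably more than this.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump to targets by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss with stumps."""
    base = sum(ys) / len(ys)          # initial constant prediction
    preds = [base] * len(xs)
    stumps, losses = [], []
    for _ in range(n_rounds):
        # The gradient part: for loss 0.5 * (y - F) ** 2 the negative
        # gradient with respect to the prediction F is y - F, the residual.
        neg_grad = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, neg_grad)   # weak learner approximates it
        stumps.append(stump)
        # Take a small step in the direction of the negative gradient.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, losses


# Toy data (made up): y = x ** 2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
predict, losses = boost(xs, ys, n_rounds=30, learning_rate=0.3)
print(losses[0], losses[-1])
```

Each round fits a new tree to the negative gradient of the loss at the current predictions, which for squared error happens to be the plain residual. Swapping the loss changes only the `neg_grad` line.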
8. Simulated problem solving
Time to give the interviewee a break. Describe the details of your business and your product. Then pick and describe a part of the product, or a problem from it, and just ask:
How would you solve it?
If the candidate is good, you may even play this ‘product development simulation’ game! Their answer is likely to contain something your team already tried, so you know some nuances of practical implementation of the solution. Something won’t work, won’t fit into memory, the data is too dirty, etc. Throw this knowledge back and ask for next steps, workarounds or alternative solutions.
9. Run the checklist
Which frameworks and libraries do you know? Which ones would you use?
And ask a few short questions per library/technology. Learning new stuff is not a problem; but already available knowledge of your stack is a small
But - the interview is over.
By now, it should be clear whether the candidate is an enthusiastic, problem-solving rising star, or just a guy who expects six figures after Alt+Enter’ing through a Titanic solution he found somewhere.
I hope I’ve highlighted the distinguishing features of excellent candidates, given reasons why it may be so easy to miss them, and offered strategies for avoiding that.
Have a good hire!