Data Science Q&A with Chief Data Science Mentor Manvender Singh

This article is Part 2 of a series of interviews with INSAID Chief Data Science Mentor, Manvender Singh.

In continuation of the previous part, here are the remaining questions of the Q&A Series with our Chief Data Science Mentor, Manvender Singh. Once you are done reading this, we’re sure all your queries would’ve been answered. In case some still remain, do not hesitate to write to us in the comment box below.

Ankita: With more and more organizations adopting Data Science, startups as well as big names like Google or IBM are hiring Data Scientists. What difference can a Data Science job aspirant expect in these two different types of organizations?

Manav: Data Science is such a vast field that both small companies and large established organizations are hiring people in this space. The simple difference between the roles in the two types of setups is that in large established organizations, there is usually an established practice.

What does this mean for the Data Scientists joining these two types of organizations? For example, if they join the Data Science team of a big name like IBM, there would already be a team of Data Scientists working there and they would already have a hierarchy, i.e., there would be Junior Data Scientists, Senior Data Scientists, Principal Data Scientists, Head of Data Science, etc.

You will experience four major differences there.

– Number one, your role will be more concretely defined.
– Number two, the problem statements that you’re going after will also be more concretely defined.
– Number three is that most likely the data is already in place to some extent. For example, if IBM is working on a project, the Data Scientists who are already in the company would most likely know the challenges in collecting data and the kind of data required for different projects, and they would have already established some kind of framework.
– Number four is that most likely all these established organizations would have a good sense of the tools that they are using. For example, they might be using Python for Data Science on a local system or on a cloud platform such as AWS.

They might be working with Data Science on Google Cloud or Azure. So, they would have already chosen the tools that they will be working with, and most likely you will be expected, not definitely required, but expected to work on one of these tools.

But as a Data Scientist in a startup, what you would essentially be required to do is most likely build a data pipeline from zero. There would be no hierarchy; you’d be the first person to be hired.

Most likely the stakeholders, such as the top management of the company, would be looking at Data Science from a single-problem point of view; they might have encountered one problem and decided to hire a Data Scientist just to solve that kind of problem. They might not have a view of the different ways in which Data Science can be used, because they are not the experts.

So that’s where the startup Data Scientist needs to have a very different kind of mindset. He or she needs to understand that, because it is a startup, the problems it has might not be very concretely defined.

Every decision that they will be taking will be a new decision. They need to be more flexible with a problem statement and more flexible with ambiguity in startups.

That’s essentially the difference between large company Data Scientists versus startup Data Scientists.

Ankita: Okay, so when a person takes interest in making a career in this field, to be a Data Scientist, he or she perceives it to be a longer process, which might not be the reality. They think that they have to undergo multiple rounds of studies, certifications and all these things. What exactly is the time duration? How long does it take to become a Data Scientist?

Manav: So different people, in my experience, take different amounts of time, depending on what their previous background has been.

Let’s say that someone from a non-programming background wants to enter Data Science. That person will take a little more time than someone who already has some familiarity with a tool like Python or already knows Java or something similar.

And for that matter, someone who has done a master’s in maths might have an unfair advantage: they won’t struggle with maths as much as people who are not from a maths or statistics background.

But having said that, I typically ask students to target mastering Data Science in a 6-9 month time period. That, for me, is the sweet spot. The simple reason is that expecting to become a Data Scientist in, say, 3 months is not very realistic; it is possible, but it’s a bit of a stretch. And if you are targeting more than one year, it’s very, very difficult to maintain momentum. At the end of the day, you are learning multiple things and you want to be focused.

So, 6-9 months is a good amount of time in which you would have learnt one tool like Python, mastered data analysis, worked on a couple of projects, and completed machine learning. It will also be a good time for you to work on everything else required to make a successful transition happen, like building your resume and your portfolio, which will help you get a Data Science job.

Ankita: Now that we have talked about how long it takes to become a Data Scientist, would you like to share some common reasons why a job application for a Data Scientist role is rejected?

Manav: Excellent. That’s a very good question. So, I will list the five most common reasons that I have come across why applicants are rejected when they are applying for data science roles.

The first reason is a lack of relevance to the job that they’re applying for, i.e., the role of a Data Scientist. Let’s say that a person is currently in the role of testing Oracle or BI and they have done a certification program or a course. But when you go through their resume, the certification or course is just one line in the entire resume.

For me, as a recruiter, this is the surest sign that this person does not have the depth of understanding that is required to become a successful Data Scientist. It is a failure to match your current skill set to the role of a Data Scientist.

The second reason why a resume is rejected is not mentioning your projects. At the end of the day, as a recruiter, I can trust that you’ll be successful in a Data Science role only if I see some good projects. So, you need to mention your projects, and that’s something a lot of people don’t do, which becomes the reason for rejection.

The third reason for people to get rejected is keyword stuffing: they put in a lot of Data Science terms like machine learning, deep learning, and TensorFlow just to get their resume selected.

The very moment I look at a resume like this, I can instantly see that this person has not done real Data Science; all this person has done is put a lot of buzzwords in their resume. So avoid buzzword stuffing; it can do you more harm than good.

The fourth reason is a mismatch with the specific role. The kind of Data Science role that I might be hiring for might be very different from what Novartis is looking for in a Data Scientist. And what Novartis is looking for in a Data Scientist might be very different from what Citibank is looking for, because these are different industries.

So you need to, first of all, see what the job is asking for and whether your resume matches the skill set that the job requires. For example, say the required skill set includes some background in and understanding of healthcare, and you’re applying from a retail background. Chances are good that the company would prefer to hire someone from a healthcare background, unless they’re not able to find a fit.

The fifth reason is the obvious resume mistakes: spelling mistakes, formatting issues, missing dates, not mentioning your education; the regular issues. These are the top five reasons for someone to get rejected when they are applying for Data Science roles.

Ankita: How can you tell whether a Data Scientist is fake? Is there a way to detect a fake Data Scientist?

Manav: I would say that there is no such thing as a fake Data Scientist.

Instead, there are amateur Data Scientists. By amateur Data Scientists, I essentially mean people who are getting started in the field of Data Science but try to project that they have already mastered it and are experts. So I would call them amateur Data Scientists.

Now, the first way to tell an amateur Data Scientist from a professional Data Scientist is their resume; you can see in their resume that they don’t have any projects.

The second way is that even if they do mention projects, they don’t mention the impact of those projects.

The third way in which you spot amateur Data Scientists is that once you invite them for interviews, you would be able to see that their understanding of Data Science is limited to one narrow aspect.

They might, for example, have focused a lot on machine learning, with no idea that machine learning is just a small part of the work, and they would possibly not know how to collect data or what kinds of permissions are required.

So these are the three ways in which you discover whether a person is an amateur or a professional. These checks have helped me time and again to separate an amateur Data Scientist from a professional Data Scientist.

Ankita: Thanks a lot, Manav. These insights will be very useful for anyone looking to make a career in Data Science. This field will not be ambiguous to them anymore.