Accredian’s Spotlight Series: In conversation with Sanjay Malhotra

Meet Sanjay Malhotra, a determined data enthusiast from the February ’23 Cohort of E&ICT IIT-Guwahati’s Executive Program in Data Science & AI. With a background in pharma sciences, Sanjay spent the majority of his time as Project Head of Chemistry at Solara Active Pharma Sciences.

Seeking to enhance his work experience and explore the world of Data Science & AI, Sanjay enrolled in E&ICT IIT-Guwahati’s Executive Program in Data Science & AI.

Let’s dive into Sanjay’s journey and discover how Accredian helped him achieve his goals.

Question 1: Which program and batch are you part of at Accredian? Tell us more about your current work profile.

Sanjay: Currently, I am working for Aragen Life Sciences as an Associate Director in the Medicinal Chemistry Division, overseeing the Chemistry Solutions for Clients from various Biotech Industries.

My responsibilities include:

Providing strategic direction and leadership to cross-functional teams in the area of pharmaceutical research and drug discovery. Leading cross-functional teams to ensure timely execution and successful completion of R&D projects, coordinating activities with various departments and CROs.
Effective collaboration with scientists and technologists to facilitate the development, prioritization, and execution of cross-functional project goals with issue identification, resolution, and contingency planning, utilizing analytical capabilities and critical thinking skills to drive decision-making.

Because users are constrained by the volume of analysis that day-to-day data generation requires, data science methods are used to analyse and interpret data obtained from various analytical techniques, such as mass spectrometry, chromatography, and spectroscopy.

Pharmaceutical companies use data science techniques to analyse large datasets and make informed decisions about drug development and clinical trials. Additionally, the principles of chemistry & pharmacology are relevant to drug discovery and design, which are areas where data science techniques are particularly valuable.

So, I decided to add a much-needed Data Science perspective to my work.

Question 2: Walk us through your career journey & what got you interested in Data Science & Artificial Intelligence.

Sanjay: I have worked with leading Pharmaceutical Research Organizations such as:

Ranbaxy Research Laboratories (Now Sun Pharma)
Daiichi Sankyo Pharmaceuticals (A Japanese Pharmaceutical Company) specializing in Drug Discovery.

Here I was mainly part of the Medicinal Chemistry team in the New Drug Discovery Research (NDDR).

We were primarily involved in designing new drugs with Computational Chemists, synthesizing them with the help of Medicinal Chemists, having them evaluated by Pharmacologists/Microbiologists, and finally analysing them with the Pharmacokinetic team for PK/PD evaluations and other physicochemical parameters of the active drugs.

With my background in Chemistry and Computer Knowledge, I already have a solid foundation that can be beneficial in the pharmaceutical industry. To enhance my skills, I decided to focus on:

Pharmaceutical Research and Development (R&D) which is at the forefront of drug discovery and development.

Chemoinformatics and Computational Chemistry, combined with my chemistry background, would allow me to leverage computational tools for drug design and optimization. I wanted to familiarize myself with the following tools:

Molecular Modeling Software: Tools like PyMOL, Schrödinger, and AutoDock, which are used for molecular visualization and for studying drug-protein interactions.
Statistical Software: Python for data analysis in clinical trials and research.
Database Management Systems: Understanding how to work with databases in pharmaceutical research and clinical data management which is quite valuable.

Machine learning techniques are also quite valuable in Drug Discovery research. They can improve decision-making from pharmaceutical data across applications such as QSAR analysis, hit discovery, de novo drug design, target validation, prognostic biomarkers and digital pathology.

Machine learning (ML) has been making significant strides in the pharmaceutical industry. Its applications are diverse and impactful, helping to accelerate drug discovery, improve patient outcomes and optimize manufacturing processes. Notable applications include:

Retrosynthesis, in which ML predicts the likely routes of organic synthesis
Atomic Simulations, which utilize ML potentials to accelerate potential energy surface sampling
Heterogeneous Catalysis, in which ML assists in various aspects of catalysis widely used in Discovery Research

Question 3: Which tools and packages in Data Science & Artificial Intelligence have you mastered in your current program at Accredian so far?

Sanjay: This is what I have learnt so far:

Statistics and Probability skills

Mean, median and mode;
Standard deviation and variance;
Correlation coefficients and the covariance matrix
Probability distributions – Binomial, Normal;
P-value;
Bayes’ Theorem
Aspects of the confusion matrix, including precision, recall, positive predictive value, negative predictive value and receiver operating characteristic (ROC) curves;
Central Limit Theorem, R² score and Mean Squared Error.
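
To make the confusion-matrix terms above concrete, here is a minimal Python sketch. The counts and the function name confusion_metrics are hypothetical and chosen only for illustration; they are not taken from the program's material.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)  # also called positive predictive value
    recall = tp / (tp + fn)     # true positive rate (sensitivity)
    npv = tn / (tn + fn)        # negative predictive value
    fpr = fp / (fp + tn)        # false positive rate, the x-axis of a ROC curve
    return {"precision": precision, "recall": recall, "npv": npv, "fpr": fpr}


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A ROC curve is obtained by recomputing recall and the false positive rate at many decision thresholds; the sketch shows a single threshold only.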

Multivariable calculus and linear algebra

Linear algebra and multivariable calculus to manipulate and transform data and derive insights. Linear algebra is applied in data processing and transformation, dimensionality reduction and model evaluation.

Data analysis and wrangling

Data Visualization

The most important outcome of data visualization is building a story from the data using visualizations that people can easily understand. This draws on a variety of data plotting and charting approaches, including the following:

Histograms;
Bar and area charts, pie and line charts, waterfall charts, thermometer and candlestick charts;
Segmentation and clustering diagrams;
Scatter plots and bubble charts;
Visualizations of classification space;
Methods for visualization during exploratory data analysis;
Frame and tree diagrams;
Heatmaps, video and image annotations;
Map and geospatial visualizations; and
The use of a wide range of gauges, metrics and measures.

Data manipulation, preparation and wrangling

ML algorithms, modeling and feature engineering

Decision trees, Random Forests, bagged and boosted tree approaches;
Bayesian methods;
k-Nearest Neighbors;
Ensemble methods;
Clustering approaches including k-means and Gaussian mixture models, along with dimensionality reduction using principal component analysis;
Perform model evaluation and hyperparameter optimization. This means performing cross-validation and model optimization steps, as well as understanding ROC and learning curves.
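
As a rough illustration of cross-validation combined with hyperparameter optimization, the sketch below tunes k for a tiny k-Nearest Neighbors classifier using only the Python standard library. The toy data, the helper names (knn_predict, cross_val_accuracy) and the candidate values of k are all assumptions made for this example; in practice a library such as scikit-learn would normally handle these steps.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, n_folds=5):
    """Average held-out accuracy of k-NN across n_folds folds."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th point (starting at `fold`) is held out for testing.
        test = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / n_folds


# Hypothetical toy data: (feature, label) pairs, label 1 when the feature exceeds 5.
data = [(float(x), int(x > 5)) for x in range(11)]

# Hyperparameter search: keep the k with the best cross-validated accuracy.
results = {k: cross_val_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```

Each candidate k is scored only on points it was not trained on, which is the point of cross-validation: the chosen setting reflects performance on unseen data rather than a fit to the training set.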

Data Engineering and manipulation tools

Reporting and Business Intelligence tools such as:

Excel: It gave us a diverse range of options including Pivot tables and charts
QlikView: It let us consolidate, search, visualize, and analyse all our data sources with just a few clicks.
Power BI: It is a Microsoft offering in the Business Intelligence (BI) space. As our organization uses a SharePoint database, it was quite helpful.

Predictive and ML tools:

Python
Jupyter Notebooks

Question 4: What were some of the initial challenges when you got started on your Data Science journey and how did you overcome them?

Sanjay: Data scientists need to effectively communicate their findings and insights about the problem they are solving to non-technical stakeholders, such as business executives.

This can be challenging, as data scientists often have technical backgrounds and may struggle to translate their analyses into clear and actionable business insights. Data Scientists have to be hands on.

They have to get into the weeds, write code, manage data, build visualizations, predict outcomes, etc. This, for the large part, is the job of an individual contributor in an exciting field where new tools and techniques are launched literally every other week.

In such an environment, most Data Scientists want to stay hands on because they feel that’s how they can stay relevant and add to their market value. As I was from a non-coding background, I became determined to practice coding.

The only challenge I faced was that I needed hands-on experience and training in how to code, as it was quite new to me. I already understood the concepts to some extent. So, I started learning about Python, various ML algorithms and similar things on my own using Kaggle, GitHub, YouTube videos etc.

Later, when a hands-on session was held in class, I could understand it better, and things became much clearer with more and more practice.

Question 5: Who is your favorite faculty member at Accredian and what did you learn from them the most?

Sanjay:

Arun Prakash and Nishkam Verma have been my favourite mentors.

All the practical classes fell under them, and I was looking forward to hands-on experience more than theory. They explained the concepts very well, with hands-on training running alongside the class.

They both always made the class very interactive, intuitive and engaging. I liked the way Arun Prakash handled all queries. He gave students the chance to speak on air and have a direct conversation with him about any doubts. He made the entire picture clear by answering with examples, so that the query was fully resolved.

Besides that, he also gave additional lectures on Generative AI – to enhance the ability to extract meaningful information from vast amounts of data and rediscover solutions to problems using LLMs.

Question 6: In your view, what is the goal of Data Science?

Sanjay: Data Science is an interdisciplinary field between science and computing used to generate insights. It involves mathematics, scientific methods and processes.

The skills needed range from mathematics (statistics and probability) and data engineering to computer science and software programming (usually Python). It supports many kinds of projects, from object detection to machine learning.

The objective of the Data Scientist is to explore, sort and analyse big data from various sources in order to draw conclusions that optimize business processes or support decision-making.

The goal of a data scientist is to analyse business data to extract meaningful insights. A data scientist solves business problems through a series of steps, including:

The data scientist determines the problem by asking the right questions and gaining understanding.
The data scientist then determines the correct set of variables and data sets.
The data scientist gathers structured & unstructured data from many disparate sources.
Once the data is collected, the data scientist processes the raw data and converts it into a format suitable for analysis. This involves cleaning and validating the data to guarantee uniformity, completeness, and accuracy.
After the data has been rendered into a usable form, it’s fed into the analytic system, either an ML algorithm or a statistical model. This is where the data scientist analyses the data and identifies patterns and trends.
Once the analysis is complete, the data scientist interprets the results to find opportunities and solutions.
The data scientist finishes the task by preparing the results and insights and communicating them to the appropriate stakeholders.
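
The cleaning and validation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (compound, purity_pct), the sample records and the 0 to 100 range check are hypothetical, not taken from any real dataset.

```python
def clean_records(raw_records):
    """Drop incomplete or invalid rows and normalize the remaining ones."""
    cleaned, seen = [], set()
    for record in raw_records:
        name = (record.get("compound") or "").strip().lower()
        value = record.get("purity_pct")
        if not name or value is None:   # completeness: required fields present
            continue
        try:
            value = float(value)        # uniformity: one numeric type
        except (TypeError, ValueError):
            continue
        if not 0.0 <= value <= 100.0:   # accuracy: plausible percentage range
            continue
        if name in seen:                # uniformity: no duplicate compounds
            continue
        seen.add(name)
        cleaned.append({"compound": name, "purity_pct": value})
    return cleaned


# Hypothetical raw data gathered from different sources.
raw = [
    {"compound": " Aspirin ", "purity_pct": "99.1"},
    {"compound": "aspirin", "purity_pct": 98.7},     # duplicate after normalization
    {"compound": "Ibuprofen", "purity_pct": None},   # missing value
    {"compound": "Paracetamol", "purity_pct": 150},  # out of range
    {"compound": "Caffeine", "purity_pct": "n/a"},   # not numeric
    {"compound": "Naproxen", "purity_pct": 97},
]
cleaned = clean_records(raw)
print(cleaned)
```

Only the first aspirin record and the naproxen record survive; the other rows are rejected by the completeness, type, range or duplicate checks.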

Question 7: How has Data Science evolved in the last few years?

Sanjay: Over the past decade, businesses faced challenges with data overload, leading to silos and inaccurate processing. Data scientists shifted roles to focus on integration and pipeline improvement, altering the landscape for their expertise. Despite the surge in startups leveraging data for a competitive edge, not all businesses fully utilize the infrastructure built by data scientists.

The explosion of data ushered in a transformation in our daily lives, with Data Science and Big Data playing pivotal roles in various business operations. Acknowledged as the hottest job of the 21st Century, data scientists amalgamate programming, statistical, & storytelling skills to unveil insights from vast data sets.

The demand for Data Scientists skyrocketed in the last decade, with a 1500% rise in job listings between 2011 and 2012, underscoring their crucial role in evolving technology. Artificial Intelligence became a reality, permeating our daily lives with real-world applications.

Big Data revolutionized business, driven by a steep decline in hard-drive costs, encouraging extensive data storage and analysis for insights. Cloud computing also reshaped how organizations approached Information Technology, leading to a widespread adoption of cloud-based applications in a “cloud-first” approach.

Question 8: What are the current trends in Data Science that you are most excited about?

Sanjay: In today’s digital world, Generative AI and NLP are some of the technologies that I am excited about.

Amidst the constantly evolving landscape of artificial intelligence, one particular technology shines due to its remarkable capacity to replicate human creativity and communication: Generative AI.

This advanced technology has revolutionized diverse domains, from artistic endeavors and musical creations to language comprehension and interactive dialogues.

The Generative Pre-Trained Transformer (GPT) series is leading the charge in this transformative journey, notably featuring the pioneering GPT-3 model, an innovation hailing from OpenAI.

As generative AI systems continue to progress, working with them involves leveraging natural language prompts to produce desired outputs from AI/ML models. OpenAI’s recent introduction of models like ChatGPT highlights the crucial role that well-crafted prompts play in optimizing the performance of these AI systems.

Other technologies which I am excited about are:

Auto-ML: Automated machine learning platforms are gaining popularity and taking over various aspects of the data science lifecycle. These platforms automate tasks such as data sourcing, feature engineering, conducting machine learning experiments, evaluating and choosing the most effective models, and deploying them into production environments.

MLOps: Short for machine learning operations, MLOps encompasses a range of practices and tools employed to handle the operational aspects of machine learning model lifecycles.

These include tasks such as auto-retraining, dynamic learning, packaging and containerization, and deploying models into production environments. As MLOps practices continue to improve in efficiency and effectiveness, they will enable the data scientists to focus more on tasks such as model retraining and calibration.

Question 9: Which are some of the blogs that you follow?

Sanjay: The blogging platforms that I mainly follow are:

insideBIGDATA: It focuses on the machine learning side of data science. It covers big data in IT and business, machine learning, deep learning, and artificial intelligence. Guest features offer insight into industry perspectives, while news and Editor’s Choice articles highlight important goings-on in the field.
No Free Hunch: This blog is run by Kaggle. It has tutorials and news about data science. Kaggle hosts data science projects and competitions that challenge data scientists to produce the best models for featured data sets.
KDnuggets: I use this blog for news and information on big data, business analytics, & data mining. We can also connect with top professionals in the field and find courses, jobs, and meetups.

Question 10: What is your advice to anyone wanting to start a career in Data Science?

Sanjay: The main goal of a Data Scientist is to create data-based solutions to help the business. This is quite a broad and maybe understated objective, simply because there are many ways to achieve it. Be thorough with tasks such as:

Building dashboards that allow upper management to make well-informed strategic decisions.
Developing models capable of predicting complex variables, such as product pricing or customer propensity.
Finding patterns that might indicate market tendencies or some unmet need of the public.

Learn the concepts related to Python, Pandas and NumPy before you begin with ML algorithms. Basics and fundamentals are the key to succeeding in DS as a career.

For starters, my tip here is: build a portfolio. Search the internet for cool problems and projects, create the solutions, and post them on your GitHub or Kaggle. Show through practical use cases that you understand your role and can perform it well.

Create study groups and solve cases and competitions together. Teach each other. Do not be satisfied with knowing which tool to use, but always try to find out how each of them works. With time you will become more and more specialized.

We hope you enjoyed reading this interview. Check out the Accredian Spotlight for more interesting student stories like this.