Building effective Data Science teams is an important part of keeping data-driven, problem-solving business centres productive and functional.
Why do you need well structured Data Science teams in the first place?
This blog reveals the formula behind successful Data Science teams, why you need them, and how to maximise their productivity.
Structure of a Data Science Team
An efficient Data Science team requires a network of data-related roles, carefully put together to ensure the smooth functioning of business units. Here are the roles that make up a brilliant Data Science team.
Data Engineers
Data Engineers are responsible for building and maintaining the technical infrastructure required for modeling, predictions, and analysis. The engineers create and maintain databases, machine learning pipelines, and production processes. Without properly stored data, modeling processes, and the ability to serve predictions in production, a Data Scientist is essentially useless.
Data Scientists
Once the initial groundwork has been laid, a Data Scientist owns the modeling process. Generally, they take input parameters from product or other team leads in order to understand the model’s business objective. They then work to articulate requirements to the engineers and other stakeholders. Once these criteria have been defined, the process of building tests and models and evaluating performance begins.
It helps to identify what kind of Data Scientist you are before working out which teams you need around you. Two types are commonly distinguished:
Type A stands for Analysis. This person is a statistician who makes sense of data without necessarily having strong programming knowledge. Type A data scientists perform data cleaning, forecasting, modeling, visualization, etc.
Type B stands for Building. These Data Scientists use data in production. They’re strong software engineers with some stats background who build recommendation systems, personalization use cases, etc.
Data Analysts
As your team grows and scales up, the Data Analysts who support modeling become a very important part of it. Having started my career in this position, I have a deep respect for the value a machine learning analyst can provide to a mature team. Analysts monitor processes, evaluate data quality, and track production model performance.
These steps seem relatively routine, but a model is never “complete” and will always require some oversight, so appointing an analyst to manage the process makes sense. This frees your more senior team members to focus on innovation instead of maintenance.
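As a rough illustration of the routine checks described above, here is a minimal Python sketch of what an analyst's monitoring job might compute for one batch of production predictions. Everything in it is an assumption made for the example: the record layout (`prediction`, `actual`, `features`), the function name `monitor_batch`, and the thresholds are hypothetical and should be replaced with whatever your own pipeline and team agree on.

```python
from dataclasses import dataclass


@dataclass
class MonitoringReport:
    accuracy: float       # share of predictions that matched the outcome
    missing_rate: float   # share of feature values that were missing
    needs_review: bool    # True when either check crosses its threshold


def monitor_batch(records, baseline_accuracy, max_drop=0.05, max_missing=0.10):
    """Summarise one batch of production predictions.

    Each record is a dict with 'prediction', 'actual', and 'features' keys.
    This layout and the default thresholds are hypothetical.
    """
    if not records:
        raise ValueError("records must not be empty")

    # Production model performance: simple accuracy over the batch.
    correct = sum(1 for r in records if r["prediction"] == r["actual"])
    accuracy = correct / len(records)

    # Data quality: fraction of feature values that are None.
    values = [v for r in records for v in r["features"].values()]
    missing_rate = sum(v is None for v in values) / len(values) if values else 0.0

    # Flag the batch if accuracy fell too far below the baseline
    # or too many feature values are missing.
    needs_review = (
        accuracy < baseline_accuracy - max_drop or missing_rate > max_missing
    )
    return MonitoringReport(accuracy, missing_rate, needs_review)


# Example batch (made-up values, for illustration only).
batch = [
    {"prediction": 1, "actual": 1, "features": {"age": 34, "income": 52000}},
    {"prediction": 0, "actual": 1, "features": {"age": None, "income": 48000}},
    {"prediction": 1, "actual": 1, "features": {"age": 41, "income": 61000}},
    {"prediction": 0, "actual": 0, "features": {"age": 29, "income": 39000}},
]
report = monitor_batch(batch, baseline_accuracy=0.90)
print(report)
```

A check like this only raises a flag. Deciding whether a flagged batch means the model needs retraining or the upstream data needs fixing is still the analyst's call.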
Managers
As the data team and the number of models grow, the need for a Data Science Manager appears. This person coordinates the quants, devs, and analysts and manages external demand on the data science team. The Data Science Manager essentially guides the process, allocates resources, and occasionally shields the team from ad hoc requests so that it can achieve its primary objectives.
Machine Learning Engineers
Machine Learning Engineers combine software engineering and modeling skills, determining which model to use and what data should be used for each model. Probability and statistics are also their forte. Everything that goes into training, monitoring, and maintaining a model is the ML engineer’s job.
Distribution of Authority/Accountability in teams:
Decentralized. This is the least coordinated option: analytics efforts happen sporadically across the organization, and resources are allocated within each group’s function. It often arises in companies where data science expertise has appeared organically, and it tends to lead to silos, a lack of analytics standardization, and (you guessed it) decentralized reporting.
Functional. Here most analytics specialists work in one department where analytics is most relevant: it’s often marketing or supply chain. This option also entails little to no coordination and expertise isn’t used strategically enterprise-wide.
Consulting. In this structure, analytics specialists work together as one group, but their role within the organization is consulting, meaning that different departments can “hire” them for specific tasks. This, of course, means that there’s almost no resource allocation: specialists are either available or not.
Centralized. This structure finally allows you to use analytics in strategic tasks – one data science team serves the whole organization in a variety of projects. Not only does it provide a DS team with long-term funding and better resource management, it also encourages career growth. The only pitfall here is the danger of transforming an analytics function into a supporting one.
Center of Excellence. If you pick this option, you’ll still keep the centralized approach with a single corporate center, but data scientists will be allocated to different units in the organization. This is the most balanced structure – analytics activities are highly coordinated, but experts won’t be removed from business units.
The Right Platforms:
When building a data science team, it is also important to consider the platforms and tools your company uses. A range of options is available, including Hadoop, Spark, and Python. If people on the team lack these skills or do not know how to use the various platforms, it is important that they learn.
Certification courses can be a great option for teaching the additional skills needed, and to get everyone on the team on the same page.
Other platforms and tools to consider include Google Cloud Platform and Excel for business analytics. Understanding the fundamentals of these systems can provide a good overall foundation for the team members.
In the end…
When you are creating a data science team for your company, do not rush into choosing the wrong people and platforms or go without quality processes. Take your time to create a team that will provide your company with the quality and professionalism it needs.