Team Data Science

An article presenting the Team Data Science process.
© Luleå University of Technology

The Team Data Science Process (TDSP) was developed by Microsoft and is used today in data science projects. It is an agile, iterative approach to delivering intelligent applications and solutions based on predictive analytics (using current and historical information to predict future events).

This article is only a brief introduction to TDSP. More information is freely available online.

Why use TDSP?

The benefits of using TDSP include improved collaboration and learning, as well as guidance toward the successful implementation of data science initiatives. Perhaps the most important benefit is structure: TDSP offers clearly outlined steps together with the best practices used by industry leaders in the data science field.

These benefits are enabled through the four key components of TDSP, which are:

  • A definition of the data science lifecycle
  • A standardized project structure
  • A recommendation of infrastructure and resources
  • A recommendation of tools and utilities

The TDSP project lifecycle

TDSP provides a lifecycle for structuring the development of data science projects. The lifecycle outlines and emphasizes the steps that are usually required for a project to succeed. It is designed primarily for projects that deliver intelligent applications using machine learning or artificial intelligence. It can also be used for more exploratory projects, although some steps may not apply in those cases. Typically, projects using TDSP have the following four steps in their cycle:

  1. Understanding the business
  2. Acquiring data and understanding said data
  3. Modeling
  4. Deployment
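
The four steps above, and the iterative nature of the cycle, can be sketched in a few lines of Python. This is only an illustration: the `Stage` and `next_stage` names are invented for this example, and the rule of stepping back one stage when a stage's goals are not met is a simplifying assumption, not something TDSP prescribes.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The four lifecycle steps listed in this article."""
    BUSINESS_UNDERSTANDING = 1
    DATA_ACQUISITION_AND_UNDERSTANDING = 2
    MODELING = 3
    DEPLOYMENT = 4


def next_stage(current: Stage, goals_met: bool) -> Optional[Stage]:
    """Return the stage to work on next, or None when the cycle is done.

    Assumption for illustration: if the goals of the current stage are
    not met, the team steps back one stage (or stays at the first one).
    """
    if not goals_met:
        return Stage(max(current - 1, Stage.BUSINESS_UNDERSTANDING))
    if current is Stage.DEPLOYMENT:
        return None
    return Stage(current + 1)
```

For example, a team that finds its model is not good enough would call `next_stage(Stage.MODELING, goals_met=False)` and return to acquiring and understanding data before modeling again.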

[Figure: visual representation of the TDSP lifecycle]

Roles

Different roles are naturally in charge of different steps within the cycle. In a typical TDSP cycle, the work is divided among roles such as project lead, data scientist, project manager and solution architect.

Standardization

The purpose of standardizing the project is to make it easy for everyone involved to find the information that they need, while still keeping it secure and organized. It also provides the added benefit of advancing the knowledge of the entire organization over time.

All of the documents and code produced should be stored in a system that enables team collaboration. Tasks and features should be tracked in an agile project system so that accurate cost and time estimates can be provided.

To help with this, TDSP provides templates, folder structures and checklists that keep the project, and everyone who is part of it, on track toward a common goal.
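
As a rough illustration of what a standardized folder structure might look like in practice, the following Python sketch creates a small project skeleton. The folder names and the `scaffold` helper are assumptions made for this example; they are not the official TDSP template.

```python
from pathlib import Path
from typing import List

# Hypothetical layout: one code folder per lifecycle step, plus
# documentation and a small data sample. Names are illustrative only.
LAYOUT = [
    "code/data_acquisition_and_understanding",
    "code/modeling",
    "code/deployment",
    "docs/project",
    "docs/data_report",
    "docs/model",
    "sample_data",
]


def scaffold(root: Path) -> List[Path]:
    """Create the folders in LAYOUT under root and return them.

    Each folder gets an empty README.md so that it is kept in version
    control and has an obvious place for a short description.
    """
    created = []
    for relative in LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "README.md").touch()
        created.append(folder)
    return created
```

Calling `scaffold(Path("my_project"))` at the start of every project gives each team member the same starting point, so documents and code end up in predictable places.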

Infrastructure and resources

TDSP provides recommendations for storage and sharing, covering everything from cloud file storage and databases to machine learning services and big data clusters.

A key aim of this type of data infrastructure is to enable reproducible analysis and to avoid the confusion that may otherwise arise from duplicate files or difficult-to-track changes.

A team that works on multiple projects can therefore easily share analytics and components with the project group in a structured and secure way.
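
To make the problem of duplicate files concrete, here is a minimal Python sketch that finds files with identical content in a shared folder by comparing SHA-256 digests. It is not part of TDSP itself; the function names are made up for this example, and a real team would more likely rely on version control or the shared storage recommended by TDSP.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group files under root by content; keep groups with 2+ files."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Each entry in the returned dictionary is a set of files that are byte-for-byte copies of each other, which is a useful starting point for deciding which copy the whole team should work from.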

Tools and utilities

Possibly the most important hands-on element of TDSP is the set of tools and utilities that it offers for project execution.

Introducing new processes into an organization is a challenging and complex task, so tools that lower the threshold and let a team get started quickly are highly desirable.

With TDSP, a team can quickly reach a common baseline that allows for continued contribution and development, with the help of both the well-defined structure of the lifecycle and the resources TDSP offers. This is, in essence, how the purpose of the model is fulfilled: efficient development and delivery of solutions.

This article is from the free online course Data Science for Climate Change.