The data science life cycle

What is the data science life cycle?

When a business project involves handling big data, it is essential to agree on a workflow, or life cycle, at the beginning. Locking in a workflow at the start has several advantages:

  • multiple teams can work in sync (even remotely)
  • delays can be identified and planned for early on
  • bottlenecks can be avoided because the workflow is structured.

Why is a data science life cycle required?

Watch this video where Linda and Alvin at Data Dig Ltd. try to understand the life cycle based on their current business requirements.

What is the data science life cycle?

A data science life cycle uses analytical and statistical methods, together with theories from machine learning, to draw insights and make predictions from collected data. The entire life cycle can take months to complete. It is not a linear process but an iterative one: implementing a project can involve moving back and forth between phases, depending on the tasks, the challenges, and the programming tools or systems at hand.
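
To make the idea of iteration concrete, here is a minimal Python sketch of a project that returns to an earlier phase once before finishing. The phase names and the `rework` mapping are illustrative assumptions, not part of any specific life cycle; in a real project the decision to go back would come from a review or a failed check rather than a fixed table.

```python
def run_life_cycle(phases, rework):
    """Walk through phases in order, jumping back once per entry in rework.

    rework maps a phase name to the earlier phase that must be revisited
    the first time that phase is reached (a hypothetical stand-in for a
    failed check, such as a model that does not meet the business need).
    """
    visited = []
    pending = dict(rework)  # copy so the caller's mapping is not modified
    i = 0
    while i < len(phases):
        phase = phases[i]
        visited.append(phase)
        if phase in pending:
            # Go back to the earlier phase, but only once per entry.
            i = phases.index(pending.pop(phase))
        else:
            i += 1
    return visited


# Illustrative phase names (assumed for this sketch).
phases = ["business understanding", "data preparation", "modelling", "evaluation"]

# Assume evaluation sends the team back to data preparation once.
path = run_life_cycle(phases, {"evaluation": "data preparation"})
print(" -> ".join(path))
```

The printed path visits data preparation, modelling and evaluation twice, which is what "not linear" means in practice: the same phases are repeated until the result is good enough to move on.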

Common types of data science life cycle

A data science life cycle can have five or more phases in sequence, depending on the complexity of the business requirements, and the sequence of steps can be implemented in several ways. Experts have formulated various data science life cycles to help data scientists drive business projects, and some data scientists even choose to design their own.

Some of the common types of data science life cycles are listed here for your reference:

  1. Team Data Science Process (TDSP) life cycle
  2. Knowledge Discovery in Databases (KDD) life cycle
  3. Domino life cycle
  4. Sample, Explore, Modify, Model, and Assess (SEMMA)
  5. Cross-Industry Standard Process for Data Mining (CRISP-DM)

You may wish to research a few of them, but any organisation you join is likely to have one in place already. The choice of life cycle depends on the organisation and the type of business you run or work for.

The modified CRISP-DM data science life cycle

The CRISP-DM process is popular as a data science life cycle owing to its ease of use. A survey conducted by the Data Science Process Alliance, a training, consulting and research organisation, found that CRISP-DM remains the most popular framework for executing data science projects.

Graphic shows a horizontal bar chart of the process most commonly used for data science projects. CRISP-DM: 40%, Scrum: 18%, Kanban: 12%, My own: 12%, TDSP: 4%, Other: 3%, None: 2%, SEMMA: 1%.

Source: Data Science Process Alliance [1]

The data science life cycle we will be exploring in this short course is inspired by the CRISP-DM process. The image below describes the CRISP-DM process and the modified CRISP-DM process for you to compare and identify similarities and differences.

Graphic shows a comparison between CRISP-DM and the data science life cycle. At the top is CRISP-DM, which runs from business understanding through data understanding, data preparation, modelling and evaluation to deployment. At the bottom is the data science life cycle, which runs from business understanding through discovery, data preparation, data modelling, model building and evaluation to communicating results.
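
If you would like to check the similarities and differences yourself, the comparison can be sketched in a few lines of Python. The phase names are copied from the graphic description above; the function name `compare_phases` is just an illustrative choice. Note that the sketch matches names exactly, so it treats "modelling" and "data modelling" as different phases even though they may cover similar work.

```python
# Phase names copied from the graphic description, in order.
CRISP_DM = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modelling",
    "evaluation",
    "deployment",
]

MODIFIED = [
    "business understanding",
    "discovery",
    "data preparation",
    "data modelling",
    "model building",
    "evaluation",
    "communicating results",
]


def compare_phases(original, modified):
    """Return phases shared by both life cycles and those unique to each."""
    shared = [p for p in original if p in modified]
    only_original = [p for p in original if p not in modified]
    only_modified = [p for p in modified if p not in original]
    return shared, only_original, only_modified


shared, only_crisp_dm, only_modified = compare_phases(CRISP_DM, MODIFIED)
print("In both:", shared)
print("Only in CRISP-DM:", only_crisp_dm)
print("Only in the modified life cycle:", only_modified)
```

By exact name, three phases appear in both life cycles (business understanding, data preparation and evaluation), while the modified life cycle replaces the remaining CRISP-DM phases with four of its own.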

Next, you will learn how to apply the first two stages of the data science life cycle to frame a business problem.

References

  1. Saltz J. CRISP-DM is Still the Most Popular Framework for Executing Data Science Projects [Article]. Data Science Process Alliance; 2020 Nov 30. Available from: https://www.datascience-pm.com/crisp-dm-still-most-popular/

This article is from the free online course Introduction to Data Science for Business, created by FutureLearn.
