
The data science lifecycle

Watch the video in this step to find out about the data science lifecycle.
To better understand the knowledge and skills required to be a data scientist and how data science projects work, we can explore the data science life cycle. Imagine the data science life cycle as a digital onion, with each layer being a distinct process that works together to form a whole data product or solution. We can take these layers apart to explore this life cycle of data. At the core of this process is the problem. And by asking relevant questions about the nature of the problem, we can define a set of objectives that might lead us to a solution. Data gathering includes gathering and scraping all the necessary or available data to tackle the problem.
Pre-existing datasets may provide solutions to similar problems, but a unique question may call for unique data. Data preparation. Data doesn’t always come neatly packaged, and there are often inconsistencies in data types, misspelled attributes, or even missing and duplicate values that can cause problems later on. Garbage in, garbage out. Data cleaning is often the most time-consuming stage, often taking 50 to 80% of the overall process. Once data has been cleaned, it can then be transformed or modified based on requirements. Data exploration. By plotting and visualising the data, we can identify general trends or relationships between data. What story is the data telling us?
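The data preparation stage described in the transcript can be illustrated with a small sketch. The records, the field names, and the `KEY_FIXES` mapping below are all hypothetical, invented only to show the kinds of problems mentioned (inconsistent data types, a misspelled attribute, a missing value, and a duplicate). Dropping rows with missing values is just one possible policy, assumed here for brevity.

```python
from typing import Optional

# Hypothetical raw records showing the problems named in the transcript.
RAW_ROWS = [
    {"name": "Ada", "age": "36", "city": "London"},        # age stored as text
    {"name": "Ada", "age": "36", "city": "London"},        # exact duplicate
    {"name": "Grace", "agee": 45, "city": "New York"},     # misspelled attribute
    {"name": "Alan", "age": None, "city": "Manchester"},   # missing value
]

# Assumed mapping from known misspellings to the intended attribute name.
KEY_FIXES = {"agee": "age"}


def clean_row(row: dict) -> Optional[dict]:
    """Return a cleaned copy of row, or None if the age is missing."""
    fixed = {KEY_FIXES.get(key, key): value for key, value in row.items()}
    if fixed.get("age") is None:
        return None  # assumed policy: drop rows with a missing age
    fixed["age"] = int(fixed["age"])  # make the data type consistent
    return fixed


def clean(rows: list) -> list:
    """Fix attribute names and types, then drop missing and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = clean_row(row)
        if fixed is None:
            continue
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint in seen:
            continue  # skip rows already kept
        seen.add(fingerprint)
        cleaned.append(fixed)
    return cleaned


CLEANED = clean(RAW_ROWS)
print(CLEANED)
```

In practice a library such as pandas would usually handle these steps, but the logic is the same: standardise names, standardise types, then decide what to do with missing and duplicate values.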
Understanding patterns and bias in our data allows us to form hypotheses about the problem that we can test in the next stages. Feature engineering is the process of using domain knowledge to transform your raw data into informative features that represent the problem you’re trying to solve. Data mining involves training machine learning models like KNN, decision trees and naive Bayes, evaluating their performance and identifying which model best suits the type of data we have available and the problem we’re trying to solve. Data visualisation means communicating the findings in simple yet effective visual ways to key stakeholders using graphs, plots, infographics, interactive visualisations and even storytelling methods that best convey meaningful insight to decision makers.
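The data mining stage, training models, evaluating them, and picking the best fit, can be sketched in a few lines. The toy data below is hypothetical, and the comparison pits a hand-written KNN (k-nearest neighbours) classifier against a simple majority-class baseline rather than the decision tree or naive Bayes models named in the transcript. It is a minimal illustration of the train, evaluate, and compare loop, not a recommended implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: (feature_1, feature_2) paired with a label.
TRAIN = [
    ((1.0, 1.1), "a"), ((1.2, 0.9), "a"), ((0.8, 1.0), "a"),
    ((3.0, 3.2), "b"), ((3.1, 2.9), "b"), ((2.9, 3.0), "b"), ((3.3, 3.1), "b"),
]
TEST = [((1.1, 1.0), "a"), ((3.0, 3.0), "b"), ((0.9, 1.2), "a")]


def knn_predict(train, point, k=3):
    """Label point by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def majority_predict(train, point):
    """Baseline: ignore the point and predict the most frequent label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predict, train, test):
    """Fraction of held-out test points that the model labels correctly."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)


# Evaluate each candidate on the same held-out data and keep the best one.
SCORES = {
    "knn": accuracy(knn_predict, TRAIN, TEST),
    "majority_baseline": accuracy(majority_predict, TRAIN, TEST),
}
BEST_MODEL = max(SCORES, key=SCORES.get)
print(SCORES, BEST_MODEL)
```

Evaluating on data the model has not seen during training is the key design choice here: it is what lets the comparison say something about how each model might behave on new data.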
Deployment. After successfully processing and modelling our data, we return to the beginning of the life cycle to evaluate whether we have achieved our objectives and gained insight into solving the problem. If we have, we can then deploy the model to tackle the problem. Monitoring. Real-time analytics are used to report and maintain model performance. Completing the data science life cycle may also yield unexpected insights that aren’t necessarily useful for solving the problem we set out with. But that’s the beauty of a cycle. It has continuity and allows us to expand and iterate upon our findings, leading to more powerful insights and knowledge.
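The monitoring stage mentioned at the end of the transcript can be sketched as a rolling check on a deployed model's accuracy. The `AccuracyMonitor` class, its window size, and its threshold are all hypothetical choices made for illustration. A real system would typically rely on dedicated monitoring tooling and would track more than a single metric.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent `window` predictions."""

    def __init__(self, window=5, threshold=0.6):
        # deque with maxlen discards the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold  # assumed minimum acceptable accuracy

    def record(self, predicted, actual):
        """Store whether the latest prediction matched the true label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return recent accuracy, or None if nothing has been recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """Flag the model when recent accuracy falls below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=5, threshold=0.6)
# Hypothetical stream of (predicted, actual) labels from a deployed model.
for predicted, actual in [("a", "a"), ("b", "b"), ("a", "b"), ("a", "b"), ("b", "a")]:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_review())
```

A flag from a check like this is one way the cycle restarts: a drop in performance sends us back to the problem, the data, or the model.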

Now that we have got to know our tools, let’s learn a bit about the general process of Data Science, in other words, how and when we might apply these tools to the gathering, processing, analysing, and reporting of our findings.

Your task

Watch the video and learn the crucial steps involved and the importance of each one. At the end of the video, you will have an overview of the basic process involved, which you will be applying as you complete the activities in this course.
In the comments section below, write about one of the eight steps and what makes it so important to the overall process. What problems do you foresee if that step were skipped?
This article is from the free online course Applied Data Science, created by FutureLearn.
