CRISP-DM and Data Wrangling
You may have heard the term data wrangling before, and you may even have done it without realizing. Imagine you’re working on a project analyzing customer feedback data. You decide to use Python because you know it has good data analysis tools. However, as you start to access the raw data set, you realize it contains a mixture of variables: age, date, time, season, number of passengers and more. You’ll need to transform all this data into a suitable format before you can begin. This is known as data wrangling: converting raw data into a format that’s convenient to consume. In other words, you’ll need to clean this messy data to get anywhere with it.
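As a rough illustration of what "a suitable format" can mean for mixed variables like these, here is a minimal sketch in pandas. The column names and values are made up for the example and do not come from the course data set; the assumption is simply that every value arrives as text and needs a proper type.

```python
import pandas as pd

# Hypothetical raw records: every value arrives as text,
# as it often does when data is exported from another system.
raw = pd.DataFrame({
    "age": ["34", "41", "unknown"],
    "date": ["2022-01-15", "2022-07-03", "2022-10-21"],
    "season": ["winter", "summer", "autumn"],
    "passengers": ["2", "1", "4"],
})

tidy = raw.copy()
# Text that is not a number (here "unknown") becomes a missing value (NaN).
tidy["age"] = pd.to_numeric(tidy["age"], errors="coerce")
# Parse the date strings into real datetime values.
tidy["date"] = pd.to_datetime(tidy["date"], format="%Y-%m-%d")
# A small fixed set of labels is a good fit for the category type.
tidy["season"] = tidy["season"].astype("category")
# Counts should be integers, not text.
tidy["passengers"] = tidy["passengers"].astype(int)

print(tidy.dtypes)
```

Once each column has an appropriate type, operations such as sorting by date or averaging ages behave as expected.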
A lot of data scientists will tell you that the initial steps of obtaining and cleaning data constitute 80% of their job. Today’s analysts need to pull information from many places, but working with multiple sources and preparing data for analysis can be time-consuming and difficult with standard tools like Excel. Data wrangling using Python is one of the most efficient ways to make sure that your data is easy to use. Pandas is one of the most popular Python libraries for data wrangling. In this course, we’ll use pandas to learn data wrangling techniques and to deal with some of the most common data formats and their transformations.
We’ll be playing with pandas data frames, which are structured as tables, where you can easily manipulate the rows and the columns using Python code. After data loading, data wrangling is an important part of any data analysis. Without solid data wrangling skills, the rest of the data science process simply can’t progress in any meaningful way. You’ll want to prepare your data by dropping null values and filtering and selecting the right data to ensure it’s in tip-top shape before you apply any algorithms to it. By doing this, you can ensure that any machine learning or other treatments you apply to your cleaned-up data are fully effective.
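The three preparation steps mentioned here (dropping null values, filtering rows, and selecting columns) might look like this in pandas. This is only a sketch: the `feedback` table, its column names, and the age threshold are invented for illustration.

```python
import pandas as pd

# Hypothetical customer feedback table with some missing values.
feedback = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34.0, None, 52.0, 17.0],
    "rating": [5.0, 3.0, None, 4.0],
    "comment": ["great", "ok", "late", "fine"],
})

# 1. Drop rows that are missing a value in the columns we care about.
complete = feedback.dropna(subset=["age", "rating"])

# 2. Filter rows with a boolean condition (here: adults only).
adults = complete[complete["age"] >= 18]

# 3. Select only the columns needed for the next step.
selected = adults[["customer_id", "rating"]]

print(selected)
```

Each step returns a new data frame, so the original `feedback` table stays unchanged and the steps can be checked one at a time.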
Data wrangling is the art of providing the right information to business analysts to make the right decisions on time. It enables organizations to make strategic decisions more efficiently with minimal human intervention.

Once the data has been loaded and extracted, it then needs to be cleaned, transformed, and rearranged. This process is known as data wrangling. Let’s watch a video to learn more about data wrangling.


Do you remember learning about the CRISP-DM process earlier in Course 1?

Graphic shows the "Data analytics CRISP process". We see the six phases: "Business Understanding", "Data Understanding", "Data Preparation", "Modeling", "Evaluation", and "Deployment". The diagram shows that the sequence is not strict and can move back and forth, with arrows pointing in both directions between the phases; this represents the cyclic nature of data mining.

In the previous course, we unpacked data ingestion as part of the data understanding step of the CRISP-DM process. This week and next, we will work through similar practical, hands-on tasks for the next step in the process: data preparation, which includes data wrangling and transformation.

Sometimes, the way data is stored in the data sources (files, databases) is not in the format you need for a data processing application, and therefore substantial time is spent on data preparation.
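One common case of data being stored in a different shape from the one you need is a "wide" table with one column per category, when the analysis expects one row per observation. The sketch below uses pandas `melt` to rearrange such a table; the route and season figures are hypothetical.

```python
import pandas as pd

# Hypothetical source table: one column per season (wide format).
wide = pd.DataFrame({
    "route": ["A", "B"],
    "winter": [120, 80],
    "summer": [200, 150],
})

# Rearrange to one row per (route, season) pair (long format).
long = wide.melt(id_vars="route", var_name="season", value_name="passengers")

print(long)
```

The long format holds exactly the same numbers, but grouping, filtering, and plotting by season become simple column operations.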

Pandas, along with the various libraries and modules of Python, provides a flexible, high-level, and high-performing set of core manipulations and algorithms that enable you to wrangle data into the required form.

This week, we will spend a lot of time on building the foundations of data wrangling activities that can be performed in Python by way of examples and programming.

This article is from the free online

Data Wrangling and Ingestion using Python

Created by
FutureLearn - Learning For Life
