CRISP-DM and Data Wrangling

You may have heard the term data wrangling before, and you may even have performed it without realizing. Imagine you’re working on a project analyzing customer feedback data. You decide to use Python because you know it has good built-in data analysis functions. However, as you start to access the raw data set, you realize it contains a mixture of variables: age, date, time, season, number of passengers, and more. You’ll need to transform all this data into a suitable format before you can begin. This is known as data wrangling: converting raw data into a format that is convenient to consume. In other words, you’ll need to clean this messy data to get anywhere with it.
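As a minimal sketch of the kind of conversion described above, the snippet below turns a small table of text values into typed columns with pandas. The column names and sample values are hypothetical and are not taken from the course data set.

```python
import pandas as pd

# Hypothetical raw records: every value arrives as text, as is common with CSV exports.
raw = pd.DataFrame({
    "age": ["34", "27", "unknown"],
    "date": ["2023-01-15", "2023-07-04", "2023-10-21"],
    "season": ["winter", "summer", "autumn"],
    "passengers": ["2", "1", "4"],
})

tidy = raw.copy()
# Convert numeric text to numbers; values that cannot be parsed become NaN.
tidy["age"] = pd.to_numeric(tidy["age"], errors="coerce")
tidy["passengers"] = pd.to_numeric(tidy["passengers"], errors="coerce")
# Parse ISO-formatted date strings into datetime values.
tidy["date"] = pd.to_datetime(tidy["date"], format="%Y-%m-%d")
# Store the repeated text labels as a categorical column.
tidy["season"] = tidy["season"].astype("category")

print(tidy.dtypes)
```

Using errors="coerce" is one possible choice here: it keeps the row and marks the unparseable age as missing, so the decision about what to do with it can be made later.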
A lot of data scientists will tell you that the initial steps of obtaining and cleaning data constitute 80% of their job. Today’s analysts need to pull information from many places, but working with multiple sources and preparing data for analysis can be time-consuming and difficult using standard tools like Excel. Data wrangling using Python is one of the most efficient ways to make sure that your data is easy to use. Pandas is one of the most popular Python libraries for data wrangling. In this course, we’ll use pandas to learn data wrangling techniques for dealing with some of the most common data formats and their transformations.
We’ll be playing with pandas data frames, which are structured as tables, where you can easily manipulate the rows and the columns using Python code. After data loading, data wrangling is an important part of any data analysis. Without solid data wrangling skills, the rest of the data science process simply can’t progress in any meaningful way. You’ll want to prepare your data by dropping null values, filtering, and selecting the right data to ensure it’s in tip-top shape before you apply any algorithms to it. By doing this, you help ensure that any machine learning or other treatments you apply to your cleaned-up data are as effective as possible.
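The three preparation steps mentioned above (dropping null values, filtering rows, and selecting columns) might look like this in pandas. This is only an illustrative sketch; the data frame, its column names, and the rating threshold are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical customer feedback table with some missing values.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34.0, None, 45.0, 19.0],
    "rating": [5, 3, None, 2],
    "comment": ["great", "ok", "late", "poor"],
})

# Drop every row that has a missing value in any column.
complete = df.dropna()
# Filter rows with a boolean mask: keep ratings of 3 or higher.
satisfied = complete[complete["rating"] >= 3]
# Select only the columns needed for the next step.
result = satisfied[["customer_id", "rating"]]

print(result)
```

Dropping rows is the simplest way to handle missing values, but it discards data; filling them in is an alternative that may suit some data sets better.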
Data wrangling is the art of providing the right information to business analysts to make the right decisions on time. It enables organizations to make strategic decisions more efficiently with minimal human intervention.

Once the data has been loaded and extracted, it then needs to be cleaned, transformed, and rearranged. This process is known as data wrangling. Let’s watch a video to learn more about data wrangling.

CRISP-DM

Do you remember learning about the CRISP-DM process earlier in Course 1?

Graphic shows the "Data analytics CRISP process". We see the six phases: "Business Understanding", "Data Understanding", "Data Preparation", "Modeling", "Evaluation", and "Deployment". The diagram shows that the sequence is not strict and can move back and forth, with arrows pointing in both directions between the phases. This represents the cyclic nature of data mining.

In the previous course, we unpacked data ingestion as part of the data understanding step of the CRISP-DM process. This week and next, we will have similar practical, hands-on tasks for the next step in the process, data preparation, which includes data wrangling and transformation.

Sometimes, the way data is stored in the data sources (files, databases) is not in the format you need for a data processing application, and therefore substantial time is spent on data preparation.

Pandas, along with the various other libraries and modules of Python, provides a flexible, high-level, and high-performing set of core manipulations and algorithms that enable you to wrangle data into the required form.
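To illustrate data that is stored in one layout but needed in another, here is a small sketch that rearranges a "wide" table (one column per month) into a "long" table (one row per observation) using the pandas melt method. The route names, months, and passenger counts are invented for the example.

```python
import pandas as pd

# Hypothetical source layout: one row per route, one column per month.
wide = pd.DataFrame({
    "route": ["A", "B"],
    "2023-01": [120, 80],
    "2023-02": [135, 95],
})

# Rearrange into one row per (route, month) pair, which is often
# easier to filter, group, and plot.
long_df = wide.melt(id_vars="route", var_name="month", value_name="passengers")

print(long_df)
```

The long layout has four rows here, one for each combination of route and month, with the same passenger counts as the original table.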

This week, we will spend a lot of time on building the foundations of data wrangling activities that can be performed in Python by way of examples and programming.

This article is from the free online

Data Wrangling and Ingestion using Python

Created by
FutureLearn - Learning For Life
