
Aligning, mapping, and sorting data in Pandas

As seen earlier, preparing data is the first step in data analysis. Analysts can use the align, map, and sort features in Pandas to manipulate data and make it more readable and organised.

Data alignment

When we perform mathematical operations between Pandas objects with different indexes, Pandas first matches the values by their index labels and then applies the operation to each matched pair. This matching step is known as data alignment.

Code:

import numpy as np
from pandas import DataFrame
df1 = DataFrame(np.arange(9).reshape(3,3), columns=['a','b','c'], index=['SA', 'VIC', 'NSW'])
df1

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of data alignment.

Code:

df2 = DataFrame(np.arange(12).reshape(4,3), columns=['a','b','e'], index=['SA', 'VIC', 'NSW', 'ACT'])
df2

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of data alignment.

Adding DataFrames

When two DataFrames are added and their row or column labels do not all match, the result takes the union of the labels from both objects. Any cell whose label is missing from either DataFrame is filled with NaN (Not a Number).

Code:

df1+df2

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of data alignment returning NaN for rows labeled ACT, NSW, VIC, and SA.
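As the output is only available as a screenshot, the following minimal sketch rebuilds the two frames defined above and checks the alignment result in text form. The variable name total is an illustrative choice, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# The same two frames as defined earlier in the article
df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   columns=['a', 'b', 'c'], index=['SA', 'VIC', 'NSW'])
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=['a', 'b', 'e'], index=['SA', 'VIC', 'NSW', 'ACT'])

total = df1 + df2  # 'total' is a hypothetical name used for illustration

# The result carries the union of the row and column labels
print(sorted(total.index))    # ['ACT', 'NSW', 'SA', 'VIC']
print(sorted(total.columns))  # ['a', 'b', 'c', 'e']

# Labels present in both frames are added value by value
print(total.loc['NSW', 'a'])  # 12.0

# Column 'c' exists only in df1, column 'e' and row 'ACT' only in df2,
# so all of these are NaN in the result
print(total['c'].isna().all())        # True
print(total['e'].isna().all())        # True
print(total.loc['ACT'].isna().all())  # True
```

Only columns a and b for SA, VIC, and NSW hold numbers, because those are the only cells present in both frames.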

Handling missing data

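We can also control how missing values are handled during alignment. The add() method accepts a fill_value parameter, which substitutes the given value for a label that is missing from one of the two objects before the addition is performed. A cell that is missing from both objects is still NaN in the result.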

Code:

df1.add(df2, fill_value=0)

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of data alignment for ACT, NSW, SA, and VIC.
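To make the effect of fill_value concrete, here is a small sketch using the same two frames. The name filled is a hypothetical variable introduced for illustration.

```python
import numpy as np
import pandas as pd

# The same two frames as defined earlier in the article
df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   columns=['a', 'b', 'c'], index=['SA', 'VIC', 'NSW'])
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=['a', 'b', 'e'], index=['SA', 'VIC', 'NSW', 'ACT'])

filled = df1.add(df2, fill_value=0)  # 'filled' is an illustrative name

# Row 'ACT' exists only in df2, so df1 contributes 0 for it
print(filled.loc['ACT', 'a'])  # 9.0
print(filled.loc['ACT', 'e'])  # 11.0

# Column 'c' exists only in df1, so df2 contributes 0 for it
print(filled.loc['SA', 'c'])   # 2.0

# The cell ('ACT', 'c') is missing from BOTH frames, so it stays NaN
print(pd.isna(filled.loc['ACT', 'c']))  # True
```

The remaining NaN is easy to overlook: fill_value only stands in for a value that one side is missing, never for a value that neither side has.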

Mapping

Often, we want to change the values in a particular row or column by applying a function to them.

For example, think of a data set that captures information about an extensive collection of products (represented as columns in the data set). These products go through an update every year. Here, you need a way to update all the version numbers quickly and easily instead of updating each product individually.

This process is known as mapping, and it can be done with the .apply() method. Called on a single column (a Series), it applies the function to each value. Called on a DataFrame, it takes the following parameters:

  • a function (often a lambda), which specifies the transformation to apply
  • an axis parameter, which defaults to 0, so the function is applied to each column in turn (running down the index); axis=1 applies it to each row instead.

The following code snippets demonstrate this behaviour:

Code:

df_states

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of data frame mapping for rows labeled WA, SA, VIC, NSW, ACT, QLD, NT.

Code:

f = lambda x:x.upper()

Code:

df_states['state'] = df_states['state'].apply(f)
df_states

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of applying the mapping to states. The screenshot shows columns for state name, abbreviation, timezone, population, and GDP.
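The article does not show how df_states is built, so the sketch below uses a small stand-in frame to illustrate both uses of .apply(). The column names and all values are assumptions and placeholders, not real statistics.

```python
import pandas as pd

# Hypothetical stand-in for df_states; the numbers are placeholders
df_demo = pd.DataFrame(
    {'state': ['Western Australia', 'South Australia', 'Victoria'],
     'population': [10, 20, 30],
     'gdp': [1.0, 2.0, 3.0]},
    index=['WA', 'SA', 'VIC'])

# Series.apply calls the function once per value in the column
upper = df_demo['state'].apply(lambda x: x.upper())
print(list(upper))  # ['WESTERN AUSTRALIA', 'SOUTH AUSTRALIA', 'VICTORIA']

# For plain string operations the .str accessor gives the same result
upper_vec = df_demo['state'].str.upper()
print(upper.equals(upper_vec))  # True

# DataFrame.apply passes whole columns (axis=0) or whole rows (axis=1)
numeric = df_demo[['population', 'gdp']]
col_totals = numeric.apply(lambda col: col.sum())          # one value per column
row_totals = numeric.apply(lambda row: row.sum(), axis=1)  # one value per row
print(col_totals['population'])  # 60.0
print(row_totals['VIC'])         # 33.0
```

With axis=0 the result is indexed by column name, and with axis=1 it is indexed by row label.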

Sorting

Sometimes, data must be sorted to make it clear and meaningful. For instance, consider an on-demand video streaming service that wants to know which TV series in its catalogue are the most popular, so that they can be renewed for another season. Here, the series titles need to be sorted by how much they are being watched.

The sorting methods in Pandas come in handy in situations like this.

Sorting the indexes/labels

To sort data lexicographically (i.e. in dictionary order) by row or column index, we use the sort_index() method. Note that this method returns a new, sorted object and leaves the original unchanged. The examples below demonstrate sorting the indexes:

  • Original DataFrame.

Code:

df_states

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of sorting original data frames. The screenshot shows columns for state name, abbreviation, timezone, population, and GDP.

  • DataFrame sorted by row index.

Code:

df_states.sort_index()

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of sorting row indexes. The screenshot shows columns for state name, abbreviation, timezone, population, and GDP.

  • DataFrame sorted by columns (lexicographically).

Code:

df_states.sort_index(axis=1)

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of sorting by column labels.
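The three cases above can be sketched on a small stand-in frame. The frame, its column names, and its values are hypothetical placeholders, since the contents of df_states are only visible in the screenshots.

```python
import pandas as pd

# Hypothetical stand-in frame; values are placeholders
df_demo = pd.DataFrame({'population': [3, 1, 2], 'gdp': [30.0, 10.0, 20.0]},
                       index=['WA', 'ACT', 'NSW'])

by_row = df_demo.sort_index()                      # rows in dictionary order
by_row_desc = df_demo.sort_index(ascending=False)  # rows in reverse order
by_col = df_demo.sort_index(axis=1)                # columns in dictionary order

print(list(by_row.index))       # ['ACT', 'NSW', 'WA']
print(list(by_row_desc.index))  # ['WA', 'NSW', 'ACT']
print(list(by_col.columns))     # ['gdp', 'population']

# sort_index() returned new objects; the original order is untouched
print(list(df_demo.index))      # ['WA', 'ACT', 'NSW']
```

To keep a sorted result, assign it to a variable, as done here with by_row.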

Sorting by values

Instead of sorting by index labels, we can also sort the data by the actual values in the columns. For this purpose, the sort_values() method can be used. By default, it sorts in ascending order.

See the code snippet below for an example, where we arrange the rows by the GDP column:

Code:

df_states

Output:

Screenshot from the Jupyter Notebook. The screenshot shows an example of sorting by value in the GDP column sorting smallest to largest by Australian state.

Code:

df_states.sort_values('GDP')

Output:

Screenshot from the Jupyter Notebook. Screenshot shows an example of sorting by value in the GDP column sorting smallest to largest by Australian state.
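A minimal sketch of sort_values() on a stand-in frame follows, including descending order and sorting by more than one column. The frame and its values are hypothetical placeholders rather than the real df_states data.

```python
import pandas as pd

# Hypothetical stand-in frame; values are placeholders
df_demo = pd.DataFrame({'timezone': ['AWST', 'AEST', 'AEST'],
                        'gdp': [30.0, 10.0, 20.0]},
                       index=['WA', 'ACT', 'NSW'])

# Default: smallest to largest
smallest_first = df_demo.sort_values('gdp')
print(list(smallest_first.index))  # ['ACT', 'NSW', 'WA']

# ascending=False: largest to smallest
largest_first = df_demo.sort_values('gdp', ascending=False)
print(list(largest_first.index))   # ['WA', 'NSW', 'ACT']

# Several keys: timezone A to Z first, then gdp from largest to smallest
by_two = df_demo.sort_values(['timezone', 'gdp'], ascending=[True, False])
print(list(by_two.index))          # ['NSW', 'ACT', 'WA']
```

Like sort_index(), sort_values() returns a new object, so df_demo itself keeps its original row order.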

Next, you will complete an exercise to apply what you have learned about manipulating data in Pandas.

This article is from the free online

Introduction to Data Analytics with Python

Created by
FutureLearn - Learning For Life
