
Reshaping data sets

Learn to reshape data sets.

In Python, the pandas library provides operations for rearranging tabular data, known as reshaping or pivoting operations.

For example, hierarchical indexing provides a consistent way to rearrange data in a DataFrame.

There are two primary methods for reshaping data with hierarchical indexing:

  • stack(): rotates or pivots data from columns to rows
  • unstack(): pivots data from rows to columns

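Here is the syntax for both functions. Newer pandas versions (2.1 and later) also add a future_stack parameter to stack(), and dropna cannot be used together with future_stack=True.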


DataFrame.stack(level=-1, dropna=True)


DataFrame.unstack(level=-1, fill_value=None)
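The fill_value parameter of unstack() matters when some index combinations are missing. The minimal sketch below uses a small hypothetical Series (not part of the course data) in which the ('NSW', 'three') entry is absent. By default, unstack() fills the gap with NaN. Passing fill_value supplies a replacement value instead.

```python
import pandas as pd

# Hypothetical example data: a Series with a two-level index
# in which the ('NSW', 'three') combination is missing.
s = pd.Series(
    [0, 1, 2, 3, 4],
    index=pd.MultiIndex.from_tuples(
        [("Victoria", "one"), ("Victoria", "two"), ("Victoria", "three"),
         ("NSW", "one"), ("NSW", "two")],
        names=["state", "number"],
    ),
)

# Default behaviour: the missing cell becomes NaN,
# which typically turns the integer values into floats.
unfilled = s.unstack()
print(unfilled)

# With fill_value, the gap is filled with 0 instead of NaN.
filled = s.unstack(fill_value=0)
print(filled)
```

Choosing a fill value is a data decision. A 0 is only appropriate when a missing combination really means "none", not "unknown".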

Let’s try these operations with some examples. Use these code snippets:

First, create a dummy DataFrame.

Code:


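import numpy as np
import pandas as pd

# 2 x 3 DataFrame with a named row index (state) and named columns (number)
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Victoria', 'NSW'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
data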

Output:

number    one  two  three
state
Victoria    0    1      2
NSW         3    4      5

Next, use the stack() function to pivot the columns into rows:

Code:

data_stack = data.stack()
data_stack

Output:

state     number
Victoria  one       0
          two       1
          three     2
NSW       one       3
          two       4
          three     5
dtype: int32

(The integer dtype shown, int32 or int64, depends on your platform and NumPy version.)

You can see that:

  • the operation converted the column labels (one, two, three) into the inner level of the row index

  • the values now have a hierarchical index with two levels (state and number)

  • the operation converted the DataFrame to a Series.

You can confirm these changes with this code:


type(data_stack)

Output:
pandas.core.series.Series


data_stack.index

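Output (as printed by older pandas versions; since pandas 0.24 the labels attribute is called codes, and newer versions display a MultiIndex as a list of tuples):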

MultiIndex(levels=[['Victoria', 'NSW'], ['one', 'two', 'three']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
names=['state', 'number'])
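With the data stacked, the hierarchical index lets you select values at either level. The sketch below rebuilds the same data_stack so that it runs on its own. It uses three standard pandas operations: indexing with an outer label, indexing with a (state, number) tuple, and Series.xs(). The variable names nsw, value, and twos are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(["Victoria", "NSW"], name="state"),
                    columns=pd.Index(["one", "two", "three"], name="number"))
data_stack = data.stack()

# Outer-level label: every value for NSW, indexed by number.
nsw = data_stack["NSW"]

# Full (state, number) tuple: a single value.
value = data_stack.loc[("Victoria", "three")]

# Inner-level selection with xs(): the 'two' value for each state.
twos = data_stack.xs("two", level="number")

print(nsw, value, twos, sep="\n\n")
```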

From a hierarchically indexed Series, you can rearrange the data back into a DataFrame with the unstack() function.

Try this code:


data = data_stack.unstack()
data
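In this example, unstack() exactly reverses stack(), so you can check the round trip with DataFrame.equals(). This sketch stores the starting frame as original rather than overwriting data, purely so that the two frames can be compared.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame(np.arange(6).reshape((2, 3)),
                        index=pd.Index(["Victoria", "NSW"], name="state"),
                        columns=pd.Index(["one", "two", "three"], name="number"))

# Pivot the columns into rows, then pivot them back.
restored = original.stack().unstack()

# equals() checks that shape, element values and element dtypes match.
print(restored.equals(original))

# The axis names survive the round trip as well.
print(restored.index.name, restored.columns.name)
```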

Output:

number    one  two  three
state
Victoria    0    1      2
NSW         3    4      5

By default, the innermost level is unstacked. In this example, that is the number level. However, you can unstack a different level by passing a level number or name as a parameter to the unstack() method.

For example, try this code that unstacks data_stack at the level of state, rather than number:

Code:

data_state = data_stack.unstack('state')
data_state

Output:

state   Victoria  NSW
number
one            0    3
two            1    4
three          2    5
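Levels can be referenced by position as well as by name, so unstack(0) refers to the outer state level here, just like unstack('state'). The sketch also checks that, for this particular data, unstacking by state gives the original DataFrame with rows and columns swapped. This follows from the small example and is not a general guarantee for every frame.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(["Victoria", "NSW"], name="state"),
                    columns=pd.Index(["one", "two", "three"], name="number"))
data_stack = data.stack()

# Level 0 is the outer level (state); the default level=-1 is the innermost (number).
by_position = data_stack.unstack(0)
by_name = data_stack.unstack("state")
print(by_position.equals(by_name))

# For this data, the result is the transpose of the original DataFrame.
print(by_name.equals(data.T))
```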

This article is from the free online course Data Wrangling and Ingestion using Python, created by FutureLearn - Learning For Life.
