
How do you reshape a data set?

Data analysts must possess the ability and tools to look at data from different layouts and orientations in order to draw solid insights.

To begin with, let us define the ‘shape’ of a data set. The shape of a data set is the way its values are arranged into rows and columns, and reshaping is the rearrangement of that layout without altering the content of the data set. Reshaping is a frequent and often cumbersome task in data manipulation and analysis.

Reshaping data sets in Python

The pandas library for Python offers several methods for reshaping data sets. Let’s explore two of them.

    • stack(): reshapes the DataFrame into stacked form by pivoting the innermost column level into the innermost row level.
    • unstack(): reverses the operation by pivoting the innermost row level back into the innermost column level.

Here is the syntax for both functions:

DataFrame.stack(level=-1, dropna=True)
DataFrame.unstack(level=-1, fill_value=None)
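The fill_value parameter of unstack() only matters when some index combinations are missing. The following is a minimal sketch, assuming a small hypothetical Series in which NSW has no 'two' entry; the data and the choice of -1 as a filler are illustrative and not part of the course material.

```python
import pandas as pd

# Hypothetical stacked data: the ('NSW', 'two') combination is absent.
idx = pd.MultiIndex.from_tuples(
    [("Victoria", "one"), ("Victoria", "two"), ("NSW", "one")],
    names=["state", "number"],
)
s = pd.Series([0, 1, 3], index=idx)

# Default behaviour: the missing cell becomes NaN (and the values become floats).
wide_nan = s.unstack()

# With fill_value, the missing cell is filled with the given value instead.
wide_filled = s.unstack(fill_value=-1)

print(wide_nan)
print(wide_filled)
```

Filling with a sentinel keeps the column as integers, but it also hides the fact that the value was missing, so choose the filler with care.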

stack() function

Let’s try these operations with some examples. Use these code snippets:

First, create a dummy DataFrame.

Code:

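import numpy as np
import pandas as pd

# 2x3 DataFrame with a named row index ('state') and a named column index ('number')
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Victoria', 'NSW'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
data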

Output:

number    one  two  three
state
Victoria    0    1      2
NSW         3    4      5

Next, use the stack() function to pivot the columns into rows.

Code:

data_stack = data.stack()
data_stack

Output:

state     number
Victoria  one       0
          two       1
          three     2
NSW       one       3
          two       4
          three     5
dtype: int32

You can see that:

    • the operation converted the column labels into an inner level of row labels
    • the operation converted the DataFrame to a Series.

You can confirm these changes with this code:

Code:

type(data_stack)

Output:

pandas.core.series.Series

Code:

data_stack.index

Output:

MultiIndex(levels=[['Victoria', 'NSW'], ['one', 'two', 'three']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
names=['state', 'number'])
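Because the stacked result carries a two-level index, values can be selected by either level. The sketch below rebuilds the same DataFrame so that it runs on its own; the variable names victoria, nsw_two and all_two are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Victoria', 'NSW'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
data_stack = data.stack()

# Selecting by an outer-level label returns a Series indexed by 'number'.
victoria = data_stack.loc['Victoria']

# Selecting by a full (state, number) tuple returns a single value.
nsw_two = data_stack.loc[('NSW', 'two')]

# xs() takes a cross-section on the inner level, one value per state.
all_two = data_stack.xs('two', level='number')

print(victoria)
print(nsw_two)
print(all_two)
```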

unstack() function

 

You can reshape the stacked Series back into its original DataFrame layout with the unstack() function.

Try this code:

Code:

data = data_stack.unstack()
data

Output:

number    one  two  three
state
Victoria    0    1      2
NSW         3    4      5
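One way to check that reshaping rearranges the data without altering its content is to compare the round-tripped result with the original. This is a minimal sketch that rebuilds the example DataFrame; the names restored, round_trip_ok and shapes are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Victoria', 'NSW'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# stack() then unstack() should bring back the original layout.
data_stack = data.stack()
restored = data_stack.unstack()

# equals() compares values, labels and dtypes of the two DataFrames.
round_trip_ok = restored.equals(data)

# The same six values move between a 2x3 table and a length-6 Series.
shapes = (data.shape, data_stack.shape, restored.shape)

print(round_trip_ok)
print(shapes)
```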

By default, the innermost level is unstacked. In our example, that was the number level. However, you can unstack a different level by passing a level number or name as a parameter to the unstack() method.

For example, try this code that unstacks data_stack at the level of state, rather than number:

Code:

data_state = data_stack.unstack('state')
data_state

Output:

state   Victoria  NSW
number
one            0    3
two            1    4
three          2    5
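The level can be given by name or by position. As a sketch under the same example data, the following compares unstack('state') with unstack(0), since state is the outermost (position 0) level of the stacked index; the variable by_position is illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Victoria', 'NSW'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
data_stack = data.stack()

# Unstack the outer level by name: states become columns, numbers stay as rows.
data_state = data_stack.unstack('state')

# The same level addressed by position (0 = outermost level).
by_position = data_stack.unstack(0)

# Both calls should produce the same table.
same_result = data_state.equals(by_position)

print(data_state)
print(same_result)
```

Referring to levels by name is usually easier to read and keeps working if the order of the levels changes.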

This article is from the free online

Introduction to Data Analytics with Python

Created by
FutureLearn - Learning For Life
