What are the essential operations in Pandas?

Pandas has certain essential operations that data analysts need to use to interact with the data stored in Series and DataFrame.

Data analysts spend a significant amount of time cleaning and preparing data sets to work on. They must possess the necessary tools and ability to work with messy data sets, missing values, inconsistencies, and ambiguous data.

The operations covered here, reindexing and deleting entries, help data analysts get data into a workable form before the analysis begins.

Reindexing

Reindexing is a fundamental operation on Pandas data structures. It creates a new object in which the data is rearranged to conform to a new index.

If the new index contains labels that are not present in the original data, missing values (NaN) are inserted at those labels.

Code:

import numpy as np
from pandas import Series
a = Series(np.random.randn(10), index=['a','b','c','d','e','f','g','h','i','j'])
a

Output:

a 0.591050

b -0.952670

c -0.948599

d 0.091596

e -1.096649

f 0.199346

g 0.856941

h -0.086180

i -2.623903

j 0.271230

dtype: float64

Code:

new_index = ['a','A1','b','B1','c','C1','d','e','f','g','h','i','j']
a_new = a.reindex(new_index)
a_new

Output:

a 0.591050

A1 NaN

b -0.952670

B1 NaN

c -0.948599

C1 NaN

d 0.091596

e -1.096649

f 0.199346

g 0.856941

h -0.086180

i -2.623903

j 0.271230

dtype: float64
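The example above only adds labels. As a minimal sketch of the "rearranging" side of reindexing, the snippet below uses a small hypothetical Series (the name s and its values are illustrative, not from the original example) to show that reindex also reorders existing labels and leaves out any label missing from the new index.

```python
import pandas as pd

# Hypothetical three-element Series used only for illustration.
s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Reindexing with the same labels in a different order rearranges the data.
reordered = s.reindex(["c", "a", "b"])
print(reordered.tolist())  # [3, 1, 2]

# Labels left out of the new index do not appear in the result.
subset = s.reindex(["b", "a"])
print(subset.tolist())  # [2, 1]

# The original Series is unchanged.
print(s.index.tolist())  # ['a', 'b', 'c']
```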

Handling missing values during reindexing

Imagine a situation where you are processing employee records. However, many of the employees have supplied incomplete information. You need a way to handle these cases and highlight the gaps to follow up with them. Perhaps you could insert ‘Unknown’ into all the empty fields to make the missing values easy to identify.

There are various ways the missing values can be handled during reindexing. We can:

• specify a particular value to be filled, by passing the parameter fill_value=<value to be filled> to the reindex method

For example:

Code:

a_fillvalue = a.reindex(new_index, fill_value=0)
a_fillvalue


Output:

a 0.591050

A1 0.000000

b -0.952670

B1 0.000000

c -0.948599

C1 0.000000

d 0.091596

e -1.096649

f 0.199346

g 0.856941

h -0.086180

i -2.623903

j 0.271230

dtype: float64
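Returning to the employee-records scenario, the same parameter can mark gaps with a readable placeholder. This is a minimal sketch under stated assumptions: the employee IDs, the departments Series, and the all_employees list are hypothetical names invented for illustration.

```python
import pandas as pd

# Hypothetical records: only some employees supplied a department.
departments = pd.Series(
    ["Finance", "Engineering"],
    index=["emp_001", "emp_003"],
)

# Hypothetical full staff list, including employees with no record yet.
all_employees = ["emp_001", "emp_002", "emp_003", "emp_004"]

# Labels absent from the original index are filled with 'Unknown'.
complete = departments.reindex(all_employees, fill_value="Unknown")
print(complete)

# Collect the employees to follow up with.
to_follow_up = complete[complete == "Unknown"].index.tolist()
print(to_follow_up)  # ['emp_002', 'emp_004']
```

Note that fill_value only applies to labels introduced by the reindex; values that were already missing in the original data are left as they are.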

Or, we can choose one of the predefined fill strategies by passing the parameter method=<predefined method value>. The supported strategies are forward fill ('ffill'), backward fill ('bfill'), and nearest-value fill ('nearest'), which are handy in cases such as time-series data analysis. The original index must be monotonically increasing or decreasing for this parameter to work.

For example:

Code:

a = Series(np.random.randn(10), index=[0,2,4,6,8,10,12,14,16,18])
a


Output:

0 1.036439

2 -0.841819

4 0.629621

6 -1.905720

8 1.673387

10 0.792506

12 0.267104

14 0.759571

16 -0.847925

18 -0.598402

dtype: float64

Code:

## Reindex so that indexes 1,3,5... are introduced in the series
a_new = a.reindex(range(20))
a_new

Output:

0 1.036439

1 NaN

2 -0.841819

3 NaN

4 0.629621

5 NaN

6 -1.905720

7 NaN

8 1.673387

9 NaN

10 0.792506

11 NaN

12 0.267104

13 NaN

14 0.759571

15 NaN

16 -0.847925

17 NaN

18 -0.598402

19 NaN

dtype: float64

Code:

## Perform the same reindex, but forward-fill the newly introduced labels

a_ffill = a.reindex(range(20), method='ffill')
a_ffill



Output:

0 1.036439

1 1.036439

2 -0.841819

3 -0.841819

4 0.629621

5 0.629621

6 -1.905720

7 -1.905720

8 1.673387

9 1.673387

10 0.792506

11 0.792506

12 0.267104

13 0.267104

14 0.759571

15 0.759571

16 -0.847925

17 -0.847925

18 -0.598402

19 -0.598402

dtype: float64

Look at indexes 1, 3, and 5: each value has been copied forward from the preceding index.
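Forward fill is one of several fill strategies. The sketch below compares 'ffill', 'bfill', and 'nearest' side by side on a small hypothetical Series (the names s, new_labels, and comparison are illustrative), so the difference between the strategies is visible in one table.

```python
import pandas as pd

# Hypothetical Series with a sorted (monotonic) index, as method= requires.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 10, 20])
new_labels = [0, 3, 8, 20, 25]

comparison = pd.DataFrame(
    {
        # Copy the last known value forward.
        "ffill": s.reindex(new_labels, method="ffill"),
        # Copy the next known value backward; nothing follows 25, so it stays NaN.
        "bfill": s.reindex(new_labels, method="bfill"),
        # Take the value at the closest original label.
        "nearest": s.reindex(new_labels, method="nearest"),
    }
)
print(comparison)
```

With backward fill, the label 25 has no later value to borrow from, so it remains NaN, whereas forward fill and nearest both reuse the value at label 20.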

For the complete list of parameters of the reindex method, refer to the Pandas documentation listed in the References section at the end of this article.

Deleting entries

We often need to delete data from a Pandas Series or DataFrame. You can do this using the drop() method, which is available on both Series and DataFrame. This method accepts the index label, or a list of index labels, to be dropped.

This method returns a new object containing only the remaining values. Note that the drop is not performed in place by default (i.e. the original Pandas Series or DataFrame is preserved and still available after the drop operation). In practical terms, the method creates a selective copy of the data.
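The copy behaviour described above can be checked directly. This is a minimal sketch using a hypothetical Series named s; it shows that the original keeps all its labels after drop(), and that reassigning the result is one simple way to keep only the reduced data.

```python
import pandas as pd

# Hypothetical Series used only to illustrate the copy semantics of drop().
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

dropped = s.drop("b")
print(dropped.index.tolist())  # ['a', 'c']
print(s.index.tolist())        # ['a', 'b', 'c'] (original untouched)

# Reassigning the name keeps only the reduced data from here on.
s = s.drop("b")
print(s.index.tolist())        # ['a', 'c']
```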

Deleting entries from Pandas Series

Let’s look at how to delete entries from a Pandas Series.

• Drop single index.

Code:

b = Series(np.arange(10), index=['a','b','c','d','e','f','g','h','i','j'])
b


Output:

a 0

b 1

c 2

d 3

e 4

f 5

g 6

h 7

i 8

j 9

dtype: int32

Code:

#Dropping index b

new_series = b.drop('b')
new_series


Output:

a 0

c 2

d 3

e 4

f 5

g 6

h 7

i 8

j 9

dtype: int32

• Drop multiple indexes.

Code:

# Dropping multiple index labels,
# e.g., a, g, and j
new_series_1 = b.drop(['a','g','j'])
new_series_1

Output:

b 1

c 2

d 3

e 4

f 5

h 7

i 8

dtype: int32
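One practical detail when dropping labels: asking drop() to remove a label that is not in the index raises a KeyError by default. The sketch below, using a hypothetical Series s and a label 'z' that does not exist, shows that behaviour and the errors='ignore' option that skips missing labels instead.

```python
import pandas as pd

# Hypothetical Series; the label 'z' is deliberately absent.
s = pd.Series(range(3), index=["a", "b", "c"])

raised = False
try:
    s.drop("z")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors='ignore' drops the labels that exist and skips the rest.
safe = s.drop(["a", "z"], errors="ignore")
print(safe.index.tolist())  # ['b', 'c']
```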

Deleting entries from Pandas DataFrame

For a DataFrame, drop() can remove labels from either axis: row labels (by using the index parameter, or the default axis=0) and column names (by using the columns parameter, or axis=1).
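The definition of df_states is not shown in this article, so the sketch below builds a small hypothetical stand-in to make the two axes concrete. The column names 'state' and 'area' and the row label 'NT' come from the snippets in this section; the 'capital' column, the other row labels, and all values are illustrative placeholders, not authoritative figures.

```python
import pandas as pd

# Hypothetical stand-in for df_states; values are illustrative placeholders.
df_states = pd.DataFrame(
    {
        "state": ["New South Wales", "Victoria", "Northern Territory"],
        "area": [800, 230, 1350],  # rough placeholder numbers
        "capital": ["Sydney", "Melbourne", "Darwin"],
    },
    index=["NSW", "VIC", "NT"],
)

# Remove a row by its label using the index parameter.
no_nt = df_states.drop(index="NT")
print(no_nt.index.tolist())  # ['NSW', 'VIC']

# Remove columns using the columns parameter...
no_cols = df_states.drop(columns=["state", "area"])
print(no_cols.columns.tolist())  # ['capital']

# ...which is equivalent to passing the labels with axis=1.
same = df_states.drop(["state", "area"], axis=1)
print(no_cols.equals(same))  # True

# The original DataFrame still has every row and column.
print(df_states.shape)  # (3, 3)
```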

The following code snippets demonstrate this behaviour:

• Removing a row from DataFrame.

Code:

df_states

Output:

Code:

df_states_noNT = df_states.drop('NT')
df_states_noNT


Output:

• Removing multiple columns from DataFrame by passing a sequence of column names and axis=1.

Code:

df_states

Output:

Code:

df1 = df_states.drop(['state','area'], axis=1)
df1


Output:

The original df_states is left unchanged by these drop operations:

Code:

df_states


Output:

References

1. Pandas documentation for Series.reindex [Internet]. Pandas; [date unknown]. Available from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html
2. Pandas documentation for DataFrame.reindex [Internet]. Pandas; [date unknown]. Available from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html