Indexing, selecting, and filtering in Pandas

Article discussing indexing, selecting, and filtering in Pandas.

Data analysts often need to look at just one subset of a data set to draw conclusions or insights. Pandas supports this through indexing, selecting, and filtering, which give access to only the portion of the data that is relevant and required.

Now, let’s see how to index, select, and filter data in Pandas Series and DataFrames.

Indexing, selecting, and filtering data in Pandas Series

A Pandas Series can be indexed by its labels as well as by implicit integer position. The following examples show this behaviour:

Code:

import numpy as np
from pandas import Series
ob1 = Series(np.random.randn(10), index=['a','b','c','d','e','f','g','h','i','j'])
ob1

Output:

a    1.408564
b    0.073118
c   -0.261970
d    1.749842
e   -0.156697
f   -1.444552
g    0.463587
h   -0.236253
i   -1.641489
j    0.194287
dtype: float64

Code:

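# .iloc selects by position; recent pandas versions deprecate ob1[0] on a label-indexed Series
ob1.iloc[0], ob1['a']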

Output:

(1.4085644861854276, 1.4085644861854276)
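When it matters whether a lookup is by label or by position, the explicit .loc and .iloc accessors remove the ambiguity. The following is a minimal sketch, assuming a small Series named s whose labels and values are made up for illustration:

```python
import pandas as pd

# Hypothetical Series with string labels (values chosen for illustration)
s = pd.Series([10.0, 20.0, 30.0], index=['a', 'b', 'c'])

by_label = s.loc['a']      # label-based lookup
by_position = s.iloc[0]    # position-based lookup

# Both expressions refer to the same element
print(by_label, by_position)
```

Using .loc and .iloc keeps the intent visible in the code, which helps when the labels themselves are integers.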

Slicing gives you the subset of data required for analysis. When slicing with labels, it is essential to remember that both the start and end labels are included in the subset, whereas slicing by position excludes the end position.

The following examples show this behaviour:

Code:

ob1 = Series(np.random.randn(10), index=['a','b','c','d','e','f','g','h','i','j'])
ob1

Output:

a    0.972519
b   -0.301196
c   -0.263640
d   -0.039523
e   -1.145125
f    0.661464
g    0.803230
h   -0.346606
i   -0.623374
j    1.193851
dtype: float64

Code:

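# .iloc selects by position; recent pandas versions deprecate ob1[5] on a label-indexed Series
ob1.iloc[5], ob1['f']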

Output:

(0.661464345493067, 0.661464345493067)

Code:

ob1[0:5]

Output:

a    0.972519
b   -0.301196
c   -0.263640
d   -0.039523
e   -1.145125
dtype: float64

Code:

ob1['a':'f']

Output:

a    0.972519
b   -0.301196
c   -0.263640
d   -0.039523
e   -1.145125
f    0.661464
dtype: float64
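Filtering a Series is commonly done with a boolean mask. The following is a minimal sketch on fixed, made-up values (the Series s is hypothetical and is not the ob1 above), which also repeats the positional versus label slicing contrast so that the result is reproducible:

```python
import pandas as pd

# Hypothetical Series with fixed values so the results are reproducible
s = pd.Series([0.5, -1.2, 0.3, -0.7, 1.1], index=['a', 'b', 'c', 'd', 'e'])

positional = s.iloc[0:3]    # end position excluded: a, b, c
labelled = s.loc['a':'d']   # end label included: a, b, c, d

mask = s > 0                # boolean Series, True where the value is positive
positives = s[mask]         # keeps only the rows where mask is True

print(list(positional.index))
print(list(labelled.index))
print(list(positives.index))
```

The mask has the same index as s, so pandas aligns the two and keeps only the entries marked True.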

Indexing, selecting, and filtering data in Pandas DataFrames

Indexing a DataFrame retrieves a particular subset of its data along the rows or columns, by passing either a single label or a sequence of labels.
The following examples demonstrate this by selecting columns from a DataFrame named df_states:

Code:

df_states

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of checking a data frame.

Code:

df_states['state']

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of selecting a single column. The example used is state names.

Code:

df_states[['state', 'TZ']]

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of selecting multiple columns. The examples used are state names and abbreviations of state names.
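Because the outputs above are only available as screenshots, the following is a minimal runnable sketch of the same selections. The rows, and the assumption that the TZ column holds time zone abbreviations, are hypothetical stand-ins and not the course data set:

```python
import pandas as pd

# Hypothetical stand-in for df_states; rows and values are illustrative only
df_states = pd.DataFrame({
    'state': ['California', 'New York', 'Texas'],
    'TZ': ['PST', 'EST', 'CST'],   # assumed to be time zone abbreviations
})

one_column = df_states['state']             # single label returns a Series
two_columns = df_states[['state', 'TZ']]    # list of labels returns a DataFrame
first_two_rows = df_states.iloc[0:2]        # rows selected by position
pst_rows = df_states[df_states['TZ'] == 'PST']  # rows filtered by a condition

print(type(one_column).__name__, type(two_columns).__name__)
print(pst_rows)
```

Passing a single label gives back a Series, while passing a list of labels, even a list with one entry, gives back a DataFrame.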

Next, let’s learn how to align, map, and sort data in Pandas.

This article is from the free online

Introduction to Data Analytics with Python

Created by
FutureLearn - Learning For Life