Indexing, selecting, and filtering in Pandas

Article discussing indexing, selecting, and filtering in Pandas.

Data analysts often need to look at just one subset of a data set to draw conclusions or insights. Pandas supports this through indexing, selecting, and filtering, which give access to exactly the section of the data that is relevant.

Now, let’s see how to index, select, and filter data in Pandas Series and DataFrames.

Indexing, selecting, and filtering data in Pandas Series

A Pandas Series can be indexed by its labels as well as by integer position. The following example shows both:

Code:

import numpy as np
from pandas import Series
ob1 = Series(np.random.randn(10), index=['a','b','c','d','e','f','g','h','i','j'])
ob1

Output:

a    1.408564
b    0.073118
c   -0.261970
d    1.749842
e   -0.156697
f   -1.444552
g    0.463587
h   -0.236253
i   -1.641489
j    0.194287
dtype: float64

Code:

ob1.iloc[0], ob1['a']

Output:

(1.4085644861854276, 1.4085644861854276)
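
The .loc and .iloc accessors make the choice between label-based and positional lookup explicit, which avoids ambiguity when plain square brackets receive an integer. The following is a minimal sketch; the Series s and its fixed values are illustrative and are not the article's random data.

```python
import pandas as pd

# Small Series with fixed values (illustrative, not the article's random data).
s = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

by_label = s.loc["a"]     # label-based lookup
by_position = s.iloc[0]   # position-based lookup

# Both expressions refer to the same element.
print(by_label, by_position)
```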

Slicing returns a subset of the data for analysis. When slicing with labels, remember that both the start label and the end label are included in the subset, whereas slicing by integer position excludes the end position.

The following examples show this behaviour:

Code:

ob1 = Series(np.random.randn(10), index=['a','b','c','d','e','f','g','h','i','j'])
ob1

Output:

a    0.972519
b   -0.301196
c   -0.263640
d   -0.039523
e   -1.145125
f    0.661464
g    0.803230
h   -0.346606
i   -0.623374
j    1.193851
dtype: float64

Code:

ob1.iloc[5], ob1['f']

Output:

(0.661464345493067, 0.661464345493067)

Code:

ob1[0:5]

Output:

a    0.972519
b   -0.301196
c   -0.263640
d   -0.039523
e   -1.145125
dtype: float64

Code:

ob1['a':'f']

Output:

a    0.972519
b   -0.301196
c   -0.263640
d   -0.039523
e   -1.145125
f    0.661464
dtype: float64
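
The endpoint behaviour of the two kinds of slice can be checked on reproducible data, and the same Series can also be filtered with a boolean condition. The sketch below assumes a small Series s with made-up values; it is one possible illustration rather than part of the original course material.

```python
import pandas as pd

# Fixed values so the result is reproducible (illustrative data).
s = pd.Series([1.5, -0.3, 0.7, -1.1, 2.0, 0.4], index=list("abcdef"))

positional = s.iloc[0:3]    # positions 0, 1, 2; the stop position 3 is excluded
labelled = s.loc["a":"d"]   # labels a, b, c, d; the stop label "d" is included

# Boolean filtering: keep only the entries where the condition is True.
positive = s[s > 0]

print(list(positional.index))  # ['a', 'b', 'c']
print(list(labelled.index))    # ['a', 'b', 'c', 'd']
print(list(positive.index))    # ['a', 'c', 'e', 'f']
```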

Indexing, selecting, and filtering data in Pandas DataFrames

Indexing a DataFrame retrieves a subset of its data along either axis (rows or columns) by passing a single label or a sequence of labels.
The following examples select columns from a DataFrame named df_states:

Code:

df_states

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of checking a data frame.

Code:

df_states['state']

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of selecting a single column. The example used is state names.

Code:

df_states[['state', 'TZ']]

Output:

Screenshot from the Jupyter Notebook. Screenshot shows example of selecting multiple columns. The examples used are state names and abbreviations of state names.
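
Because the contents of df_states appear only in screenshots, the sketch below builds a small hypothetical stand-in to show the same selections in runnable form, together with row selection and row filtering. The name df, the abbreviation column, and all values are assumptions made for illustration; only the column names state and TZ come from the examples above.

```python
import pandas as pd

# Hypothetical stand-in for df_states; the real data set is not shown here.
df = pd.DataFrame(
    {
        "state": ["Alabama", "Alaska", "Arizona"],
        "abbreviation": ["AL", "AK", "AZ"],   # assumed column
        "TZ": ["CST", "AKST", "MST"],         # assumed placeholder values
    }
)

one_column = df["state"]            # single label returns a Series
two_columns = df[["state", "TZ"]]   # list of labels returns a DataFrame

first_row = df.iloc[0]                  # one row, selected by position
subset = df.loc[0:1, ["state", "TZ"]]   # row labels 0 to 1 (inclusive), two columns

# Boolean filtering keeps the rows where the condition is True.
mountain = df[df["TZ"] == "MST"]

print(two_columns)
print(mountain)
```

Passing a single label returns a Series, while passing a list of labels returns a DataFrame, even when the list holds only one label.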

Next, let’s learn how to align, map, and sort data in Pandas.

This article is from the free online course Introduction to Data Analytics with Python, created by FutureLearn.