Introduction to Data Analytics libraries

Pandas

Pandas is a Python package providing fast, flexible, and expressive data structures designed to work with relational or labelled data. It is a fundamental high-level building block for doing practical, real-world data analysis in Python.

Pandas is well suited for:

Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
Ordered and unordered (not necessarily fixed-frequency) time-series data
Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
Any other form of observational or statistical data set. The data need not be labelled at all to be placed into a pandas data structure.
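
As a rough illustration of the kinds of data listed above, the following sketch builds one small object of each kind. All values and column names are invented for the example and are not taken from the course.

```python
import numpy as np
import pandas as pd

# Tabular data with heterogeneously typed columns (invented values).
table = pd.DataFrame({
    "item": ["pen", "book", "lamp"],   # strings
    "quantity": [12, 3, 1],            # integers
    "in_stock": [True, False, True],   # booleans
})

# Time-series data whose timestamps are not at a fixed frequency.
readings = pd.Series(
    [20.5, 21.0, 19.8],
    index=pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-05"]),
)

# Unlabelled matrix data: pandas assigns default integer row and column labels.
matrix = pd.DataFrame(np.arange(6).reshape(2, 3))

print(table.dtypes)
print(readings.index)
print(matrix)
```

Each column of `table` keeps its own type, while `matrix` shows that labels are optional: the default labels are simply 0, 1, 2 and so on.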

This content is taken from the Edge Hill University online course Introduction to Python for Big Data Analytics.

Key features:

Easy handling of missing data
Size mutability: columns can be inserted into and deleted from DataFrames and higher-dimensional objects
Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically
Powerful, flexible group by functionality to perform split-apply-combine operations on data sets
Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
Intuitive merging and joining data sets
Flexible reshaping and pivoting of data sets
Hierarchical labelling of axes
Robust IO tools for loading data from flat files, Excel files, databases, and HDF5
Time-series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
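
A few of the features listed above can be sketched in a handful of lines. This is only a minimal illustration: the `sales` and `managers` tables and all of their values are hypothetical.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, np.nan, 7, 3],
})

# Missing data: NaN marks a missing value and can be filled explicitly.
filled = sales["units"].fillna(0)

# Size mutability: insert a column, then delete it again.
sales["double_units"] = sales["units"] * 2
del sales["double_units"]

# Split-apply-combine: total units per region (NaN is skipped by default).
totals = sales.groupby("region")["units"].sum()

# Automatic alignment: values are matched by label, not by position.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
aligned_sum = a + b  # only "y" is in both; "x" and "z" become NaN

# Merging: join two tables on a shared key column.
managers = pd.DataFrame({"region": ["north", "south"],
                         "manager": ["Ana", "Ben"]})
merged = sales.merge(managers, on="region")

print(totals)
print(aligned_sum)
print(merged)
```

Note that the alignment step never raises an error for the mismatched labels; pandas takes the union of the two indexes and marks the gaps as missing.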

Pandas Data Structures: Series

A series is a single vector of data (like a NumPy array) with an index that labels each element in the vector. If an index is not specified, a default sequence of integers is assigned as the index. A NumPy array comprises the values of the series, while the index is a pandas Index object.

For example:

import pandas as pd
counts = pd.Series([632, 1638, 569, 115])
counts

0     632
1    1638
2     569
3     115
dtype: int64

counts.values
array([ 632, 1638,  569,  115])
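
The default integer index can also be replaced with explicit labels. The sketch below reuses the same values; the labels "a" to "d" are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same values as above, with hypothetical labels instead of the default integers.
counts = pd.Series([632, 1638, 569, 115], index=["a", "b", "c", "d"])

print(counts.index)          # the pandas Index object holding the labels
print(counts.values)         # the underlying NumPy array
print(counts["b"])           # label-based lookup
print(counts[counts > 600])  # boolean selection keeps the matching labels
```

Selecting with a condition returns a new Series in which each surviving value still carries its original label.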

Pandas Data Structures: DataFrame

A DataFrame is a tabular data structure, encapsulating multiple series like columns in a spreadsheet. Data are stored internally as a 2-dimensional object, but the DataFrame allows us to represent and manipulate higher-dimensional data.
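
One way a two-dimensional DataFrame can carry higher-dimensional data is through a hierarchical (multi-level) row index. The following is a minimal sketch under that assumption; the year and quarter labels and the figures are made up.

```python
import pandas as pd

# Two index levels (year, quarter) plus the columns give three dimensions.
index = pd.MultiIndex.from_product(
    [["2023", "2024"], ["Q1", "Q2"]], names=["year", "quarter"]
)
df = pd.DataFrame({"sales": [5, 7, 6, 9], "costs": [3, 4, 4, 5]}, index=index)

# Select every quarter of one year (first level of the row index).
print(df.loc["2024"])

# Select a single value by (year, quarter) and column.
print(df.loc[("2024", "Q2"), "sales"])
```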

See, for example, the following pictures depicting a DataFrame extracted from a CSV file.

[Image: Pandas DataFrame, showing how to import and use the pandas Python library.]

[Image: Pandas DataFrame, showing some examples of the pandas Python library.]
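
Loading a DataFrame from CSV data, as the pictures depict, might look like the following sketch. To keep it self-contained, an in-memory string stands in for a file on disk; its contents are invented for the example.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the contents are invented for this sketch.
csv_text = """Name,Age
Sarah,41
Lucas,51
"""

# read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)
```

With a real file, the call would take the file's path in place of the `io.StringIO` object.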

Try the following code:

import pandas as pd

# Column names map to lists of values, one entry per row
input_users = {'Name': ['Sarah', 'Lucas', 'Debbie', 'Joanna'],
               'Age': [41, 51, 87, 69]}

# Build a DataFrame from the dictionary and display it
df = pd.DataFrame(input_users)
print(df)

What does it do? Explore the different components and try different examples.
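
As one possible starting point for that exploration, the sketch below inspects the same DataFrame's components. The particular attributes and the age threshold of 60 are just examples, not part of the course exercise.

```python
import pandas as pd

input_users = {'Name': ['Sarah', 'Lucas', 'Debbie', 'Joanna'],
               'Age': [41, 51, 87, 69]}
df = pd.DataFrame(input_users)

print(df.columns)          # column labels, taken from the dictionary keys
print(df.index)            # default integer row labels
print(df['Age'])           # a single column is a Series
print(df['Age'].mean())    # 62.0
print(df[df['Age'] > 60])  # rows where the condition holds
```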
