
Identification of requirements

Identification of requirements relevant to the activities.

Main Requirements

The main requirements in these case studies include the use of some specific Python libraries. More specifically:

• NLTK
• Seaborn

NLTK stands for Natural Language Toolkit. It provides an interface to vast lexical resources, as well as text processing tools for classification, tokenisation, tagging, parsing, etc. Let’s look at what these mean in more detail.

Activity

Open a new Jupyter notebook

Type the following

import nltk

sentence = "Big Data Analytics is an emergent and significant scientific field, which will drive innovation."

At this stage, all we have is a string which has been stored in the variable sentence.
However, we need to specify to the Python interpreter that the string consists of units (or tokens), which are equivalent to words

# The tokeniser models may need downloading first:
# nltk.download('punkt')
tokens = nltk.word_tokenize(sentence)
print(tokens)

What do you see?
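To see why a dedicated tokeniser is worth using, compare it with plain string splitting. The sketch below uses only the standard library: splitting on whitespace leaves punctuation attached to its neighbouring word, which is exactly what nltk.word_tokenize avoids.

```python
# Plain string splitting only cuts on whitespace, so punctuation
# stays attached to the neighbouring word.
sentence = ("Big Data Analytics is an emergent and significant "
            "scientific field, which will drive innovation.")

naive_tokens = sentence.split()
print(naive_tokens)
# Notice that "field," and "innovation." each come out as a single
# token, punctuation included -- word_tokenize separates them.
```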

The next step is to attach a lexical tag to each of those words.
These tags represent the corresponding lexical properties, for example

• NN: noun, common, singular
• NNP: noun, proper, singular
• NNS: noun, common, plural
• VB: verb, base form, etc.

Run this code

# The tagger model may need downloading first:
# nltk.download('averaged_perceptron_tagger')
tags = nltk.pos_tag(tokens)
print(tags)

What do you see?
Search for the different tags and identify what they refer to

What other commands does NLTK have?
Spend a few minutes familiarising yourself with the library
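As a starting point for that exploration, here is a small sketch of two further NLTK utilities, FreqDist (frequency counts) and bigrams (consecutive token pairs). The token list is hand-split so the example needs no downloaded corpora.

```python
import nltk

# A hand-split token list keeps the example self-contained
# (no corpora need downloading for these two utilities).
tokens = "big data analytics is a big and growing field".split()

# FreqDist counts how often each token occurs
freq = nltk.FreqDist(tokens)
print(freq.most_common(2))

# bigrams yields consecutive token pairs
pairs = list(nltk.bigrams(tokens))
print(pairs[:3])
```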

Visualisation via Matplotlib and Seaborn

Any visualisation approach within Data Analytics needs to address the following points:

• What is the story you want to tell?
• How to present it to optimise the message?

In this course, we will use matplotlib and Seaborn. Seaborn is a library specifically designed for statistical graphics in Python; it is built on matplotlib and integrates closely with pandas data structures.

One of the strongest features of Seaborn is that it helps you explore and better understand your data. It easily plots pandas dataframes and arrays containing whole datasets, automatically performing the pre-processing and statistical aggregation needed to display informative graphs.
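The automatic aggregation can be seen with a tiny hand-made dataframe (a hypothetical stand-in for a real dataset): given two bills per day, barplot computes and draws the mean per day without any explicit groupby.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
import pandas as pd
import seaborn as sns

# A tiny hand-made frame: two bills per day.
df = pd.DataFrame({
    "day":  ["Mon", "Mon", "Tue", "Tue"],
    "bill": [10.0, 20.0, 30.0, 50.0],
})

# barplot aggregates for us: each bar shows the mean bill per day.
ax = sns.barplot(data=df, x="day", y="bill")
print([p.get_height() for p in ax.patches])
```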

Let’s look at some examples.
Open a Jupyter notebook and type the following

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()  # Create a figure containing a single axes.
ax.plot([1, 2, 3, 4], [-1, 2, -2, 4]);  # Plot some data on the axes.

What can you see?

Experiment with other graphs and functions, which are part of matplotlib. Explore scatter plots and try different examples.
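As one starting point for that experimentation, here is a minimal scatter-plot sketch using randomly generated data (the seed and colour mapping are arbitrary choices, not part of the activity):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(50)
y = rng.random(50)

fig, ax = plt.subplots()
ax.scatter(x, y, c=y, cmap="viridis")  # colour each point by its y value
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple scatter plot")
```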

Now, let’s consider seaborn. Again, open a Jupyter notebook and type the following

# Import seaborn
import seaborn as sns

# Apply the default theme
sns.set_theme()

# Load an example dataset
tips = sns.load_dataset("tips")

# Let’s plot
sns.relplot(
    data=tips,
    x="total_bill", y="tip",
    col="time",
    hue="smoker", style="smoker", size="size",
)

Understand what the different parameters do.

• What does the relplot() function do?
• What do ‘hue’ and ‘style’ do?
• What is ‘sns.set_theme()’ ?
• Experiment and try different graphs based on the above dataset