# Visualisation in Python

A brief guide to data and model visualisation in Python.

As we’ve seen, good data visualisation is an important tool in your analysis, and for communicating your results.

This article gives a few practical tips for data and model visualisation in Python.

## Matplotlib

There are lots of useful tools for this in Python, not least Matplotlib, as we saw in week 1. Matplotlib has many more options and plot types than we have time to discuss here. For a good overview, see the gallery of Matplotlib examples on their website.

## Confusion matrices

We’ve seen in the preceding videos that confusion matrices are useful ways to visualise the performance of machine learning classifiers.

But how do we make them in Python? One way is to use scikit-learn and Matplotlib. Scikit-learn has two useful functions:

• confusion_matrix, which makes the matrix itself
• ConfusionMatrixDisplay, which uses Matplotlib to plot and display it.

To demonstrate we can go back to our random forest example using the Iris data:

    # Load the data
    from sklearn.datasets import load_iris
    iris = load_iris()
    X = iris.data
    y = iris.target

    # Split into training and test sets
    from sklearn.model_selection import train_test_split
    Xtrain, Xtest, y_train, y_test = train_test_split(X, y)

    # Train the random forest and make predictions on the test set
    from sklearn.ensemble import RandomForestClassifier
    model = RandomForestClassifier(n_estimators=1000, random_state=13)
    model.fit(Xtrain, y_train)
    y_predicted = model.predict(Xtest)

Now we have a set of predictions for the test data, we can compare with the true results and make a confusion matrix:

    from sklearn.metrics import confusion_matrix
    import matplotlib.pyplot as plt
    cm = confusion_matrix(y_test, y_predicted)
    print(cm)

    [[15  0  0]
     [ 0 12  1]
     [ 0  0 10]]

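As you can see from the printout, the function confusion_matrix only computes the numbers for the matrix; it doesn’t make a plot. Each row corresponds to a true class and each column to a predicted class, so the correct predictions lie on the diagonal.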
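Because the matrix is just a NumPy array, summary numbers can be read straight off it. The following is a minimal sketch, assuming the matrix printed above is typed in by hand (your own split will give different counts); the variable names accuracy, recall and precision are illustrative only.

```python
import numpy as np

# Confusion matrix copied from the printout above
# (rows = true class, columns = predicted class)
cm = np.array([[15, 0, 0],
               [0, 12, 1],
               [0, 0, 10]])

# Correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()

# Recall: fraction of each true class that was predicted correctly (row-wise)
recall = np.diag(cm) / cm.sum(axis=1)

# Precision: fraction of each predicted class that was correct (column-wise)
precision = np.diag(cm) / cm.sum(axis=0)

print(f"accuracy: {accuracy:.3f}")
print("recall per class:   ", np.round(recall, 3))
print("precision per class:", np.round(precision, 3))
```

Here 37 of the 38 test points sit on the diagonal, and the single off-diagonal entry lowers the recall of the second class and the precision of the third.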

To do that we need to use ConfusionMatrixDisplay as follows:

    from sklearn.metrics import ConfusionMatrixDisplay
    import matplotlib.pyplot as plt
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_)
    disp.plot()
    plt.show()

## Decision boundary

Another nice way to visualise the output of a random forest classifier (and other types of classifiers) is to plot the decision boundary. We can do this in scikit-learn but we first need to cut the features in the dataset down to two dimensions so we can plot it.

After training the model on the reduced dataset, the next step is to evaluate its predictions on a grid of values covering the range of the two features. These predictions can then be plotted by colouring the background according to the predicted class at each point in the 2D space.
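The grid step can be easier to follow on a tiny example. The sketch below uses a deliberately small 3 by 2 grid with made-up ranges, and stand-in values in place of real model predictions, to show how np.meshgrid, ravel and reshape fit together.

```python
import numpy as np

# A deliberately tiny grid: 3 values along x, 2 along y (made-up ranges)
xs = np.linspace(0.0, 1.0, 3)
ys = np.linspace(10.0, 20.0, 2)
feature_1, feature_2 = np.meshgrid(xs, ys)

# meshgrid returns two 2D arrays, each of shape (len(ys), len(xs))
print(feature_1.shape)

# Flatten both arrays and pair them up: one (x, y) point per row,
# which is the (n_samples, n_features) layout that predict() expects
grid = np.vstack([feature_1.ravel(), feature_2.ravel()]).T
print(grid)

# A model's predictions come back as a 1D array with one value per grid
# point; stand-in values are used here in place of a real model
stand_in_predictions = np.arange(len(grid))

# Reshape back to the 2D grid layout so the values can be drawn as a surface
y_pred = stand_in_predictions.reshape(feature_1.shape)
print(y_pred)
```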

The code below demonstrates how to do this. Try it out for yourself and use the comments to work out what each step is doing.

    import numpy as np
    import matplotlib.pyplot as plt

    # load the data and select two features only
    from sklearn.datasets import load_iris
    iris = load_iris()
    sdata = iris.data[:, [0, 3]]  # we have picked sepal length (0) and petal width (3)

    # make a grid of values for predicting and plotting based on the data
    xmax = 8.2
    xmin = 3.8
    ymin = -0.2
    ymax = 2.8
    feature_1, feature_2 = np.meshgrid(
        np.linspace(xmin, xmax),
        np.linspace(ymin, ymax)
    )
    grid = np.vstack([feature_1.ravel(), feature_2.ravel()]).T

    # set up the model, train it, then make predictions on the grid of values
    from sklearn.ensemble import RandomForestClassifier
    model = RandomForestClassifier(n_estimators=1000, random_state=13)
    forest_model = model.fit(sdata, iris.target)
    y_pred = np.reshape(forest_model.predict(grid), feature_1.shape)

    # make and display the plot
    from sklearn.inspection import DecisionBoundaryDisplay  # NOTE NEW IN VERSION 1.1
    display = DecisionBoundaryDisplay(xx0=feature_1, xx1=feature_2, response=y_pred)
    display.plot()
    display.ax_.scatter(sdata[:, 0], sdata[:, 1], c=iris.target, edgecolor="black")
    plt.xlabel('sepal length (cm)')
    plt.ylabel('petal width (cm)')
    plt.xlim([xmin, xmax])
    plt.ylim([ymin, ymax])
    plt.show()

The plot above shows the output. The original data points and the background have been coloured in the same way so you can see clearly where the model would misclassify data points.

## Seaborn

Though you can achieve a lot with Matplotlib, some might find the default plots a bit basic in appearance, and customising them can involve some tedious extra coding. This is especially likely to be true if you are used to the statistical programming language R, or RStudio.

Seaborn is an additional package that works on top of Matplotlib and tries to both simplify the user interface and improve the appearance of standard Matplotlib plots. A particular strength of Seaborn vs Matplotlib is that it can easily handle data stored in Pandas ‘dataframes’ format, in a similar way to R.

You can think of Matplotlib as a plugin to Python, and Seaborn as a plugin to Matplotlib. You might already have it installed along with Anaconda, but as usual installing it is easy:

    pip install seaborn

You can also look up more detailed installation instructions from the library webpages themselves.
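Once Seaborn is installed, the dataframe handling mentioned above can be tried out. The sketch below assumes pandas is installed and uses the Iris data loaded as a dataframe. The extra species column is added here purely to get readable legend labels and is not part of the original dataset.

```python
import matplotlib.pyplot as plt
import seaborn
from sklearn.datasets import load_iris

# as_frame=True returns the data as pandas objects instead of NumPy arrays
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a numeric 'target' column

# Add a column of species names (illustrative addition for a readable legend)
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Columns are referred to by name, and 'hue' colours the points by category
ax = seaborn.scatterplot(
    data=df,
    x="sepal length (cm)",
    y="petal width (cm)",
    hue="species",
)
plt.show()
```

Seaborn takes the axis labels and the legend from the column names, which would otherwise need separate Matplotlib calls.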

Since Seaborn is based on Matplotlib, the code to produce a simple plot can be identical, but the resulting plot appears different. For example using Matplotlib:

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(-5, 5, 1000)
    y1 = np.cos(x)
    y2 = np.sin(x)
    plt.plot(x, y1, label='cos')
    plt.plot(x, y2, label='sin')
    plt.legend()
    plt.show()
    plt.close()

Using Seaborn, we just need to import the library and then use the set function, and the default plotting behaviour will be changed.

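    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn

    # switch Matplotlib's defaults to the Seaborn style
    seaborn.set()

    x = np.linspace(-5, 5, 1000)
    y1 = np.cos(x)
    y2 = np.sin(x)
    plt.plot(x, y1, label='cos')
    plt.plot(x, y2, label='sin')
    plt.legend()
    plt.show()
    plt.close()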

Of course, it’s a matter of opinion whether this default style change is an improvement, but Seaborn has other styles you can use, and if you want you can even define your own plot styles.
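As a rough illustration of switching styles, the sketch below uses Seaborn's set_style and axes_style functions. The style names come from Seaborn's built-in set (darkgrid, whitegrid, dark, white and ticks), while the dashed-grid override is just an arbitrary example of a custom tweak.

```python
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import seaborn

x = np.linspace(-5, 5, 1000)

# Choose one of the built-in styles and override a single setting:
# here the grid lines are made dashed (an arbitrary customisation)
seaborn.set_style("whitegrid", {"grid.linestyle": "--"})

# Plot using the customised style
fig1, ax1 = plt.subplots()
ax1.plot(x, np.cos(x), label="cos")
ax1.plot(x, np.sin(x), label="sin")
ax1.legend()

# A style can also be applied temporarily with a context manager;
# the previous settings are restored when the block ends
with seaborn.axes_style("dark"):
    fig2, ax2 = plt.subplots()
    ax2.plot(x, np.cos(x), label="cos")
    ax2.legend()

plt.show()
```

Calling seaborn.axes_style with no arguments returns the current settings as a dictionary, which is a handy starting point for seeing what can be customised.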

As with Matplotlib, there’s lots more you can do with Seaborn than we have time to go into here, so to learn more see the Seaborn example gallery.