
Perceptron in scikit-learn

A worked example using the perceptron model on the Iris data in scikit-learn.

The perceptron model

As we saw in the preceding article, this is just a set of one or more output nodes, each connected to a set of input nodes, without any hidden layers.

In a regression or a binary classification there will be just one output node, while in a multiclass classification there will be one output for each class.

There are as many input nodes as there are features in your dataset.

As a reminder, the mathematical definition of the perceptron is as follows:

\[
f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}
\]

With (x) our vector of inputs, (w) the set of weights, and (b) the bias term.
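Before turning to scikit-learn, it can help to see the rule above written directly in code. The sketch below is a minimal NumPy implementation of the single-output decision rule, using made-up toy weights purely to exercise it:

```python
import numpy as np

def perceptron_output(x, w, b):
    """Perceptron rule: output 1 if w.x + b > 0, otherwise 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# toy weights and bias (hypothetical values, just for illustration)
w = np.array([0.5, -0.2])
b = 0.1

print(perceptron_output(np.array([1.0, 1.0]), w, b))   # 0.5 - 0.2 + 0.1 = 0.4 > 0, so 1
print(perceptron_output(np.array([-1.0, 1.0]), w, b))  # -0.5 - 0.2 + 0.1 = -0.6, so 0
```

A multiclass perceptron, as we will see below, simply evaluates one such rule per class, each with its own row of weights and its own bias.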

Perceptron in scikit-learn

Scikit-learn has an implementation of the perceptron that can be used in a similar way to the other classification algorithms we have seen in the course. Here we will use the Iris data to demonstrate it.

By now you should be used to the way scikit-learn works and how we input and split the data, and then initialise, train and test the model. Try out the code below, noting where we select the perceptron model. We will just use the default version by calling the function Perceptron() without any input arguments.

# import the data
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

# split into training / validation
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# initialise the model and train
from sklearn.linear_model import Perceptron
model = Perceptron()
model.fit(Xtrain, ytrain)

# evaluate the fit
y_model = model.predict(Xtest)
from sklearn.metrics import accuracy_score
score = accuracy_score(ytest, y_model)

It’s likely you’ll see the model doesn’t perform as well as many of the other algorithms we’ve looked at in this course. That’s OK though, the idea here is to demonstrate how a single perceptron works, so when you come to consider more complex neural networks and deep learning approaches, you will understand more about the basic building blocks they are made of.
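One way to see this for yourself is to measure the accuracy over several train/test splits rather than a single one. The sketch below uses 5-fold cross-validation on the same default perceptron; the exact scores you get may differ slightly:

```python
# estimate typical perceptron accuracy on Iris with 5-fold cross-validation
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score

iris = load_iris()
scores = cross_val_score(Perceptron(), iris.data, iris.target, cv=5)
print(scores)
print('mean accuracy:', scores.mean())
```

You will typically find both a lower mean accuracy and more fold-to-fold variation than with methods like support vector machines on this dataset.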

How does the perceptron make predictions?

Let’s dig a bit more into what exactly the perceptron is doing. First, we can grab model predictions for one example from each class:

print('predicted classes of X 0,50,100')
print(model.predict(X[[0, 50, 100], :]))

predicted classes of X 0,50,100
[0 1 2]

You should see that our trained model has correctly predicted the classes as 0, 1 and 2 respectively. How is it making that prediction though? We can look at the fitted weights (w) and bias vector (b) stored in the attributes model.coef_ and model.intercept_ respectively:

print('model weights:')
print(model.coef_)
print('bias / intercept weights')
print(model.intercept_)

model weights:
[[  2.3   5.1  -8.6  -4.8]
 [  2.8 -33.1   5.4 -15.4]
 [-40.8 -33.4  60.9  56.4]]

bias / intercept weights
[  1.   6. -26.]

The weights are a 3 x 4 array, and the intercept is a vector of three values.

So why is this? Remember that for every data instance x we need to calculate (w.x+b). We have three output classes and four input features, so the weights matrix is a 3 x 4 matrix. When we take the dot product of this matrix with a data instance of four features we end up with a vector of three numbers. This is then added to the bias vector to get the final result on which the prediction is based.

We can make this calculation ourselves on each of our three sample values using the dot function in NumPy:

import numpy as np
print('w.x+b for X[0,:]')
print(np.dot(model.coef_, X[0, :]) + model.intercept_)
print('w.x+b for X[50,:]')
print(np.dot(model.coef_, X[50, :]) + model.intercept_)
print('w.x+b for X[100,:]')
print(np.dot(model.coef_, X[100, :]) + model.intercept_)

w.x+b for X[0,:]
[  11.64  -35.21 -220.96]
w.x+b for X[50,:]
[-20.22 -18.12 -66.97]
w.x+b for X[100,:]
[-34.99 -53.7   69.41]

The three numbers in each case are the result for the particular data point for each of the three output classes 0,1 and 2.

The test for each output node in a perceptron is whether or not (w.x+b>0). If it is, the node outputs 1, and if not it outputs 0. In the first and third examples this is clear: the value is greater than zero only for class 0 and class 2 respectively.

In the second example, however, all three numbers are less than zero. Most likely this is because this particular example is difficult to classify and lies near the decision boundary between two classes. However, a decision between the three classes must still be made, so the class with the largest value of (w.x+b) — here the one nearest to zero — is chosen, in this case class 1.
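This "pick the largest score" behaviour can be checked directly. The sketch below compares the argmax of the per-class scores from decision_function (which returns (w.x+b) for each class) against the model's own predictions for the same three samples:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X, y = iris.data, iris.target
model = Perceptron().fit(X, y)

# decision_function returns w.x + b for each of the three classes
scores = model.decision_function(X[[0, 50, 100], :])

# the predicted class is the one with the largest score,
# even when all three scores are negative
print(np.argmax(scores, axis=1))
print(model.predict(X[[0, 50, 100], :]))
```

The two printed arrays should match, confirming that predict is simply taking the class with the highest value of (w.x+b).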

This article is from the free online

Machine Learning for Image Data

Created by
FutureLearn - Learning For Life
