
Logistic regression

An article giving a brief overview of logistic regression.
A set of axes labelled P(X) against X, with an S-shaped curve rising from low P(X) to high P(X), centred on the value \(\mu\).

Here we will have a quick look at a specific type of regression concerned with the probability of an event occurring, known as logistic regression.

What is logistic regression?

As we have seen in the preceding videos and articles, regression is concerned with predicting a continuous value in a range, such as the width of a petal, or the yield of a crop. In the case of linear regression this predicted value can, mathematically at least if not in reality, take any value, even negative values.

But what if we were instead interested in a probability of some event occurring as a result of some input variable?

When we talk about events this could mean many different things, for example:

  • What is the probability that this flower is of a particular species?
  • What is the probability that this image contains a tomato?
  • What is the probability that this email is spam?

In the simplest case logistic regression looks at the value of one variable, such as the width of petals, or the number of red pixels in an image, and outputs a probability based on the value of that variable. You can think of this as the regression equivalent of binary classification, but rather than the output being 1 or 0, or true or false, logistic regression instead gives the probability that the event occurs.

In the examples above, we talked about using a single variable, but multiple features can also be used as inputs in logistic regression.

How does logistic regression work?

In linear regression, crudely speaking, the aim is to fit a straight line through the data, so we can use the equation of a line and find the parameters required to make it the best-fit line.

The idea is similar for logistic regression, but the function we need is a bit more complicated. We know we want a probability, so the output needs to be restricted between 0 and 1. We also know there’s likely to be a decision boundary somewhere where the probability of the event happening moves from unlikely (<0.5 probability), to likely (>0.5 probability).

To capture this behaviour, we use what’s known as the logistic function:

\[ p(x) = \frac{1}{1 + e^{-(x-\mu)/s}} \]

where \(x\) is our variable of interest, and \(\mu\) and \(s\) are parameters.

Don’t worry too much about the maths here; the important part is what the curve looks like.

The logistic function showing the probability of some event against some variable \(x\).

As you can see in the plot above, we have a roughly S-shaped curve ranging from near zero at low values to near one at high values. The midpoint of the curve is defined by the parameter \(\mu\), while the other parameter \(s\) defines the scale over which the probability flips from low to high, i.e. the steepness of the curve near the midpoint.
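If you want to get a feel for how \(\mu\) and \(s\) shape this curve, the short sketch below simply evaluates the logistic function with NumPy and plots it with Matplotlib; the particular parameter values used are just illustrative choices, not anything from the course data.

import numpy as np
import matplotlib.pyplot as plt

def logistic(x, mu, s):
    # p(x) = 1 / (1 + exp(-(x - mu)/s))
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-10, 10, 200)

# mu moves the midpoint of the curve; s controls how sharply it rises there.
plt.plot(x, logistic(x, mu=0.0, s=1.0), label="mu=0, s=1")
plt.plot(x, logistic(x, mu=2.0, s=0.5), label="mu=2, s=0.5")
plt.axhline(0.5, linestyle="--", color="grey")
plt.xlabel("x")
plt.ylabel("p(x)")
plt.legend()
plt.show()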

While the logistic function is more complex than the straight line used in linear regression, there are still just two parameters. We can fit those parameters by training on our data, in a similar way to the other methods we have looked at previously. The important thing to note is that while the logistic regression model outputs a continuous probability, the target values used for training are just binary classification labels, i.e. whether or not the event in question happens for each example.
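As a rough illustration of that fitting step, here is a minimal sketch of one way to estimate \(\mu\) and \(s\) from binary labels, by minimising the negative log-likelihood with scipy.optimize.minimize. The synthetic data and starting values are assumptions made for the sake of the example, and this is not how scikit-learn implements it internally (its LogisticRegression also adds regularisation), but the idea of turning binary targets into two fitted parameters is the same.

import numpy as np
from scipy.optimize import minimize

def logistic(x, mu, s):
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

def neg_log_likelihood(params, x, y):
    mu, s = params
    # Clip to avoid taking log(0) for extreme probabilities.
    p = np.clip(logistic(x, mu, s), 1e-9, 1 - 1e-9)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic binary targets: events become more likely as x grows.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = (rng.random(100) < logistic(x, mu=5.0, s=1.0)).astype(int)

# Start the midpoint at the mean of x and the scale at 1.
result = minimize(neg_log_likelihood, x0=[x.mean(), 1.0],
                  args=(x, y), method="Nelder-Mead")
print(result.x)  # fitted (mu, s), close to the values used to generate y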

Example in scikit-learn

To illustrate logistic regression we can use a simple example with the Iris data in scikit-learn.

First we will load the Iris data as normal:

from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data
y = iris.target

To make things simpler we will just take the first two classes, so that each flower will either be versicolor (y=1) or not versicolor (y=0). We can do this quickly by just picking the data where y<2:

X = X[y<2,:]
y = y[y<2]

Now we can split the data as usual:

from sklearn.model_selection import train_test_split

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=13)

Then import, initialise, and train the logistic regression model in the usual way:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

model.fit(Xtrain, ytrain)

To look at the predictions on the test data, we can use the predict_proba function.

y_model = model.predict_proba(Xtest)

print(y_model[:4,:])
[[0.98255379 0.01744621]
[0.01691956 0.98308044]
[0.00139447 0.99860553]
[0.98654425 0.01345575]]

This gives the probability that each data example belongs to each class. You should see that each row adds up to one, and also that a pretty clear-cut prediction has been made in each of the four examples printed out here.
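If you want hard class labels rather than probabilities, the standard scikit-learn methods are still available on the fitted model. As a small follow-up (the exact numbers will of course depend on the split above):

# predict() returns the class with the higher probability,
# i.e. the probabilities above thresholded at 0.5.
y_pred = model.predict(Xtest)
print(y_pred[:4])

# score() reports the fraction of test examples classified correctly.
print(model.score(Xtest, ytest))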

This article is from the free online course Machine Learning for Image Data, created by FutureLearn.
