
Supervised, Unsupervised and Semi-Supervised Learning

A short summary of the differences between supervised, unsupervised and semi-supervised learning problems

One of the key concepts in machine learning we addressed in the previous videos is the distinction between supervised and unsupervised learning. Here’s a summary and reminder.

Supervised learning

In supervised learning, there is some known output (or target value) that we wish to predict from a given set of input features in our data. The aim of training a supervised learning model is to learn a mapping from the input features to the target values.

If the target value is a categorical value, such as a plant species, or a yes/no answer (e.g., does this image contain a flower?), then the problem is referred to as a classification problem. The features used to make this prediction might be the dimensions of some part of a plant, image pixel values, some other property derived from an image, or many other things.
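As a minimal sketch of the classification case, the Python below trains a nearest-centroid classifier on a handful of plant measurements. This is one simple choice among many (the article does not name a specific model), and the feature names, species labels and numbers are all invented for illustration. "Training" here just averages the features of each known class, and prediction maps new features to the class with the closest average.

```python
from math import dist
from statistics import fmean

# Hypothetical toy data: (petal_length_cm, petal_width_cm) -> species label.
# The numbers are invented for illustration only.
TRAIN = [
    ((1.4, 0.2), "species_a"),
    ((1.3, 0.3), "species_a"),
    ((4.7, 1.4), "species_b"),
    ((4.5, 1.5), "species_b"),
]


def fit_centroids(samples):
    """Supervised training: average the feature vectors of each known class."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_class.items()
    }


def predict(centroids, features):
    """Map input features to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(TRAIN)
print(predict(centroids, (1.5, 0.25)))  # species_a
print(predict(centroids, (4.6, 1.3)))   # species_b
```

The key point is that every training example comes with its class, and the model's output is always one of those known classes.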

Supervised machine learning – we know the classes of all the training data

If, instead of a categorical value, you are looking to predict a continuous value, such as the yield of a crop given a set of environmental conditions or the price of a house given information about its size, location and so on, the problem is referred to as a regression problem.

Which machine learning model you use will depend in part on whether your problem is regression or classification.
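To make the regression case concrete, a minimal sketch is a least-squares straight-line fit with a single input feature. The house sizes and prices below are invented toy values, and real regression problems usually combine many features; the point is only that the model outputs a continuous number rather than a class.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y is modelled as slope * x + intercept."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical toy data: house floor area (m^2) and price (in thousands).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # a continuous value, not a class label
print(round(predicted, 2))  # 270.0
```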

Unsupervised learning

In contrast to supervised learning, with unsupervised learning there is no known target value, so rather than using the data to predict a target value, we are instead interested in finding patterns or structures within the data.

A common example of unsupervised learning is clustering. With clustering, the aim is to subdivide the data into distinct groups, without necessarily knowing what the different groups represent.

Unsupervised machine learning – we don’t know the class of any of the training data, but look for structure or clusters

In the supervised learning example we used before, where the dimensions of parts of plants are used to predict the species, what if we had the same dimension data, which we suspected might belong to different species, but didn’t know the species names?

In this case we might still be able to use an unsupervised learning approach to divide the data into distinct clusters. Although we don’t have species names, we could still use the model to say which data points are most like one another.
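The clustering idea can be sketched with k-means, one common clustering algorithm (the article does not prescribe a specific one). This minimal version assumes the number of clusters k is chosen in advance and, for simplicity, starts from the first k points. The measurements carry no species labels; the algorithm only groups points that lie close together, and the cluster numbers it returns are arbitrary IDs, not species names.

```python
from math import dist
from statistics import fmean


def kmeans(points, k, max_iterations=100):
    """Group unlabelled points into k clusters; returns (centroids, cluster_ids)."""
    # Simplifying assumption: start from the first k points (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        members = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            members[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(fmean(axis) for axis in zip(*group)) if group else centroids[i]
            for i, group in enumerate(members)
        ]
        if updated == centroids:  # nothing moved, so the clustering has converged
            break
        centroids = updated
    cluster_ids = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, cluster_ids


# Hypothetical, unlabelled (petal_length_cm, petal_width_cm) measurements.
measurements = [(1.4, 0.2), (4.7, 1.4), (1.3, 0.3), (4.5, 1.5), (1.5, 0.2), (4.9, 1.5)]
centroids, cluster_ids = kmeans(measurements, k=2)
print(cluster_ids)  # [0, 1, 0, 1, 0, 1]
```

Because the result can depend on the starting centroids, library implementations typically use smarter initialisation (such as k-means++) and several restarts rather than the naive start used here.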

Semi-supervised learning

Often, we have datasets where we know the target value for only some of the data points. For example, we may have recorded many thousands of images of some object of interest, but only have the time and expertise to annotate a small subset of them.

Semi-supervised machine learning – we only know the class of some of the training data

In this instance we might consider semi-supervised learning, in which unsupervised learning techniques could be used to cluster the dataset, with the labelled examples found in each cluster then used to identify the class of that cluster.
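The cluster-then-label step can be sketched as a majority vote: within each cluster, the labelled examples vote on the class, and every point in that cluster inherits the winning label. This sketch assumes the cluster assignments have already been produced by some clustering step (for example k-means); the `label_clusters` helper and the toy inputs are hypothetical. Clusters that contain no labelled example are left as `None` rather than guessed.

```python
from collections import Counter


def label_clusters(cluster_ids, known_labels):
    """Give every point the majority label of the labelled examples in its cluster.

    cluster_ids:  cluster_ids[i] is the cluster assigned to data point i.
    known_labels: {point_index: label} for the small annotated subset.
    Returns one label per point, or None where a cluster has no labelled member.
    """
    votes = {}
    for index, label in known_labels.items():
        votes.setdefault(cluster_ids[index], Counter())[label] += 1
    cluster_label = {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}
    return [cluster_label.get(cluster) for cluster in cluster_ids]


# Hypothetical example: 7 images grouped into 3 clusters, only 3 of them annotated.
cluster_ids = [0, 1, 0, 1, 0, 1, 2]
known_labels = {0: "flower", 2: "flower", 3: "no_flower"}
labels = label_clusters(cluster_ids, known_labels)
print(labels)
# ['flower', 'no_flower', 'flower', 'no_flower', 'flower', 'no_flower', None]
```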

Alternatively, we could use active learning, in which a supervised learning model is trained on the labelled examples and then used to identify unlabelled examples that, if labelled, would be most likely to improve the model’s accuracy. Once these examples have been labelled with some human input, they can be added to the training set and the model retrained, repeating the cycle as needed.
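Active learning can choose which examples to query in several ways; one common heuristic is uncertainty sampling, which asks a human to label the examples the current model is least sure about. The sketch below assumes a single numeric feature, a deliberately simple threshold model with a logistic-shaped probability, and a hypothetical `oracle` function standing in for the human annotator. It also assumes the initial labelled set contains at least one example of each class.

```python
import math
from statistics import fmean


def train(labelled):
    """Toy model: a decision threshold halfway between the two class means."""
    negatives = [x for x, y in labelled if y == 0]
    positives = [x for x, y in labelled if y == 1]
    return (fmean(negatives) + fmean(positives)) / 2


def prob_positive(threshold, x):
    """Logistic-shaped probability of class 1; exactly 0.5 at the threshold."""
    return 1 / (1 + math.exp(-(x - threshold)))


def most_uncertain(threshold, pool):
    """Uncertainty sampling: the unlabelled item whose prediction is closest to 0.5."""
    return min(pool, key=lambda x: abs(prob_positive(threshold, x) - 0.5))


def active_learning(labelled, pool, oracle, rounds):
    labelled, pool = list(labelled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        threshold = train(labelled)              # retrain on everything labelled so far
        query = most_uncertain(threshold, pool)  # pick the least certain unlabelled item
        pool.remove(query)
        labelled.append((query, oracle(query)))  # human input supplies the label
    return train(labelled), labelled


# Hypothetical annotator: in practice a person would supply these labels.
def oracle(x):
    return 1 if x >= 5.0 else 0


final_threshold, history = active_learning(
    labelled=[(1.0, 0), (9.0, 1)],
    pool=[2.0, 4.0, 4.8, 5.5, 7.0, 8.0],
    oracle=oracle,
    rounds=3,
)
print([x for x, _ in history[2:]])  # [4.8, 5.5, 4.0]
```

Each round retrains on the enlarged labelled set, so the queries concentrate near the current decision boundary, where a new label changes this kind of model the most; points far from the boundary are never sent to the annotator.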

This article is from the free online

Machine Learning for Image Data

Created by
FutureLearn - Learning For Life
