
From example data to prediction

Explanation of classification and regression problems.
© AIProHealth Project

In History and Definition of AI, you learned that in supervised machine learning, learning takes place by training on example data or past experiences. This means that the type of output that can be predicted depends on the labels attached to these example data.

In supervised learning, there are two types of predictions that can be made on your data, namely classification and regression. If the data you are training the machine on is accompanied by a discrete label, your prediction will be of the classification kind. For example, you can have a dataset of images that are labeled either “tumor” or “non-tumor”, or either “benign” or “malignant”. The machine will then train on assigning these images to one of these classes. Regression predictions can be made when the label is continuous (e.g., the degree of malignancy), in which case the machine will attempt to estimate the value for new data as closely as possible.
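The difference between the two kinds of prediction can be sketched in a few lines of Python. The example below is only an illustration, not a method described in this article: it uses a simple nearest-neighbour approach, and the lesion features, labels and malignancy scores are made up. The same features are used twice, once with a discrete label (classification) and once with a continuous label (regression).

```python
from collections import Counter
from statistics import mean

def nearest_labels(train, query, k=3):
    """Return the labels of the k training examples closest to the query."""
    by_distance = sorted(
        train,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], query)),
    )
    return [label for _, label in by_distance[:k]]

def predict_class(train, query, k=3):
    """Classification: labels are discrete, so take a majority vote."""
    return Counter(nearest_labels(train, query, k)).most_common(1)[0][0]

def predict_value(train, query, k=3):
    """Regression: labels are continuous, so average the neighbours' values."""
    return mean(nearest_labels(train, query, k))

# Made-up features per lesion: (diameter in mm, border irregularity from 0 to 1).
features = [(4.0, 0.1), (5.0, 0.2), (6.0, 0.2), (14.0, 0.8), (15.0, 0.9), (16.0, 0.7)]
class_labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
malignancy_scores = [0.10, 0.15, 0.20, 0.80, 0.90, 0.85]  # hypothetical continuous label

classification_data = list(zip(features, class_labels))
regression_data = list(zip(features, malignancy_scores))

new_lesion = (15.5, 0.8)
print(predict_class(classification_data, new_lesion))            # malignant
print(round(predict_value(regression_data, new_lesion), 2))      # 0.85
```

Only the type of label changes between the two calls: a class name comes back in the first case, a number in the second.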


Classification can be used to categorize any type of data into predefined classes. For example, general practitioners (GPs) might have a list of information about their patients gathered during consultations (e.g., are they able to smell, do they have a headache, what is their temperature, what is their blood pressure). They have also labeled each patient's data as positive or negative, based on the outcome of a PCR test for COVID-19. A machine can now train on this dataset and make predictions about new patients. Do they fall into the positive class (infected with COVID-19) or into the negative one (not infected)?

Similarly, classification can be used to distinguish between specific disease types, for example type 1 and type 2 diabetes. Features such as BMI and triglyceride levels can be used to train a model for this classification.

Another classic example of classification is determining what is contained in an image. For example, machines can be trained on determining whether an image contains either a cat or a dog. In healthcare settings, classification can be extended to results of medical imaging modalities like MRI and CT scans. One can make predictions of whether there are tumors in the slices or not, and if so, whether they are benign or malignant. There are examples of this already published (e.g., Esteva et al., 2017).

Another example of classification is determining whether feedback on a physician is positive or negative. This is done by combining Natural Language Processing (NLP) with supervised classification. NLP, simply put, consists of algorithms for analyzing text data. Research has already shown that machine learning can indeed be used with NLP to analyze feedback from patients on their physicians (e.g., Gibbons et al., 2017).
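As a rough illustration of how text can be classified, the sketch below trains a small naive Bayes classifier on word counts. This is an assumed, simplified technique chosen for the example and is not necessarily the approach used in the cited research. The feedback sentences and the function names are made up.

```python
import math
from collections import Counter

def tokenize(text):
    """Very simple NLP step: lower-case the text and split it into words."""
    return text.lower().split()

def train_naive_bayes(examples):
    """Count how often each word occurs in each class."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the class with the highest (log) probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids a zero probability for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up patient feedback with discrete labels.
feedback = [
    ("the doctor was kind and listened carefully", "positive"),
    ("very clear explanation and friendly staff", "positive"),
    ("kind helpful and clear", "positive"),
    ("the doctor was rude and rushed", "negative"),
    ("long wait and unclear explanation", "negative"),
    ("rushed visit and rude staff", "negative"),
]
model = train_naive_bayes(feedback)
print(classify("kind and clear doctor", model))  # positive
print(classify("rude and rushed", model))        # negative
```

Real feedback is far messier than these sentences, so a practical system would need much more data and more careful text processing.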

Besides predicting only the class of the data, one can also use a probabilistic classifier, which provides a probability distribution over a set of classes. How likely is it that the patient has COVID-19? How likely is it that this picture contains a benign tumor? Or how likely is it that this feedback on the physician is positive rather than negative?
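One common probabilistic classifier is logistic regression, which outputs a probability between 0 and 1 for the positive class. The sketch below is a minimal version trained with gradient descent, under stated assumptions: the patient records are made up, the features (loss of smell, headache, temperature above 37 °C) are chosen only to echo the GP example, and the learning rate and number of epochs are arbitrary.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias with stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            predicted = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = predicted - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def class_probabilities(features, model):
    """Return a probability distribution over the two classes."""
    weights, bias = model
    positive = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
    return {"positive": positive, "negative": 1.0 - positive}

# Made-up records: (loss of smell 0/1, headache 0/1, temperature in °C minus 37).
patients = [
    (1, 1, 1.5), (1, 0, 1.0), (1, 1, 0.8), (0, 1, 1.2),
    (0, 0, 0.0), (0, 1, 0.1), (0, 0, -0.2), (1, 0, 0.0),
]
pcr_results = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive PCR test, 0 = negative

covid_model = train_logistic_regression(patients, pcr_results)
new_patient = (1, 0, 1.2)
for label, probability in class_probabilities(new_patient, covid_model).items():
    print(f"{label}: {probability:.2f}")
```

Instead of a single answer, the output is a probability per class, and the two probabilities add up to 1.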


If the healthcare dataset you are working with is accompanied by a continuous label, regression can be applied. This can also be very useful in the interpretation and prediction of medical data. Suppose you have a dataset that consists of a lot of information about patients before their surgery. You know things like their age, gender, weight and what kind of surgery they will undergo. You have also stored the self-reported amount of pain these patients were in 24 hours after their surgery, on a scale from 0 to 10. Now, when new patients come in, you can use a model trained on this information to predict the amount of pain they will be in after their surgery. This can be useful for preparing sufficient anesthetics.
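The pain example can be sketched with linear regression, one of the simplest regression models. The code below is an illustration under assumptions rather than a recommended clinical model: the patient records and pain scores are made up, only three features are used (age, weight, and whether the surgery is major), and the model is fitted with ordinary least squares.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    n = len(design[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(design, targets))]
        for i in range(n)
    ]
    # Solve it with Gauss-Jordan elimination and partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]  # [intercept, weight per feature...]

def predict_pain(coefficients, patient):
    """Predict a pain score and keep it within the 0 to 10 scale."""
    raw = coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], patient))
    return min(10.0, max(0.0, raw))

# Made-up records: (age in years, weight in kg, major surgery 0/1).
patients = [(34, 70, 0), (58, 82, 1), (45, 65, 0), (71, 90, 1),
            (29, 58, 0), (63, 77, 1), (50, 95, 0), (40, 68, 1)]
pain_after_24h = [2, 6, 3, 7, 2, 6, 4, 5]  # self-reported, scale 0 to 10

pain_model = fit_linear_regression(patients, pain_after_24h)
new_patient = (55, 80, 1)
print(f"Predicted pain score: {predict_pain(pain_model, new_patient):.1f}")
```

Unlike the classification examples, the prediction here is a number on a continuous scale rather than one of a fixed set of classes.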

Regression can also be applied to medical imaging. What is the degree of malignancy of the tumor shown in the images, and what will be its rate of growth? The imaging data can also be combined with clinical parameters, for example to predict outcome parameters in radiotherapy treatment.

This article is from the free online

How Artificial Intelligence Can Support Healthcare

Created by
FutureLearn - Learning For Life
