
Training of a machine learning model

This step explains how an AI model is trained and tested.

In machine learning, a model is trained by learning from examples. Based on this, the machine learns how to make inferences on new data. But how does this training of a machine learning model work?

Training and testing

The first step in training a machine learning model is to split the dataset. That is, the original dataset is divided into two sets:

  1. a training set
  2. a test set

As the name implies, the training set will be used to train the machine learning model. After the training of the model, the test set will be used to determine how well it works.
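The split described above can be sketched in a few lines of Python. This is only an illustration: the helper name `train_test_split`, the 80/20 ratio, the fixed random seed, and the toy tumor data are assumptions made for this example and do not come from the course.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (training, test)."""
    shuffled = list(examples)
    # A fixed seed makes the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy dataset: (tumor size in cm, label).
dataset = [
    (0.5, "benign"), (0.8, "benign"), (1.1, "benign"), (1.4, "benign"),
    (1.9, "benign"), (2.6, "malignant"), (3.0, "malignant"),
    (3.3, "malignant"), (3.9, "malignant"), (4.2, "malignant"),
]

train_set, test_set = train_test_split(dataset, test_fraction=0.2)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

Shuffling before splitting avoids the situation in which, for example, all malignant cases end up in the test set because the original data happened to be sorted by label.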

During the training phase, the examples from the training set are given as input to the algorithm. For each individual example, the algorithm tries to make a prediction. In the case of supervised learning, the algorithm tries to predict the label that accompanies the example (e.g., “benign” or “malignant”). Since this is still the training phase, the actual label can be used as feedback to improve the algorithm’s predictions. This works by computing the loss, which measures how far the prediction is from the provided label.

The model goes over the example data again and again. One pass of the entire training set through the model is called an epoch. With each epoch, the model improves its predictions by trying to minimize the loss, until so-called convergence is reached. At that point, the algorithm’s predictions on the example data no longer change significantly: for example, the algorithm keeps classifying the same tumors as malignant.
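To make the ideas of loss, epoch, and convergence concrete, here is a minimal sketch of a training loop. It assumes a very simple model (logistic regression on a single feature) and one particular loss (the log loss); the text above does not prescribe either, and the function names, learning rate, tolerance, and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, learning_rate=0.5, max_epochs=1000, tolerance=1e-6):
    """Fit a weight w and bias b; labels are 0 (benign) or 1 (malignant)."""
    w, b = 0.0, 0.0
    previous_loss = float("inf")
    for epoch in range(1, max_epochs + 1):      # one epoch = one full pass
        total_loss = 0.0
        for x, y in examples:
            p = sigmoid(w * x + b)              # predicted probability of label 1
            p = min(max(p, 1e-12), 1 - 1e-12)   # keep log() well defined
            total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            error = p - y                       # feedback from the true label
            w -= learning_rate * error * x      # gradient step on the log loss
            b -= learning_rate * error
        mean_loss = total_loss / len(examples)
        # Assumed convergence rule: stop when the loss barely changes.
        if abs(previous_loss - mean_loss) < tolerance:
            break
        previous_loss = mean_loss
    return w, b, epoch, mean_loss


# Hypothetical training data: (standardized tumor size, label).
training_examples = [(-1.5, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (1.5, 1)]
w, b, epochs_run, final_loss = train(training_examples)
print("epochs:", epochs_run, "final mean loss:", round(final_loss, 4))
```

The loop stops either when the loss changes by less than the tolerance between two epochs or when the maximum number of epochs is reached, whichever comes first.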

After training, the resulting model is used to make predictions on the test data. This is the testing phase. Different evaluation metrics can be used to determine how well the algorithm works. The simplest metric is accuracy, the fraction of all predictions that are correct. A more advanced one is the F1-score, which is the harmonic mean of precision (the fraction of instances marked positive by the algorithm that are actually positive) and recall (also known as sensitivity: the fraction of actual positives that the algorithm marks as positive). Another example is Cohen’s kappa, which corrects the agreement between predictions and true labels for the agreement expected by chance. All of these evaluation metrics can be calculated from a confusion matrix, which is a table that compares the predictions of the algorithm with the true labels of the test data.
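As an illustration, the metrics mentioned above can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the ten example labels are made up for this example.

```python
def confusion_counts(true_labels, predicted_labels, positive="malignant"):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if prediction == positive and truth == positive:
            tp += 1
        elif prediction == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical test-set labels and model predictions.
true_labels = ["malignant"] * 4 + ["benign"] * 6
predictions = ["malignant", "malignant", "malignant", "benign",
               "malignant", "benign", "benign", "benign", "benign", "benign"]

metrics = evaluate(*confusion_counts(true_labels, predictions))
print(metrics)
```

In this made-up example one malignant tumor is missed and one benign tumor is flagged, which gives an accuracy of 0.8 and a precision and recall of 0.75 each.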


Another very common option is to set aside part of the training set as a validation set. This is a ‘fake’ test set on which the machine learning algorithm can be evaluated under different hyperparameters, which are basically the settings of the algorithm (chosen before training rather than learned from the data). The validation set also makes it possible to test which data features are relevant and which can be left out. Once the hyperparameters and features have been selected, the final model is determined, and only then is the test data used to measure the performance of this model.
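The following sketch shows how a validation set could be used to pick a hyperparameter. As an assumption for illustration, the model is a k-nearest-neighbours classifier and the hyperparameter is the number of neighbours k; the course text does not name a specific algorithm, and all data and function names here are hypothetical.

```python
def knn_predict(training, x, k):
    """Predict 0 or 1 by majority vote among the k nearest training examples."""
    neighbours = sorted(training, key=lambda example: abs(example[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(training, evaluation, k):
    correct = sum(knn_predict(training, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)


def select_k(training, validation, candidates=(1, 3, 5)):
    """Return the candidate k with the best validation accuracy (first on ties)."""
    return max(candidates, key=lambda k: accuracy(training, validation, k))


# Hypothetical data: (tumor size in cm, label); 0 = benign, 1 = malignant.
# The example (3.2, 0) is a deliberately atypical benign case.
training = [(1.0, 0), (1.5, 0), (2.0, 0), (3.2, 0),
            (2.8, 1), (3.0, 1), (3.5, 1), (4.0, 1)]
validation = [(1.2, 0), (1.8, 0), (3.3, 1), (3.8, 1)]
test = [(1.4, 0), (3.6, 1)]

best_k = select_k(training, validation)
# The test set is touched only once, after the hyperparameter is fixed.
test_accuracy = accuracy(training, test, best_k)
print("selected k:", best_k, "test accuracy:", test_accuracy)
```

With k = 1 the atypical benign example misleads the prediction for a nearby validation case, so a larger k scores better on the validation set and is selected.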

It is also possible to use the entire training set for validation by means of cross-validation: the training set is divided into several parts (folds), and each part in turn is held out for validation while the algorithm is trained on the remaining parts.
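A minimal sketch of k-fold cross-validation is shown below. The function names are hypothetical, and the "model" is a deliberately trivial stand-in that always predicts the most common training label, so that the example stays self-contained; in practice the data would usually be shuffled first and a real model would be trained on each split.

```python
def k_fold_splits(examples, n_folds):
    """Yield (training, validation) pairs; each fold is held out exactly once."""
    folds = [examples[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        validation = folds[held_out]
        training = [example
                    for j, fold in enumerate(folds) if j != held_out
                    for example in fold]
        yield training, validation


def majority_label_accuracy(training, validation):
    """Stand-in for 'train a model, then score it on the held-out fold'."""
    labels = [label for _, label in training]
    majority = max(set(labels), key=labels.count)
    return sum(label == majority for _, label in validation) / len(validation)


def cross_validate(examples, n_folds, fit_and_score):
    """Average the validation score over all folds."""
    scores = [fit_and_score(training, validation)
              for training, validation in k_fold_splits(examples, n_folds)]
    return sum(scores) / len(scores)


# Hypothetical dataset: every third example is malignant.
examples = [(i, "malignant" if i % 3 == 0 else "benign") for i in range(12)]
mean_score = cross_validate(examples, 4, majority_label_accuracy)
print("mean validation accuracy:", round(mean_score, 3))
```

Because every example is used for validation exactly once, the averaged score depends less on one lucky or unlucky split than a single validation set does.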

Training of a neural network

Since most of the influential AI work that has been performed in the field of healthcare is based on deep learning, we will take a closer look at how the training of deep learning algorithms takes place.

In deep learning, we make use of artificial neural networks, which are inspired by the human brain. Natural intelligence is based on neurons constantly exchanging bits of information, and it is influenced by the strength of the connections between these neurons. Over a lifetime, these connections are constantly adapted and thus shape your perception, knowledge, and skills.

The same concept is used in artificial neural networks. As you can see in the image below, a neural network starts with an input layer, which functions as the senses of the network (e.g., sight or hearing). During the training phase, this layer receives the examples in the training data one by one. These examples can take the form of a table row (i.e., the features of the example, given as numerical or categorical values), an image (e.g., a CT scan), a sound wave (e.g., text spoken by a surgeon), or any other type of information that we as human beings would be able to perceive.

The input layer is connected to one or more hidden layers of artificial neurons by means of so-called weights. These weights represent the strength of the connections between the different neurons. Based on the values that represent the input data and the values of these weights, the algorithm makes a prediction of what is seen in the input layer (e.g., “benign” or “malignant”). These predictions are called the output, and basically represent the network’s interpretation of the input. By looping over the examples in the training data, the weights in the network are repeatedly updated until the model makes sufficiently accurate predictions of what is seen in the input layer.
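The flow from input layer through weights to output, and the repeated updating of the weights, can be sketched with a very small network. Everything specific in this sketch is an assumption for illustration: two input features, one hidden layer of two neurons, sigmoid activations, a squared-error loss, hand-picked starting weights, and plain gradient descent. Real deep learning models are far larger and are normally built with a dedicated library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """2 inputs -> 2 hidden neurons -> 1 output, all with sigmoid activation."""

    def __init__(self):
        # Small, arbitrary starting weights (hypothetical values).
        self.w_hidden = [[0.1, -0.2], [0.3, 0.2]]
        self.b_hidden = [0.0, 0.0]
        self.w_out = [0.2, -0.1]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                         + self.b_out)
        return hidden, output

    def update(self, x, y, learning_rate):
        """One backpropagation step on the loss 0.5 * (output - y) ** 2."""
        hidden, output = self.forward(x)
        delta_out = (output - y) * output * (1 - output)
        delta_hidden = [delta_out * w * h * (1 - h)
                        for w, h in zip(self.w_out, hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= learning_rate * delta_out * h
        self.b_out -= learning_rate * delta_out
        for j, delta in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= learning_rate * delta * xi
            self.b_hidden[j] -= learning_rate * delta


def mean_loss(network, examples):
    return sum(0.5 * (network.forward(x)[1] - y) ** 2
               for x, y in examples) / len(examples)


# Hypothetical examples: two scaled features per tumor; 1 = malignant.
examples = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.7], 1)]

network = TinyNetwork()
loss_before = mean_loss(network, examples)
for epoch in range(2000):                 # loop over the examples repeatedly
    for x, y in examples:
        network.update(x, y, learning_rate=0.5)
loss_after = mean_loss(network, examples)
print("mean loss before:", round(loss_before, 4), "after:", round(loss_after, 4))
```

The `forward` method corresponds to making a prediction from the input layer, and `update` corresponds to adjusting the connection strengths after seeing the true label.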

With the validation set, hyperparameters such as the depth of the network (i.e., the number of hidden layers) or the types of connections between the layers can be chosen. Once these have been fixed, the network is trained one final time, and the test set is used to see how well the trained network performs.

© AIProHealth Project
This article is from the free online course How Artificial Intelligence Can Support Healthcare

Created by
FutureLearn - Learning For Life
