Data division

A video discussing how to divide data into training, validation and testing sets.

The important types of dataset split

In this video, we consider dividing data into training, validation and testing sets. How you split the data can affect the performance of a machine learning system.

The training dataset is used to update the machine learning model during the training process. The validation dataset is a separate set of images, used to monitor the model's performance on unseen images throughout the training process. The testing dataset is a further set of unseen images, used for a final test of the model's performance on a “new” set of images once training is complete.
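
As a rough illustration of the three-way split described above, here is a minimal sketch in Python. The function name `split_dataset`, the 70/15/15 proportions, the fixed seed and the example file names are all assumptions made for this illustration; they are not taken from the video or the course.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle items and split them into training, validation and testing lists.

    The fractions and seed are illustrative defaults, not recommended values.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    # Copy and shuffle with a seeded generator so the split is reproducible.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)

    # Each item lands in exactly one subset, so the three sets never overlap.
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical image file names standing in for a real image dataset.
image_paths = [f"image_{i:03d}.png" for i in range(100)]
train, val, test = split_dataset(image_paths)
print(len(train), len(val), len(test))
```

Shuffling before slicing is one simple design choice; it assumes the images are independent of one another. If several images come from the same source (for example, the same patient or the same video), they would usually need to be kept together in a single subset, which this sketch does not handle.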

You can find out much more about these dataset concepts in the Machine Learning for Image Data course below.

This article is from the free online

Experimental Design for Machine Learning

Created by
FutureLearn - Learning For Life