
Generalisation, bias and overfitting

What happens when a machine learning system encounters new data that it wasn't trained on? Dr Katrina Attwood explains more.
We’ve seen that a machine learning system uses training data to learn a function that maps inputs to outputs, and that the system’s performance is measured by a loss function. During training we adjust the parameters of our machine, and the loss over our training data gradually falls as the system learns a good approximation of the desired input-to-output function. A plot of this loss against training time is called a learning curve. However, the system has been trained on one particular set of data: the optimisation process reduced the loss only with respect to that specific training data. What if the machine is now presented with an input it has never seen before?
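The training process described above can be sketched in a few lines of code. This is a minimal illustration, not any particular system from the course: we assume a simple model y = w·x + b, a mean-squared-error loss, and gradient descent as the optimiser, and we record the loss at every step to build the learning curve.

```python
import numpy as np

# Minimal sketch (assumed setup): fit y = w*x + b to noisy linear data by
# gradient descent on the mean-squared-error loss, recording the training
# loss at each step to form a learning curve.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, 50)   # true function plus noise

w, b, lr = 0.0, 0.0, 0.1
learning_curve = []
for step in range(200):
    pred = w * x + b
    loss = np.mean((pred - y) ** 2)          # loss over the training data
    learning_curve.append(loss)
    # gradients of the loss with respect to the parameters
    grad_w = 2 * np.mean((pred - y) * x)
    grad_b = 2 * np.mean(pred - y)
    w -= lr * grad_w                         # adjust the parameters
    b -= lr * grad_b
```

Plotting `learning_curve` against the step number gives exactly the learning curve described above: the loss starts high and gradually falls as the parameters approach the true values.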
When we evaluate our system on data it has never seen before (we call this test data), we are asking how well it generalises. Since this new data is likely to differ from our training data in some way, performance will generally be worse. The difference in performance between training and test data is called the generalisation error. If the generalisation error is too high, we might want to make our machine more powerful so that it learns, and hopefully generalises, better. However, giving our machine too much power to explain the training data can cause problems of its own.
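Measuring generalisation is simply a matter of evaluating the same loss on held-out data. The sketch below is a hypothetical illustration under the same assumed linear setup: we fit on one sample of data, evaluate on a second unseen sample, and take the difference in loss as the generalisation error.

```python
import numpy as np

# Hypothetical illustration: fit on training data, then evaluate the same
# model on unseen test data drawn from the same source.
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, 3.0 * x + 1.0 + rng.normal(0, 0.1, n)

x_train, y_train = make_data(40)
x_test, y_test = make_data(40)       # data the system has never seen

# fit a straight line to the training data (least squares)
w, b = np.polyfit(x_train, y_train, 1)

def mse(x, y):
    return np.mean((w * x + b - y) ** 2)

train_loss = mse(x_train, y_train)
test_loss = mse(x_test, y_test)
generalisation_error = test_loss - train_loss
```

Because the model matches the true relationship here, both losses are small and the generalisation error is modest; the next paragraph shows what happens when the model is too powerful.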
If we have relatively little training data but give our machine many parameters, it can fit the training data very well yet generalise badly. For example, imagine training data in which the relationship between input and output is approximately a straight line. If we allowed our machine to use a high-order polynomial as its function, it could fit the points exactly, so the training loss would be zero. However, this function will now make some extreme, and probably very bad, predictions in between the training points. This problem is called overfitting. We can diagnose it in our learning curve: the training loss quickly goes to zero while the generalisation error increases the longer we train.
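The straight-line-versus-polynomial example above can be reproduced directly. This is a sketch under assumed numbers (ten points, a degree-9 polynomial so there is one parameter per point): the polynomial drives the training loss to essentially zero, yet its predictions between the training points are far worse than the simple line's.

```python
import numpy as np

# Sketch of the overfitting example in the text: data that is roughly a
# straight line, fitted with both a line and a high-order polynomial.
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 10)
y = 2.0 * x + rng.normal(0, 0.05, 10)    # approximately linear data

line = np.polyfit(x, y, 1)               # 2 parameters
poly = np.polyfit(x, y, 9)               # 10 parameters: one per point

# training loss: the polynomial passes through every point, so it is ~0
train_loss_line = np.mean((np.polyval(line, x) - y) ** 2)
train_loss_poly = np.mean((np.polyval(poly, x) - y) ** 2)

# but between the training points the polynomial's predictions are extreme
x_between = (x[:-1] + x[1:]) / 2
err_line = np.mean((np.polyval(line, x_between) - 2.0 * x_between) ** 2)
err_poly = np.mean((np.polyval(poly, x_between) - 2.0 * x_between) ** 2)
```

The polynomial "wins" on training loss but loses badly between the points, which is exactly the overfitting pattern the learning curve reveals.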
Effectively, the system is just memorising the right output for each training input and learns nothing about solving the problem in general. So we have to trade off making our machine more powerful against overfitting our training data. We can improve matters by increasing the size of our training dataset, which makes it harder to overfit. However, we must be careful: both our training and test data must be sufficiently diverse. In other words, they must be representative samples of all the data we expect to encounter. If our dataset is biased, we will get a false impression of the system’s performance, and the decisions it makes will inherit that bias.
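The claim that more data makes overfitting harder can also be checked with the same hypothetical polynomial model: with only ten points it can memorise every one, but with two hundred points (and still only ten parameters) it can no longer fit the noise, so its predictions stay close to the true line.

```python
import numpy as np

# Sketch: the same 10-parameter polynomial, fitted to small and large
# training sets, then scored against the true function on fresh inputs.
rng = np.random.default_rng(3)

def fit_and_test(n_train):
    x = rng.uniform(-1, 1, n_train)
    y = 2.0 * x + rng.normal(0, 0.05, n_train)
    coeffs = np.polyfit(x, y, 9)                 # same model capacity
    x_test = np.linspace(-0.9, 0.9, 100)
    # error against the true underlying function
    return np.mean((np.polyval(coeffs, x_test) - 2.0 * x_test) ** 2)

err_small = fit_and_test(10)    # one point per parameter: overfits badly
err_large = fit_and_test(200)   # many more points than parameters
```

With the larger dataset the powerful model is forced to explain the overall trend rather than memorise individual points, so its error on unseen inputs drops sharply.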


This article is from the free online

Intelligent Systems: An Introduction to Deep Learning and Autonomous Systems

Created by
FutureLearn - Learning For Life
