Reinforcement learning problem in context

Let us consider unsupervised and supervised learning techniques in the context of reinforcement learning.

Supervised learning

In supervised learning, we develop algorithms that learn from training examples, each labelled with its ground truth by some third-party process whose abilities we want the algorithm to emulate.

In a reinforcement learning scenario, we could equip our agent with a supervised model trained on examples of environmental states, each paired with the corresponding optimal action.

The hope is that such a model would generalise well. When confronted with an unseen environmental state (i.e. one not present in the training data), the agent would label it with an action that advances it towards the goal, and then carry out that action.
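To make the idea of labelling an unseen state with an action concrete, here is a minimal sketch of a supervised "policy". It is an illustration only: the state features (wave height and distance to the wave), the action names, and the choice of a nearest-neighbour rule are all assumptions made for this example, not part of the course material.

```python
import math

# Hypothetical labelled examples: (wave_height_m, distance_to_wave_m) -> action.
# In the supervised setting, a third party has supplied the "optimal" action.
TRAINING_DATA = [
    ((0.5, 20.0), "wait"),
    ((1.5, 10.0), "paddle"),
    ((2.0, 2.0), "stand_up"),
    ((0.3, 5.0), "wait"),
]


def nearest_neighbour_policy(state, examples=TRAINING_DATA):
    """Return the action of the labelled example closest to `state`."""
    _, action = min(examples, key=lambda ex: math.dist(ex[0], state))
    return action


# A state that does not appear in the training data.
unseen_state = (1.8, 3.0)
chosen_action = nearest_neighbour_policy(unseen_state)
print(chosen_action)
```

The agent can only do as well as its labelled examples allow: a state far from every training example still receives the action of whichever example happens to be nearest.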

Let’s take the simple scenario of our surfer and consider whether it would be practical to provide sufficient data on every possible state, along with the appropriate action. Perhaps, but this is certainly not an elegant solution, and it is likely to be impractical. A more robust solution is for the agent to be capable of independent learning through trial and error.

Unsupervised learning

In unsupervised learning, we consider algorithms where examples do not contain the target feature, i.e. the appropriate action.

Because such algorithms learn without labelled targets, it is tempting to regard reinforcement learning as a form of unsupervised learning. However, it is quite distinct from the unsupervised methods we have seen, which seek to find structure in data. Reinforcement learning is not intended to do this; instead, it is designed to maximise a reward signal. Sutton and Barto (2018) therefore consider it a third type of machine learning, and I tend to agree with this distinction.

They also note that a distinctive challenge in reinforcement learning, not present in traditional supervised or unsupervised learning, is the balance required between exploration and exploitation. A successful reinforcement learning agent must exploit what it already knows in order to obtain rewards, but it cannot do so exclusively: it also needs to explore in order to discover better actions to take in the future.

Furthermore, in tasks containing a random element, an action may need to be tried several times in a particular situation before its expected reward can be estimated reliably. Balancing exploration against exploitation is therefore a central issue in reinforcement learning.
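The trade-off can be sketched with one common, simple strategy known as epsilon-greedy: most of the time the agent exploits the action with the highest estimated reward, and with a small probability epsilon it explores a random action instead. The example below is an illustrative sketch under stated assumptions, not a method prescribed by this article: the function name, the three reward probabilities, the value of epsilon, and the number of steps are all hypothetical.

```python
import random


def run_epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate the expected reward of each action by repeated trial and error.

    reward_probs[a] is the (hidden) probability that action `a` pays a reward
    of 1; the agent never reads it directly, it only observes sampled rewards.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an action at random.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Random element: the same action can give different rewards.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental update of the sample average for this action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
print(estimates, counts)
```

With epsilon set to 0 the agent would only ever exploit, and could settle permanently on the first action that happened to pay out; the occasional random choice is what lets it discover and keep refining its estimate of a better action.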

There are other arguments and much discussion around the distinctive and unique capabilities of a reinforcement learning approach to artificial intelligence. However, these are beyond the scope of this brief introduction.


Sutton, R. and Barto, A. (2018) Reinforcement Learning: An Introduction, 2nd ed., Cambridge, MA: MIT Press.

© Dublin City University
This article is from the free online

Reinforcement Learning
