
Examples of problems suitable for reinforcement learning

A finished game of Go, with the board viewed from above.
© Shutterstock

Examples of reinforcement learning in action help you understand where the approach applies and how to recognise it. Let’s take a look at the following examples.

Playing Go

An expert Go player places a stone. They choose a position on the board based on:

  • The state of the board (i.e. their opponent’s and their own existing stones),
  • Their experience,
  • Their knowledge of how their opponent generally plays and, of course,
  • Their desire to reach a particular end (i.e. the goal of winning the game).

Riding a bicycle for the first time

You push off, you wobble, you correct. You know what you want to do and what your goal is: don’t fall over, and move forward! You repeat this pattern over and over. After a while, you are able to pedal and maintain the bike’s balance. Happy days! This is reinforcement learning in action.

Learning machine learning itself follows the same pattern. You might encounter challenges with coding, with the algorithms, and with understanding the problem. You try an algorithm, hope you have implemented it correctly, run it on your data, look at the results, reflect, and then try again.

Important common elements

All the problems above involve interaction between the agent, which makes the decisions, and the environment. It is in this environment that the agent seeks to attain its goal, and it strives to do so despite uncertainty about the very environment it is interacting with. An additional complexity common in real-world problems is that the agent’s actions affect the state of the environment, so at the next decision point the environment state is not as it was. This leads to different contexts for possible actions and, of course, new possibilities for exploitation.
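The interaction described above can be sketched as a simple loop. This is a minimal illustration, not a real environment: the `step` function, the states, and the rewards are hypothetical stand-ins chosen only to show the shape of the cycle of decision, state change, and reward.

```python
import random

def step(state, action):
    """Hypothetical environment: the action changes the state,
    and the agent receives a reward signal."""
    next_state = state + action  # the agent's action alters the environment
    # Illustrative reward: the (made-up) goal is to reach state 0.
    reward = 1.0 if next_state == 0 else -abs(next_state)
    return next_state, reward

state = 5            # the environment starts in some state
total_reward = 0.0
for _ in range(10):  # a sequence of decision points
    action = random.choice([-1, 0, 1])   # the agent's (here: random) decision
    state, reward = step(state, action)  # the environment state has changed
    total_reward += reward               # progress is measured via the reward
```

At each pass through the loop the environment the agent faces is no longer the one it started with, which is exactly the complexity described above.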

It is important to note that “affecting the environmental state” can mean something as benign as, for example, a robot learning to walk moving to a new location with slightly different characteristics (an increased slope, rough ground, etc.). The world itself may be static, but the robot has moved to a new part of it, so the “environment” as experienced by the robot has changed. From this example, you can see that a good decision is based not only on the immediate consequence of an action in the next moment of experience, but more significantly on the longer playout of that and subsequent actions. For example, you don’t want your robot walking towards more challenging terrain, which would make mastering the basics of locomotion much harder.
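One common way to express this preference for the longer playout is a discounted sum of rewards, where later rewards count for less. The reward sequences and the discount factor below are purely illustrative, a sketch of the idea rather than any particular algorithm.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of a reward sequence, with later rewards weighted by gamma."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# An action with a good immediate reward but poor long-term consequences...
shortsighted = discounted_return([1.0, -1.0, -1.0, -1.0])
# ...can be worth less overall than one that pays off later.
farsighted = discounted_return([0.0, 0.5, 1.0, 1.0])
```

Here `farsighted` exceeds `shortsighted` even though the first reward in the far-sighted sequence is zero, which is the sense in which a good decision depends on more than its immediate consequence.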

What makes real-world reinforcement learning problems particularly challenging is that there is always some uncertainty, so the impact of actions on the reward signal is always a “work in progress”. Our agent must keep monitoring its environmental state, keep a record of the actions it has taken, and keep progressing towards its goal. In doing so, the agent accumulates experience that allows it to take better actions in the future, effectively allowing it to progress towards its goal.
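A toy sketch of “accumulating experience” is keeping a running value estimate per action and mostly picking the action that has worked best so far, while occasionally exploring. The two actions, their rewards, and the exploration rate below are all hypothetical; this is a bandit-style illustration under stated assumptions, not a full reinforcement learning algorithm.

```python
import random

values = {"left": 0.0, "right": 0.0}  # the agent's running record of experience
counts = {"left": 0, "right": 0}

def true_reward(action):
    # Hypothetical environment: "right" is better on average, but noisy.
    return random.gauss(1.0 if action == "right" else 0.0, 0.1)

random.seed(0)
for _ in range(200):
    # Mostly exploit current knowledge, but sometimes explore.
    if random.random() < 0.1:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)
    r = true_reward(action)
    counts[action] += 1
    # Incremental average: each new outcome refines the value estimate.
    values[action] += (r - values[action]) / counts[action]

best = max(values, key=values.get)  # experience steers the agent to "right"
```

Despite the noise and the uncertainty at every single step, the accumulated record lets the agent’s later decisions improve on its earlier ones.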

What do you think?

Take this opportunity to think about some more examples of reinforcement learning in action.

Share and discuss your responses with other learners in the comments section.

© Dublin City University
This article is from the free online course Reinforcement Learning, available on FutureLearn.
