Course review

Congratulations on completing this course!

Within the scope of this course, we have only been able to introduce some of the basic ideas of reinforcement learning. However, what we have examined, we have examined quite thoroughly.

The most important concept we explored was the balance between exploitation and exploration; everything we looked at came back to this essential problem. We saw that ε-greedy methods spend a fraction ε of their decisions on randomly selected actions (exploring) and otherwise choose the action with the highest estimated value (exploiting), while the upper confidence bound (UCB) method always chooses the action with the highest perceived value, with that “perception” shaped by how often each action has been selected.
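As a reminder of how the two selection rules differ, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the course's own code: the UCB score is assumed to take the common form "estimate + c * sqrt(ln t / N)", value estimates are simple sample averages, and the function names, the reward noise and the payout means in the usage lines are all hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    best = max(q_values)
    # exploit, breaking ties between equally good actions at random
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def ucb(q_values, counts, t, c):
    """Pick the action with the largest estimate plus exploration bonus."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # an untried action is always selected first
    # the bonus shrinks as an action's count grows (assumed form of the rule)
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_values, counts)]
    return scores.index(max(scores))


def run_bandit(true_means, steps, select, seed=0):
    """Play one k-armed bandit and return (average reward, selection counts)."""
    rng = random.Random(seed)
    k = len(true_means)
    q_values = [0.0] * k
    counts = [0] * k
    total = 0.0
    for t in range(1, steps + 1):
        a = select(q_values, counts, t, rng)
        reward = rng.gauss(true_means[a], 1.0)  # noisy payout (assumed Gaussian)
        counts[a] += 1
        q_values[a] += (reward - q_values[a]) / counts[a]  # sample-average update
        total += reward
    return total / steps, counts


# Hypothetical payout means for a 4-armed bandit.
true_means = [0.2, 0.5, 1.0, 0.1]
eg_result = run_bandit(true_means, 2000,
                       lambda q, n, t, rng: epsilon_greedy(q, 0.1, rng))
ucb_result = run_bandit(true_means, 2000,
                        lambda q, n, t, rng: ucb(q, n, t, 2.0))
```

Comparing the selection counts in `eg_result` and `ucb_result` shows where each method spent its decisions. Changing `epsilon` or `c` changes that balance, which is why any comparison between the two depends on how these parameters are set.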

Which method is best? In these k-armed bandit problems, the UCB method generally performs better, although its effectiveness depends on how its parameters are chosen. It should be pointed out that we only looked at non-associative tasks here. In the associative form of the task, we would face several different k-armed bandits, each with its own payouts and therefore its own preferred actions to be learned. We would then need to associate different actions with different situations (i.e. with whichever k-armed bandit we are facing). We can do this with a policy that recognises which k-armed bandit we are encountering and then deploys the best available action for that particular bandit. This is beginning to look like a full reinforcement learning problem.
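To make the associative idea concrete, the following Python sketch keeps a separate row of value estimates for each situation, so that the best action can differ from one k-armed bandit to the next. It is only a sketch under assumptions: the situation is taken to be given as an index, actions are chosen ε-greedily, and the class name, method names and payout means are hypothetical.

```python
import random


class AssociativeBandit:
    """Keeps one row of value estimates per situation (per k-armed bandit)."""

    def __init__(self, n_situations, k, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.q = [[0.0] * k for _ in range(n_situations)]  # value estimates
        self.n = [[0] * k for _ in range(n_situations)]    # selection counts

    def act(self, situation):
        """Epsilon-greedy choice using only the estimates for this situation."""
        row = self.q[situation]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(row))
        return row.index(max(row))

    def update(self, situation, action, reward):
        """Sample-average update of one (situation, action) estimate."""
        self.n[situation][action] += 1
        step = 1.0 / self.n[situation][action]
        self.q[situation][action] += step * (reward - self.q[situation][action])

    def policy(self):
        """Current best action for each situation."""
        return [row.index(max(row)) for row in self.q]


# Two hypothetical 3-armed bandits with different payout means.
payouts = [[1.0, 0.2, 0.1], [0.1, 0.3, 1.2]]
env_rng = random.Random(1)
agent = AssociativeBandit(n_situations=2, k=3)
for _ in range(3000):
    s = env_rng.randrange(2)                 # which bandit we are facing
    a = agent.act(s)
    r = env_rng.gauss(payouts[s][a], 1.0)    # noisy payout (assumed Gaussian)
    agent.update(s, a, r)
learned_policy = agent.policy()
```

Here the choice of action never changes which bandit appears next. In the full reinforcement learning problem it does, which is what takes us beyond bandits.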

Finally, we should highlight that we have not looked at the full reinforcement learning problem. General reinforcement learning requires a dive into Markov decision processes, which are beyond the scope of this course. I do recommend taking that dive if you really want to understand how reinforcement learning can be used for amazing things in areas as diverse as robots teaching themselves to walk and the modelling of human decision making. Even without that deep dive, I hope you have learned something about reinforcement learning and are eager to go further.

© Dublin City University
This article is from the free online course Reinforcement Learning.
