Upper confidence bound action selection

In reinforcement learning, exploration is always needed because the accuracy of our action value estimates is always uncertain. Greedy actions involve immediate gratification, i.e. actions that look …

Course review

Congratulations on completing this course! Within the scope of this course, we have only been able to introduce some of the basic ideas of reinforcement learning. However, what we have …

Incremental action value methods

In the previous step, we used sample averages of the observed rewards. This involves keeping track of this computation. \[Q_t(a) = \frac{R_1 + R_2 + \cdots + R_{N_t(a)}}{N_t(a)}\] In terms of practical implementation, this …
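
As a rough illustration of the sample-average estimate above, the following Python sketch computes Q_t(a) as the mean of the rewards observed so far for each action. The function name `sample_average_estimates` and the `history` data are hypothetical, chosen only for this example; they are not taken from the course material.

```python
from collections import defaultdict


def sample_average_estimates(history):
    """Return Q_t(a) for each action: the mean of the rewards observed for it.

    `history` is a list of (action, reward) pairs (a hypothetical format).
    """
    totals = defaultdict(float)  # sum of rewards per action
    counts = defaultdict(int)    # N_t(a): number of times each action was taken
    for action, reward in history:
        totals[action] += reward
        counts[action] += 1
    return {a: totals[a] / counts[a] for a in counts}


# Made-up rewards for two actions, for illustration only.
history = [(0, 1.0), (1, 0.0), (0, 3.0), (1, 2.0), (0, 2.0)]
estimates = sample_average_estimates(history)
print(estimates)
```

This version keeps a running sum and a count per action, so it does not need to store every individual reward.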

The k-armed bandit problem

The k-armed bandit problem is a classic learning problem in reinforcement learning, describing a situation where the agent has a choice among k different options. The selection of each option …

Action value methods: an introduction

If we want to evaluate actions, we must estimate the values of these actions. Methods that estimate action values and use those estimates to select actions are called action value methods. The simplest action selection …

Elements of a reinforcement learning setting

Building upon our previous exploration of reinforcement learning examples, we will now delve deeper into the interaction between the agent and the environment. In this step, we are going to …

Evaluative versus instructive feedback

Building upon the elements highlighted in the previous steps, we can now delve deeper into the fundamental distinction between reinforcement learning and other learning paradigms. Their purpose is to prepare …

Reinforcement learning problem in context

Let us consider unsupervised and supervised learning techniques in the context of reinforcement learning. Supervised learning In supervised learning, we develop algorithms which learn from the experience of being exposed …

Introduction to reinforcement learning

What constitutes a learning environment? Your learning has come about through your interactions with the course material, your peers, online resources such as YouTube and Stack Overflow, books, and scientific publications. …

Recognising a reinforcement learning situation

In reinforcement learning, we develop algorithms which associate actions with situations with the goal of maximizing a reward signal. This reward signal, which is invariably numerical, is used as the …

Welcome to the course

In this video, your Lead Educator, Professor Tomás Ward, introduces you to the subject of Reinforcement Learning. As Tomás mentions, this is one of his favourite areas in machine learning …