Deep Reinforcement Learning: Part 2

[0:14]
The most fundamental problem in RL is the exploration vs. exploitation trade-off. At the beginning of learning, we need to explore all possibilities and find the actions that lead to the highest rewards. Once we have gathered enough information about our environment, we should start exploiting it to optimize our policy. If we keep exploring, we may lose the chance to find the best policy. The most common algorithm for balancing exploration and exploitation is the epsilon-greedy policy: with probability ε it takes a random action (exploration), and with probability (1 − ε) it takes the greedy action (exploitation). Now I am going to introduce Q-learning, which is a common reinforcement learning algorithm.
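A minimal sketch of epsilon-greedy action selection, assuming a tabular Q-function stored as a NumPy array of shape (n_states, n_actions); the names here are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, state, epsilon):
    """With probability epsilon take a random action (exploration);
    otherwise take the highest-valued action (exploitation)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore: uniform random action
    return int(np.argmax(Q[state]))           # exploit: greedy action
```

In practice, ε is usually annealed from a value near 1 toward a small constant, so the agent explores early in training and exploits later, matching the schedule described above.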
[1:12]
Q-learning is a value-based RL algorithm that aims to learn a Q-function evaluating the quality of each state-action pair. First, let's define the future reward R_t as the sum of the rewards from r_t to r_(t+n). Because future rewards are uncertain, we add a discount factor γ (gamma) to reduce them, and R_t can then be written recursively as R_t = r_t + γ·R_(t+1). Our goal is to maximize this discounted future reward, which leads to the Bellman equation Q(s, a) = r + γ·max_(a') Q(s', a'). Finally, we employ the epsilon-greedy policy to switch between exploration and exploitation, and add a learning rate α (alpha) that controls how strongly each new sample updates Q: Q(s, a) ← Q(s, a) + α·(r + γ·max_(a') Q(s', a') − Q(s, a)). Traditionally, the Q-function was computed with a lookup table or dynamic programming, which cannot scale to large, complex environments.
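A sketch of the tabular update just described; the function and variable names are illustrative, and the done flag follows the usual convention that terminal transitions contribute no future value:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) a fraction alpha toward the
    Bellman target r + gamma * max_a' Q(s', a')."""
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
```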
[2:16]
DeepMind proposed using a neural network to approximate the Q-table, resulting in the Deep Q-Network (DQN). This approach had been considered infeasible because neural networks and reinforcement learning are both unstable and hard to train.
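A minimal PyTorch sketch of the idea, assuming a small vector observation; the layer sizes and names are illustrative (the original DQN instead used convolutional layers over raw pixels):

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Replaces the Q-table: maps a state to one Q-value per action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),  # Q(s, a) for every action at once
        )

    def forward(self, state):
        return self.net(state)
```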
[2:35]
DeepMind showed the effectiveness of DQN by introducing two tricks: experience replay and a target network. Additionally, DeepMind used only raw pixels as input and generalized DQN to many different environments.
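Sketches of the two tricks, assuming a network like the QNetwork above; the capacity and names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay: store past transitions and train on random
    minibatches, breaking the correlation between consecutive samples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def sync_target(online_net, target_net):
    """Target network: a frozen copy of the online network that supplies
    stable Bellman targets; refreshed only every few thousand steps."""
    target_net.load_state_dict(online_net.state_dict())
```

During training, the Bellman target is computed with the target network rather than the network being updated, which keeps the regression target from shifting at every step.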

Continuing his explanation of deep reinforcement learning, Prof. Lai first discusses the fundamental problem of reinforcement learning: the exploration versus exploitation problem.

Then he introduces Q-learning, a common reinforcement learning algorithm. Q-learning is a value-based reinforcement learning algorithm that aims to learn a Q-function evaluating the quality of each state-action pair.

This article is from the free online course Applications of AI Technology, created by FutureLearn.
