Deep Reinforcement Learning: Part 2

The most fundamental problem in RL is the “exploration vs. exploitation” problem. At the beginning of learning, we need to explore the possible actions and find the ones that lead to the highest rewards. Once we have gathered enough information about our environment, we should start to exploit it to optimize our policy. If we keep exploring, we never make full use of what we have learned and may miss the chance to settle on the best policy. The most common way to balance exploration and exploitation is the epsilon-greedy policy, which takes a random action with probability ε (exploration) and the greedy action with probability 1 - ε (exploitation). Now I am going to introduce Q-learning, which is a common reinforcement learning algorithm.
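As an illustration of the epsilon-greedy idea, here is a minimal Python sketch. The function names (`epsilon_greedy`, `linear_epsilon`), the linear decay schedule, and its default values are assumptions made for this example; the lecture does not prescribe a particular schedule.

```python
import random


def linear_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Hypothetical schedule: move epsilon linearly from `start` to `end`.

    Early steps explore a lot (epsilon near `start`); later steps mostly
    exploit (epsilon near `end`). The default numbers are illustrative only.
    """
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return start + fraction * (end - start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action index given a list of estimated action values."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]
    for step in (0, 500, 1000):
        eps = linear_epsilon(step)
        print(step, round(eps, 3), epsilon_greedy(estimates, eps))
```

With ε = 0 the function always returns the greedy action, and with ε = 1 it always acts randomly; values in between trade the two off.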
Q-learning is a value-based RL algorithm, which aims to learn a Q-function that can evaluate the quality of each state-action pair. First, let’s define the future reward, capital R_t, which is the sum of the future rewards from r_t to r_(t+n). Because future rewards are uncertain, we add a discount factor γ (gamma) to reduce their weight. The formula for capital R_t can then be rewritten recursively as R_t = r_t + γ R_(t+1). Our goal is to maximize the discounted future reward, and the Q-function that does so satisfies the Bellman equation: Q(s, a) = r + γ max_a' Q(s', a'), where s' is the next state and a' ranges over the actions available there. Finally, we employ the epsilon-greedy policy to switch between exploration and exploitation, and add a learning rate α (alpha), which controls how far each update moves the current Q estimate toward the new target. Traditionally, the Q-function was calculated using a table or dynamic programming, which does not scale to large and complex environments.
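To make the discounted reward and the table-based update concrete, here is a small Python sketch. It assumes the standard tabular Q-learning rule, Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') - Q(s, a)); the function names, the list-of-lists table layout, and the `done` flag are choices made for this example rather than details from the lecture.

```python
def discounted_return(rewards, gamma):
    """Compute R_t = r_t + gamma * R_(t+1) by working backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q_table[s][a] holds the current estimate for state s and action a.
    """
    # No future reward is available once the episode has ended.
    best_next = 0.0 if done else max(q_table[next_state])
    # Bellman target: immediate reward plus discounted best future value.
    target = reward + gamma * best_next
    # Move the old estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    q = [[0.0, 0.0], [1.0, 2.0]]  # 2 states, 2 actions
    q_learning_update(q, state=0, action=1, reward=1.0, next_state=1,
                      done=False, alpha=0.5, gamma=0.9)
    print(q)
```

The table needs one entry per state-action pair, which is why this form does not scale to large state spaces such as raw images.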
DeepMind proposed using a neural network to approximate the Q-function in place of the Q-table, an approach called the Deep Q Network (DQN). This approach had been considered infeasible because neural networks and reinforcement learning are both unstable and hard to train.
DeepMind showed the effectiveness of DQN by introducing two tricks: experience replay and a target network. Additionally, DeepMind used only raw pixels as input and generalized DQN to many different environments.
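The two tricks can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not DeepMind's implementation: the class name `ReplayBuffer`, the helper `sync_target`, and the use of a dictionary to stand in for network weights are all hypothetical. Experience replay stores past transitions and trains on random samples of them; the target network is a copy of the online network that is refreshed only occasionally, so the learning targets change slowly.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params, target_params):
    """Overwrite the target parameters with a deep copy of the online ones."""
    target_params.clear()
    target_params.update(copy.deepcopy(online_params))


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3, seed=0)
    for t in range(5):
        buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
    print(len(buffer), buffer.sample(2))

    online = {"w": [1.0, 2.0]}  # stand-in for real network weights
    target = {}
    sync_target(online, target)
    online["w"][0] = 5.0  # later training changes only the online copy
    print(target)
```

In a full training loop, `sync_target` would typically be called every fixed number of steps, with the sampled batch used to compute the Q-learning targets from the target parameters.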

Continuing the explanation of deep reinforcement learning, Prof. Lai first discusses the fundamental problem of reinforcement learning: exploration versus exploitation.

Then, he introduces Q-learning, which is a common reinforcement learning algorithm. Q-learning is a value-based reinforcement learning algorithm that aims to learn a Q-function for evaluating the quality of each state-action pair.

This article is from the free online course Applications of AI Technology.
