This content is taken from the Taipei Medical University's online course, Applications of AI Technology.

## Taipei Medical University

[0:14] The most fundamental problem of RL is the "exploration vs. exploitation" problem. At the beginning of learning, we need to explore all possibilities and find the actions that lead to the highest rewards. Once we have gathered enough information about our environment, we should start exploiting it to optimize our policy model. If we keep exploring, we may miss the chance to settle on the best policy. The most common way to balance exploration and exploitation is the epsilon-greedy policy, which takes a random action with probability ε (exploration) and the greedy action with probability 1 − ε (exploitation). Now I am going to introduce Q-learning, which is a common reinforcement learning algorithm.
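The epsilon-greedy rule described in the transcript can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `epsilon_greedy` and the list `q_values` (one estimated value per action) are assumptions made for the example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, greedy otherwise.

    q_values is a hypothetical list holding one estimated value per action.
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

q_values = [0.1, 0.5, 0.2]
print(epsilon_greedy(q_values, epsilon=0.0))  # epsilon = 0 is purely greedy: prints 1
```

A common design choice is to start with a large ε and reduce it over time, so that the agent explores early and exploits later, in line with the lecture's description.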

[1:12] Q-learning is a value-based RL algorithm that aims to learn a Q-function that evaluates the quality of each state-action pair. First, let's define the future reward R_t, which is the sum of future rewards from r_t to r_(t+n). Because future rewards are uncertain, we add a discount factor gamma (γ) that reduces the weight of rewards further in the future. The formula for R_t can then be rewritten recursively as R_t = r_t + γ R_(t+1). Our goal is to maximize the discounted future reward; writing the Q-value of a state-action pair as the immediate reward plus the discounted maximum Q-value of the next state gives the Bellman equation. Finally, we use the epsilon-greedy policy to switch between exploration and exploitation, and add a learning rate alpha (α) that controls how strongly each update changes the Q-value. Traditionally, the Q-function was computed using a table or dynamic programming, which does not scale to large and complex environments.
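To make the recursion and the update concrete, here is a minimal tabular sketch in Python. It assumes the standard Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)], which the transcript describes only in words; the helper names `discounted_return` and `q_update`, and the dictionary-based Q-table, are illustrative choices rather than anything taken from the course.

```python
from collections import defaultdict

def discounted_return(rewards, gamma):
    """Compute R_t = r_t + gamma * R_(t+1), working backwards from the last reward."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def q_update(Q, state, action, reward, next_state, n_actions, alpha, gamma, done=False):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    # No future reward is counted once the episode has ended.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    # alpha (learning rate) sets how far the estimate moves toward the target.
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 * (1 + 0.5 * 1) = 1.75

Q = defaultdict(float)  # hypothetical Q-table; unseen pairs default to 0.0
q_update(Q, state=0, action=1, reward=1.0, next_state=1,
         n_actions=2, alpha=0.5, gamma=0.9)
print(Q[(0, 1)])  # 0 + 0.5 * (1.0 - 0) = 0.5
```

Because the table stores one entry per state-action pair, its size grows with the number of states, which is the scaling limit the lecture points to.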

[2:16] DeepMind proposed using a neural network to approximate the Q-table, an approach called the Deep Q-Network (DQN). This approach had been considered infeasible because neural networks and reinforcement learning are both unstable and hard to train.

[2:35] DeepMind showed the effectiveness of DQN by introducing two tricks: experience replay and a target network. Additionally, DeepMind used only raw pixels as input and generalized DQN to many different environments.
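The transcript names the two tricks without describing them. As they are commonly presented, experience replay stores past transitions and trains on random samples of them, while a target network is a periodically refreshed copy of the learned network used to compute update targets. The sketch below illustrates both ideas in plain Python, without a neural network library; the class `ReplayBuffer`, the function `sync_target`, and the dictionary of parameters are hypothetical names for this example, not DeepMind's implementation.

```python
import copy
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions, sampled at random for training."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Random sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def sync_target(online_params, target_params):
    """Hard update: overwrite the target copy with the current online weights."""
    target_params.clear()
    target_params.update(copy.deepcopy(online_params))

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(t, 0, 1.0, t + 1, False)
print(len(buffer))  # only the 3 most recent transitions are kept: prints 3

online = {"w": [0.1, 0.2]}  # stand-in for network weights
target = {}
sync_target(online, target)
online["w"][0] = 9.9        # further training changes only the online copy
print(target["w"])          # the target stays fixed until the next sync: [0.1, 0.2]
```

In a full DQN the targets would be computed from the frozen copy while the online network is trained on sampled batches, with the copy refreshed every fixed number of steps.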

# Deep Reinforcement Learning: Part 2

Continuing his explanation of deep reinforcement learning, Prof. Lai first discusses the fundamental problem of reinforcement learning: exploration versus exploitation.

He then introduces Q-learning, a common reinforcement learning algorithm. Q-learning is a value-based reinforcement learning algorithm that aims to learn a Q-function that evaluates the quality of each state-action pair.