This content is taken from the Taipei Medical University's online course, Applications of AI Technology. Join the course to learn more.

Skip to 0 minutes and 14 seconds Deep reinforcement learning is one of the hottest research topics nowadays, thanks to DeepMind and AlphaGo. Some researchers believe that combining deep learning with reinforcement learning is the key to human intelligence. Let’s see what deep reinforcement learning is. The basic idea behind deep reinforcement learning is simple. There is a software agent and an environment. The environment provides observations and rewards to the agent. The goal of the agent is to learn to perform actions that achieve maximum future reward under various observations. In other words, the software agent is trained to learn an optimized model in an environment through hundreds of millions of trial-and-error attempts. AlphaGo is the most famous success story of deep reinforcement learning.
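
The agent-environment loop described here can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course: the `CorridorEnv` environment, the `RandomAgent`, and the `run_episode` helper are all hypothetical names invented for this example, and the agent does not learn anything yet. It only shows how observations, actions, and rewards flow between the two sides.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach a goal."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # the observation is simply the agent's position

    def step(self, action):
        # action 0 moves left, action 1 moves right; the walls clip the move
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


class RandomAgent:
    """Placeholder agent that ignores its observations and acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([0, 1])


def run_episode(env, agent):
    """One pass of the loop: observe, act, receive a reward, repeat until done."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), RandomAgent()))
```

A learning agent would replace `RandomAgent.act` with a rule that uses the observation and is updated from the rewards it receives.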

Skip to 1 minute and 13 seconds In 2016, AlphaGo won a five-game Go match against the 18-time world champion Lee Sedol, losing only one game, which was the last game won by a human. After that, AlphaGo swept its human opponents. DeepMind was not satisfied with this achievement and continued pushing the limit. In October 2017, DeepMind introduced AlphaGo Zero, which learns Go without referring to any human game records. Eventually, no one could understand the strategies behind the moves of AlphaGo Zero, which are beyond human comprehension. The latest work from DeepMind is AlphaStar, which learns to play the famous real-time strategy game StarCraft. Although StarCraft is easier to play than Go for most people, it is a partially observable game.

Skip to 2 minutes and 13 seconds Go is a fully observable game, in which all of the opponent’s moves can be observed. Therefore, StarCraft is actually more difficult than Go. DeepMind first demonstrated that deep learning techniques can be applied to learning to play Atari games, matching or even surpassing the level of human players. DeepMind chose raw pixels as input and successfully used the same neural network architecture to learn many different games. To enable other researchers to do deep reinforcement learning research, OpenAI has developed the OpenAI Gym, which includes environments for multiple tasks. In addition to Atari games, OpenAI has also created many simulation environments for robotics. They can even simulate the robots of Boston Dynamics! Now let’s define reinforcement learning mathematically.

Skip to 3 minutes and 13 seconds There is a software agent that learns to take actions a_t to achieve maximum reward R under various observations. The observation is called the state here. The behaviour function of the agent is called the policy, which can be written as a conditional function of the action a_t given the state s_t, that is, π(a_t | s_t). There is also a value function, which is used to evaluate the quality of each state-action pair. The environment is described by a model, which represents the agent’s world. Based on the objective functions,

Skip to 3 minutes and 52 seconds there are three types of reinforcement learning: model-based, value-based and policy-based. For example, DQN belongs to value-based reinforcement learning. Some methods, such as actor-critic, combine two learning strategies. Here is the taxonomy of RL methods made by OpenAI. For more details, please refer to OpenAI’s blog.
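
To make the value-based idea concrete, here is a minimal sketch of tabular Q-learning, in which the value function is a plain table of numbers. This is not DQN itself: DQN replaces the table with a deep neural network. The corridor task, the function names `q_learning` and `greedy_policy`, and the hyperparameters `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor: start at state 0, goal at the last state."""
    rng = random.Random(seed)
    goal = n_states - 1
    # q[s][a]: estimated future reward of taking action a (0 = left, 1 = right) in state s
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # epsilon-greedy policy: mostly follow the value function, sometimes explore
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice([0, 1])
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + (1 if action == 1 else -1), 0), goal)
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Q-learning update: move q(s, a) toward r + gamma * max over a' of q(s', a')
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Derive a policy from the value table by picking the higher-valued action in each state."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

Here the policy is not learned directly. It is read off the value table, which is what distinguishes value-based methods from policy-based ones.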

Deep Reinforcement Learning: Part 1

Prof. Lai introduces deep reinforcement learning, a category of machine learning and artificial intelligence in which intelligent machines learn from their actions, similar to the way humans learn from experience. Deep reinforcement learning has recently become one of the hottest research topics, thanks to DeepMind and AlphaGo.

He explains the idea in terms of a software agent and an environment. The environment provides observations and rewards to the agent. The goal of the agent is to learn to perform actions that achieve maximum future reward under various observations. In other words, the software agent is trained to learn an optimized model in an environment through hundreds of millions of trial-and-error attempts. The most famous example is AlphaGo.

Next, Prof. Lai explains the interaction loop of deep reinforcement learning. He also introduces three types of reinforcement learning:

  • model-based
  • value-based
  • policy-based
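
As a contrast with value-based methods, a policy-based method adjusts the parameters of the policy directly. The sketch below is a hypothetical illustration, not the lecture's method: it applies a REINFORCE-style gradient update to a softmax policy on an assumed two-armed bandit. The reward probabilities, learning rate, and function names are all invented for this example.

```python
import math
import random


def softmax(preferences):
    """Turn a list of preferences into action probabilities."""
    highest = max(preferences)
    exps = [math.exp(p - highest) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(reward_probs=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """REINFORCE-style updates on a hypothetical two-armed bandit."""
    rng = random.Random(seed)
    prefs = [0.0 for _ in reward_probs]  # policy parameters, one per action
    baseline = 0.0  # running average reward, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # the gradient of log pi(action) with respect to prefs[a]
        # is (1 if a == action else 0) - probs[a]
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += lr * (reward - baseline) * (indicator - probs[a])
        baseline += (reward - baseline) / t
    return softmax(prefs)


if __name__ == "__main__":
    print(train_policy())
```

No value table is kept here. Actions that earn more reward than average simply become more probable. Actor-critic methods combine both ideas by learning a value function to use in place of the simple running-average baseline.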

If you are interested in learning more about reinforcement learning, check the See also links for further information.
