Deep Reinforcement Learning: Part 1

14.2
Deep reinforcement learning is one of the hottest research topics today, thanks to DeepMind and AlphaGo. Some researchers believe that deep learning plus reinforcement learning is the key to human intelligence. So what is deep reinforcement learning? The basic idea is simple. There is a software agent and an environment. The environment provides observations and rewards to the agent. The goal of the agent is to learn which actions to perform, under various observations, to achieve the maximum future reward. In other words, the software agent learns an optimized model in an environment through hundreds of millions of rounds of trial and error. AlphaGo is the most famous success story of deep reinforcement learning.
72.5
In 2016, AlphaGo won the five-game Go match against the 18-time world champion Lee Sedol, losing only one game, which remains the last game won by a human. After that, AlphaGo swept its human opponents. DeepMind was not satisfied with this achievement and continued pushing the limit. In October 2017, DeepMind introduced AlphaGo Zero, which learns Go without referring to any human game records. In the end, the strategies behind AlphaGo Zero's moves are often beyond human comprehension. The latest work from DeepMind is AlphaStar, which learns to play the famous real-time strategy game StarCraft II. Although StarCraft is easier for most people to play than Go, it is a partially observable game.
133.2
Go, by contrast, is a fully observable game in which all of the opponent's moves can be observed. In that sense, StarCraft is actually more difficult than Go. DeepMind first demonstrated that deep learning techniques can learn to play Atari games, matching or even surpassing human players. DeepMind chose raw pixels as input and successfully used the same neural network architecture to learn many different games. To help other researchers do deep reinforcement learning research, OpenAI developed OpenAI Gym, which includes environments for many tasks. In addition to Atari games, OpenAI also created many simulation environments for robotics. They can even simulate the robots of Boston Dynamics! Now let's define reinforcement learning mathematically.
193
There is a software agent that learns to take actions a_t to achieve the maximum reward R under various observations. The observation is called the state here. The behaviour function of the agent is called the policy, which can be written as a conditional function of the action a_t given the state s_t. There is also a value function, which is used to evaluate the quality of each state-action pair. The model represents the environment, that is, the agent's world. Based on the objective functions,
231.6
there are three types of reinforcement learning: model-based, value-based and policy-based. For example, DQN is a value-based method. Some methods combine two learning strategies, such as actor-critic, which combines policy-based and value-based learning. Here is the taxonomy of RL methods made by OpenAI. For more details, please refer to OpenAI's blog.
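The definitions in the transcript can be written compactly. This is a sketch in common textbook notation, not the lecture's own slide: only a_t, s_t and R come from the lecture, while the per-step reward r_t, the discount factor \gamma and the symbol Q^\pi are assumptions added here for illustration.

```latex
\begin{align}
\pi(a_t \mid s_t) &= P(\text{action} = a_t \mid \text{state} = s_t) \\
R_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1 \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\left[ R_t \mid s_t = s,\ a_t = a \right] \\
\pi^{*} &= \arg\max_{\pi}\ \mathbb{E}_{\pi}\left[ R_0 \right]
\end{align}
```

Read this way, a value-based method such as DQN learns an estimate of Q and acts greedily on it, a policy-based method adjusts \pi directly, and a model-based method learns or uses a description of how the environment moves from one state to the next.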
Prof. Lai introduces deep reinforcement learning. Deep reinforcement learning is a category of machine learning and artificial intelligence in which intelligent machines learn from their actions, similar to the way humans learn from experience. It has recently become one of the hottest research topics, thanks to DeepMind and AlphaGo.
He explains it with a simple picture: a software agent and an environment. The environment provides observations and rewards to the agent. The goal of the agent is to learn which actions to perform, under various observations, to achieve the maximum future reward. In other words, the software agent learns an optimized model in an environment through hundreds of millions of rounds of trial and error. The most famous example is AlphaGo.
Next, Prof. Lai explains the loop concept of deep reinforcement learning and introduces three types of reinforcement learning:
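The agent and environment exchange described above can be sketched in a few lines of Python. Everything here is a hypothetical illustration rather than code from the course: `CorridorEnv`, `random_policy` and `run_episode` are made-up names, and the environment is a toy corridor in which reaching the last cell pays a reward of 1.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, the last cell is the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is the agent's cell index

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def random_policy(observation, rng):
    """An untrained agent: ignores the observation and moves at random."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    """One pass through the loop: observe, act, receive reward, repeat."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation, rng)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


print(run_episode(CorridorEnv(), random_policy, random.Random(0)))
```

Learning, in this picture, means replacing `random_policy` with a policy that is adjusted from the rewards it collects.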
  • model-based
  • value-based
  • policy-based
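As one concrete instance of the value-based type, here is a minimal sketch of tabular Q-learning on a hypothetical five-cell corridor. It is not the DQN method mentioned in the video (DQN replaces the table with a neural network); the environment, the function names and the hyperparameters `alpha`, `gamma` and `epsilon` are all assumptions chosen for illustration.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = [-1, +1]  # move left or move right


def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy choice, with random tie-breaking
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # move the estimate toward reward plus discounted best next value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
# greedy action in each non-terminal state after training
greedy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]
print(greedy)
```

The agent never sees the rules of the corridor. It only updates its value table from observed rewards and then acts on the highest value, which is what distinguishes this style from model-based and policy-based methods.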
If you are interested in learning more about reinforcement learning, check the See Also links for further information.
This article is from the free online

Applications of AI Technology

Created by
FutureLearn - Learning For Life
