
Game theory

Hong Yang Qu explains game-theoretic learning in more depth
A human-like robot learning to play a game
© The University of Sheffield

Robots of the future will need to work in teams to accomplish tasks. However, effective teamwork will require coordination among the robots – game theory is one way to provide coordination. This step introduces the basics of game theory.

Game theory for robot teams

Advances in control and automation have made it possible for robot teams to work together in order to complete a task. When robots work together in such a way, the action of each robot in the team influences the actions of the other robots. Therefore, if each robot is to choose its actions independently, a coordination mechanism among the robots is needed. Game theory provides such a mechanism.

Game theory: In game theory, each robot is considered to be a player in a game and receives rewards that depend on the actions of the whole robotic team.
Reward: A reward is a stimulus used to indicate that a desired outcome has been achieved. A reward for humans is context-dependent, e.g. a gold medal for winning a race, but for robots it is usually an arbitrary value, e.g. 0 for no reward, or 1 for a reward.
Using game theory, if all robots work in a coordinated way to accomplish a task, each robot will receive a positive reward. Therefore, the goal of the game is for the team to find a coordinated solution that will maximise the rewards for each robot and the total reward of the whole team.
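As a rough illustration of these definitions, the following sketch shows a team reward function in which each robot’s reward depends on the joint action of the whole team. It is written in Python; the actions, the reward rule, and the function name team_rewards are illustrative assumptions rather than part of the course material.

# A minimal sketch: each robot's reward depends on the team's joint action.
# The reward rule below is a hypothetical example, not taken from the course.

def team_rewards(joint_action):
    """Return a reward (0 or 1) for each robot, given the team's joint action."""
    # Hypothetical rule: the team is rewarded only when the robots choose
    # different actions, i.e. when they coordinate rather than clash.
    if len(set(joint_action)) == len(joint_action):
        return tuple(1 for _ in joint_action)   # coordinated: every robot gets 1
    return tuple(0 for _ in joint_action)       # clash: no robot is rewarded

print(team_rewards(("left", "right")))  # (1, 1) - coordinated
print(team_rewards(("left", "left")))   # (0, 0) - not coordinated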

A simple game for two unmanned aerial vehicles

Consider two Unmanned Aerial Vehicles (UAVs) flying towards each other from opposite directions. They can fly either at high or low altitude. The goal of the two UAVs is to fly at different altitudes in order to avoid collision.
The interaction between the two UAVs can be described by the game depicted in the table below. In this game:
  • One UAV is modelled as a ‘row player’ and the other is a ‘column player’.
  • If both UAVs fail to coordinate by choosing to fly at the same altitude, they will not receive any reward (i.e. 0).
  • Each UAV receives a positive reward (i.e. 1) if they avoid collision by flying at different altitudes.
In the game, the rewards of the robot team are represented by a matrix, as shown in the table below.
                                        Column Player
                             Fly at high altitude    Fly at low altitude
Row Player  Fly at high altitude       0, 0                 1, 1
            Fly at low altitude        1, 1                 0, 0
In game theory, a solution in the table with reward 1,1 is known as a Nash equilibrium.
Nash equilibrium: A Nash equilibrium is a solution to a non-cooperative game in which each player, knowing the strategies of their opponents, has no incentive to change their own strategy.
Note that, once in a Nash equilibrium, no player can do better by unilaterally changing their strategy.
Example: In the UAV example above, once the UAVs are in a Nash equilibrium with reward 1,1 (representing one UAV flying high, one UAV flying low, and both avoiding a collision), if one UAV changes altitude, this will result in a collision – i.e. the UAV would be worse off by changing strategy.
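The payoff table and the Nash equilibrium condition can also be written down in a few lines of code. The sketch below is in Python; the data structure and the helper is_nash_equilibrium are illustrative, not taken from the course. It encodes the matrix above and checks that neither UAV can improve its own reward by changing altitude on its own.

# Payoff matrix for the two-UAV altitude game shown in the table above.
# payoffs[(row_action, col_action)] = (row_reward, col_reward)
ACTIONS = ("high", "low")
payoffs = {
    ("high", "high"): (0, 0),
    ("high", "low"):  (1, 1),
    ("low",  "high"): (1, 1),
    ("low",  "low"):  (0, 0),
}

def is_nash_equilibrium(row_action, col_action):
    """True if neither player can gain by changing only their own action."""
    row_reward, col_reward = payoffs[(row_action, col_action)]
    # Could the row player do better with a different altitude?
    if any(payoffs[(a, col_action)][0] > row_reward for a in ACTIONS):
        return False
    # Could the column player do better with a different altitude?
    if any(payoffs[(row_action, a)][1] > col_reward for a in ACTIONS):
        return False
    return True

print(is_nash_equilibrium("high", "low"))   # True  - reward 1,1
print(is_nash_equilibrium("high", "high"))  # False - reward 0,0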

Learning algorithms in game theory

Game-theoretic learning algorithms can be used as a coordination mechanism among the robots. These are iterative processes where the same game is repeatedly played until either coordination is achieved or the maximum number of iterations is reached.

The learning algorithm follows an iterative procedure – at each iteration, each robot:

  1. Computes a strategy on how to choose an action
  2. Selects the best action according to the strategy
  3. Checks whether the team’s joint action achieves coordination

a) If no, a new iteration starts at step 1 and each robot adjusts and updates their strategy

b) If yes, the learning algorithm terminates.

The general procedure of game-theoretic learning algorithms can be represented by the following figure.

A diagram showing a game-theoretic learning algorithm. The robot starts with an initial belief about its opponents' strategy and computes its own strategy accordingly. If coordination with the opponents is reached, a solution has been found. If not, the robot observes its opponents' actions, updates its belief about their strategy, and then updates its own strategy accordingly. This process is repeated until coordination is reached.

The basic principle behind these algorithms is that robots use the history of observed actions to predict the other robots’ strategies and then choose an action based on that prediction. The key result is that, by repeating this process, the robots learn to coordinate and so successfully play the game.
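One simple learning rule of this kind is fictitious play: each robot counts the actions its opponent has played so far, treats the resulting frequencies as a prediction of the opponent’s strategy, and then best-responds to that prediction. The sketch below applies this idea to the two-UAV altitude game. It is written in Python and is illustrative only: fictitious play is one of many game-theoretic learning rules, not necessarily the specific algorithm used in the course materials.

import random
from collections import Counter

# Fictitious-play sketch for the two-UAV altitude game (illustrative only).
ACTIONS = ("high", "low")
payoffs = {                   # payoffs[(row, column)] = (row_reward, col_reward)
    ("high", "high"): (0, 0), ("high", "low"): (1, 1),
    ("low", "high"):  (1, 1), ("low", "low"):  (0, 0),
}

def best_response(player, opponent_counts):
    """Best action against the empirical frequency of the opponent's past actions."""
    total = sum(opponent_counts.values())
    expected = {}
    for action in ACTIONS:
        expected[action] = sum(
            (count / total) * payoffs[(action, opp) if player == 0 else (opp, action)][player]
            for opp, count in opponent_counts.items()
        )
    best = max(expected.values())
    # Break ties at random so two identical robots do not mirror each other forever.
    return random.choice([a for a, v in expected.items() if v == best])

# Each robot starts with an arbitrary belief (an action count) about its opponent.
beliefs = [Counter({random.choice(ACTIONS): 1}) for _ in range(2)]

for iteration in range(1, 51):                # cap on the number of iterations
    # Steps 1-2: each robot computes its strategy and selects its best action.
    joint_action = (best_response(0, beliefs[1]), best_response(1, beliefs[0]))
    # Step 3: check whether the joint action is coordinated.
    if payoffs[joint_action] == (1, 1):
        print(f"Coordinated on {joint_action} after {iteration} iteration(s)")
        break
    # Not coordinated: each robot observes the other's action and updates its belief.
    beliefs[0][joint_action[1]] += 1
    beliefs[1][joint_action[0]] += 1

Breaking ties at random matters here: without it, two identical robots can keep mirroring each other and never reach a coordinated outcome.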

© The University of Sheffield
This article is from the free online course Building a Future with Robots.
