3.4

The University of Sheffield

A human like robot learning to play a game

Game theory

Advances in control and automation have made it possible for robot teams to work together in order to complete a task. When robots work together in such as way, the action of each robot in the team influences the actions of the other robots. Therefore, if the robots need to work independently, a coordination mechanism among the robots is needed.

Game theory provides such a mechanism. In game theory, each robot is considered to be a player of a game and receives rewards dependent on the actions of the whole robotic team. If all robots work in a coordinated way to accomplish the task, each robot will receive a positive reward. Therefore, the goal of the game is for the team to find a coordinated solution that will maximise the rewards for each robot and the total reward of the whole team.

Let’s look at Michalis’ example again. Consider two Unmanned Aerial Vehicles (UAVs) flying towards each other from opposite directions. They can fly either at high or low altitude. The goal of the two UAVs is to fly at different altitudes in order to avoid collision.

The interaction between the two UAVs can be described by the game depicted in the table below. In this game, one UAV is modelled as a ‘row player’ and the other is a ‘column player’. If both UAVs fail to coordinate by choosing to fly at the same altitude, they will not receive any reward (i.e. 0). Each UAV receives a positive reward (i.e. 1) if they avoid collision by flying at different altitudes. In the game, the rewards of the robot team are represented by a matrix, as shown in the table below.

Column Player
Fly at high altitude Fly at low altitude
Row Player Fly at high altitude 0,0 1,1
Fly at low altitude 1,1 0,0

Rewards of a simple symmetric coordination game

However, if a game involves more than 2 players, then it is very complex to store the rewards as matrices. In this case, we can use a function to compute the reward whenever it is needed. For example, Table 1 can be represented by the following function R(a, b), where a is the row player’s action and b is the column player’s action.

R(high altitude, high altitude) = (0, 0), R(high altitude, low altitude) = (1, 1), R(low altitude, high altitude) = (1, 1), R(low altitude, low altitude) = (0, 0),

A second application of coordination among robots is task allocation. Consider the case of an area where dangerous or harmful materials are disposed, and robots monitor the condition of these materials. Depending on the nature of these materials, different sensors and robot specifications are needed for the team to complete the task. The same game-theoretic principle as in the simple coordination game above can be applied in this scenario with the reward function defined to take into account the actions of the robots and the properties of the areas which need to be monitored.

Another interesting example is the description of autonomous cars’ behaviour. A human driver can be either polite, i.e. respect distances, priorities etc., or aggressive, i.e. leave minimum distance, do not respect priorities etc. The autonomous cars can be seen as players of a game where they are rewarded if they cooperate by being polite or lose some reward if they choose to be aggressive drivers.

The question that arises now is how do the robots coordinate and how do they make a decision on choosing their action to maximise their - and their team’s - reward?

Game-theoretic learning algorithms can be used as a coordination mechanism among the robots. These are iterative processes where the same game is repeatedly played until either coordination is achieved or the maximum number of iterations is reached. In each iteration, each robot computes a strategy on how to choose an action and selects the best action according to the strategy. If the coordination is established via the joint action of the team, the game terminates. If not, a new iteration starts and each robot adjusts their strategy on selection of their action. The basic principle behind these algorithms is that robots use the history of observed actions in order to predict the other robots’ strategy and then choose an action based on their prediction. The general procedure of game-theoretic learning algorithms can be represented by the following figure.

Get a taste of this course

Find out what this course is like by previewing some of the course steps before you join: