Right. We’re now going to consider how we can get the robot to learn a task: the task of moving around and avoiding obstacles. And we will mention rules of behaviour. So, we’re going to use the robot that we had in Week 2, which has got two sensors. This one over here detects the distance of any object around over here, i.e. forward and to the left. This one over here detects objects over on the right. It’s got two wheels, and we can set the speed of the motor associated with each. We’re going to assume that we can make each wheel either go forward or go backwards.
So if the robot is to move around avoiding obstacles, but there is no obstacle, then we might as well set both wheels to go forward. So the robot will move from here to here to here. If, however, there is an obstacle on the left, then we want the robot to avoid it. And one way we could do that is to have the left motor forward and the right one back. So the robot would turn right, away from the obstacle, and therefore avoid it. And we can formalise this behaviour with the following rules. If the left sensor and the right sensor are both giving a large value, that means there’s no object near at all, so the left motor is set forward and the right motor is set forward.
But if only the left sensor is large, that means there is an obstacle near the right sensor. Then we can set the left motor backwards and the right motor forward, and that will turn the robot to the left. Similarly, if only the right sensor is large, so there’s an object near the left sensor, then we can say the left motor should go forward and the right motor back, so that will turn it to the right. Otherwise, we retreat: both motors go backwards. And this is instinctive behaviour. The question is, can it be learned? So, we need to talk about instincts and learning. Most advanced animals have some sort of basic instinctive behaviour built in, like how to feed. Initially, however, they can’t control themselves.
But they have built in the ability to learn. Learning can be done, for instance, by trial and error. You try something, then you say, how good was that? And you change your behaviour if it’s not very good. Take babies: when they’re very young, they want to grab hold of objects, typically to put them in their mouth. But they don’t know how to grab things, so they randomly move arms and legs before they learn that this is how you can move the arms and get them closer to the object. Basically, it’s a feedback process, and we can adapt that concept for our robot. So the basic idea is we have a set of four possible actions.
Each motor can go forward or back. So: both forward; left forward, right back; left back, right forward; or both going back. And associated with each action is the probability of choosing it. And initially all actions are equally probable. And in use, the robot decides on one action based on those probabilities, and then does it. And the action with the highest probability is the one that is most likely to be chosen. The robot then decides whether the action it chose was a success. If yes, you then increase the probability of choosing that action, decreasing the others. If it was not a success, then you decrease the probability of choosing it and increase the others.
So, therefore, successful actions become more probable, and therefore they’re more likely to be chosen. As I say, it’s trial and error: learning by the cybernetic process of feedback. The problem with this is that if there’s nothing around, the best action is different from if there’s something on the left or on the right. So instead of having one set of those four possible actions and the associated probabilities, we have a number of them, each with the actions and their associated probabilities. So one is for when there’s nothing visible, one for when there’s something on the left, one for when there’s something on the right and, optionally, one for when there’s something that both sensors can see.
So the robot determines instinctively which scenario it’s in and then it goes through the trial and error process of choosing an appropriate action based on the probabilities, doing it, evaluating it, and adjusting the probabilities.
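The trial and error loop just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the web page: the names `new_tables`, `choose_action`, `update` and `learn_step`, the learning `rate` of 0.1, and the exact way probability is moved between actions are all hypothetical, since the talk only says that probabilities go up or down.

```python
import random

# The four actions as (left motor)-(right motor), and the scenarios.
ACTIONS = ["forward-forward", "forward-back", "back-forward", "back-back"]
SCENARIOS = ["nothing", "left", "right", "both"]

def new_tables():
    """One probability list per scenario; all actions start equally likely."""
    return {s: [1.0 / len(ACTIONS)] * len(ACTIONS) for s in SCENARIOS}

def choose_action(probs, rng=random):
    """Pick an action index at random, weighted by the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def update(probs, chosen, success, rate=0.1):
    """Shift probability towards (success) or away from (failure) the
    chosen action. The total always stays at 1. `rate` is an assumed value."""
    others = [i for i in range(len(probs)) if i != chosen]
    if success:
        # Take a share from every other action and give it to the chosen one.
        for i in others:
            moved = rate * probs[i]
            probs[i] -= moved
            probs[chosen] += moved
    else:
        # Take a share from the chosen action and split it among the others.
        moved = rate * probs[chosen]
        probs[chosen] -= moved
        for i in others:
            probs[i] += moved / len(others)

def learn_step(tables, scenario, was_success, rng=random):
    """One round: choose, evaluate, adjust. `was_success` is a hypothetical
    callback standing in for the robot doing the action and judging it."""
    probs = tables[scenario]
    chosen = choose_action(probs, rng)
    update(probs, chosen, was_success(scenario, ACTIONS[chosen]))
    return ACTIONS[chosen]
```

Because the choice is weighted rather than always taking the largest probability, a poor action can still be picked now and then, which is what lets the robot keep exploring.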
What is it we mean by a successful action? Well, the instinctive behaviour built in is that if nothing’s visible, it’s better to go forward. Otherwise, an action is successful if it moves the robot away from the object. Let’s see whether this actually works using another web page. So here is the web page. We see the robot as usual, and we’ve got a set of probabilities for forward-forward, forward-back, back-forward and back-back: for when there’s no wall, when there’s one on the left, when there’s one on the right (I’m not going to use the both state) and so forth. And initially all the probabilities are 25%. So each action is equally likely. And we press start.
And we’re in the open, so the wall is not seen. So very rapidly, you’ll notice, the probabilities have changed so that the robot moves forward most of the time. Because it chooses the action on the basis of the probabilities, rather than always taking the one that’s most likely, it can still choose some of the others, which is why it does some odd movements. But you also notice that it has detected the situation when there’s a wall on the left, and it’s adjusted the probabilities. And it’s also adjusted the probabilities for the situation when there’s an object on the right, or a wall on the right. It’s now moving around quite well. Is it able to avoid things? Yes, it’s turned away quite nicely from the wall there.
So it has learned, when there’s a wall on the left, to move away from it by turning to the right. So in this simple example we’ve seen that it’s worked quite well. And you can also, on the web page, make it a slightly more complicated environment. But it’s worked. While it’s still working well, stop. That’s the basic idea and it does work. We can enhance that using so-called shared experience learning. We have multiple robots, each using the same process to learn. The key point is that they share their experiences. They tell the others that, for instance, in one particular situation, I did this and the result was good or it was bad.
And then the robots can adjust the probabilities on the basis of what they’re told. So they don’t all have to experience a wall on the left: one of their colleagues does that and learns how to avoid it. That information comes back to the other robots. We found when we did that, that if we had two robots learning and communicating, overall they learned almost twice as fast, and with four robots, at almost four times the speed. It wasn’t exactly four times because when you’re communicating between the robots, sometimes the information is lost. For instance, if there are a number of robots in the way, they can’t see each other to communicate. And this worked very successfully here in Reading in the UK.
But we also had fun because we put one of our robots over in the States, and we communicated over the internet. So our robot in Reading was teaching the one in America. And that was nice to see.
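Shared experience learning can be sketched along the same lines. This is a minimal illustration, assuming each robot applies the same probability update to a reported experience as it would to its own; the talk does not say how reports are weighted. The `Robot` class, the `share` function, the `rate` value and the `loss` parameter (the chance that a message does not get through) are all hypothetical names and values.

```python
import random

SCENARIOS = ("nothing", "left", "right", "both")
N_ACTIONS = 4  # forward-forward, forward-back, back-forward, back-back

def update(probs, chosen, success, rate=0.1):
    """Same style of adjustment as for a single robot; total stays at 1."""
    others = [i for i in range(len(probs)) if i != chosen]
    if success:
        for i in others:
            moved = rate * probs[i]
            probs[i] -= moved
            probs[chosen] += moved
    else:
        moved = rate * probs[chosen]
        probs[chosen] -= moved
        for i in others:
            probs[i] += moved / len(others)

class Robot:
    def __init__(self):
        # One list of action probabilities per scenario, all equal at first.
        self.tables = {s: [1.0 / N_ACTIONS] * N_ACTIONS for s in SCENARIOS}

    def record(self, scenario, chosen, success):
        """Adjust probabilities from an experience, own or reported."""
        update(self.tables[scenario], chosen, success)

def share(sender, peers, scenario, chosen, success, loss=0.0, rng=random):
    """The sender learns from its own experience, then tells each peer.
    Each message is dropped with probability `loss` (assumed model of
    robots not being able to see each other). Returns messages delivered."""
    sender.record(scenario, chosen, success)
    delivered = 0
    for peer in peers:
        if rng.random() >= loss:
            peer.record(scenario, chosen, success)
            delivered += 1
    return delivered
```

With `loss` above zero, some peers miss some reports, which is one way to picture why the speed-up falls a little short of the number of robots.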
So in summary, the basic instinctive behaviour for avoiding obstacles can be programmed into a robot, but it can also be learned. And we’ve shown here how we can use trial and error for that process. And that can be enhanced by having cooperation between different robots sharing experiences.
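For comparison, the programmed version of the behaviour, i.e. the four rules given at the start, might look like the sketch below. The function name `avoid` and the `threshold` that separates a “large” sensor value from a small one are assumptions for illustration; the talk gives no numbers.

```python
FORWARD, BACK = "forward", "back"

def avoid(left_reading, right_reading, threshold=50.0):
    """Return (left motor, right motor) from the two distance sensors.
    A reading above `threshold` (an assumed value) counts as 'large',
    meaning nothing is near that sensor."""
    left_clear = left_reading > threshold
    right_clear = right_reading > threshold
    if left_clear and right_clear:
        return FORWARD, FORWARD   # nothing near: carry on forward
    if left_clear:
        return BACK, FORWARD      # obstacle near the right: turn left
    if right_clear:
        return FORWARD, BACK      # obstacle near the left: turn right
    return BACK, BACK             # obstacles on both sides: retreat
```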
We showed that to avoid an object on the left, you could have left forward, right back, but you could also do it by left forward, right stop. And it will turn a little less quickly. So given that, in some of the initial work that we did (and we’ll show that in a later video), rather than having four actions, we have nine actions, because each motor can go forward, be off (stopped), or go backwards. And the subsequent work, which I’m going to show you in another video, is how we tried to get the robot to learn what the different scenarios are. But what we have shown you is that we can get a robot to learn a simple task.
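The nine actions are simply every pairing of three states for each motor, which can be listed as follows. The state names, and the choice to start all nine with equal probability as in the four-action case, are assumptions for this sketch.

```python
from itertools import product

# Each motor can be in one of three states.
MOTOR_STATES = ("forward", "stop", "back")

# Every (left motor, right motor) pairing: 3 x 3 = 9 actions.
ACTIONS = list(product(MOTOR_STATES, repeat=2))

# Assumed starting point: all nine actions equally probable.
initial_probs = [1.0 / len(ACTIONS)] * len(ACTIONS)
```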