Welcome. My name’s David Millard. I’m a Senior Lecturer of Computer and Web Science at the University of Southampton. And this week, we’re going to talk about networks. Now networks are a fundamental part of the Web. And they’re a key tool for Web Scientists. So consider the networks around you. They might be transport networks, stations connected by rail lines or towns connected by roads. Perhaps power networks, the national grid, power stations, substations, and power lines. Or even social networks, your personal and professional network of relationships, friends and colleagues. And is there a single way to represent all these diverse networks?
Because if we had a common representation, it would allow us to compare networks and to see whether they behaved in the same way. Now the simplest way to represent a network is to think of it as a set of nodes. Now nodes represent things, so train stations, power stations, people. And those nodes are connected by edges. And edges simply represent associations or connections. So they are the rail lines between the stations, the relationships between the people, and so on. And now we can ask some basic questions about these very diverse networks. What do they have in common? Are they the same shape? And do they change and grow in the same way?
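As a rough illustration of the nodes-and-edges idea, here is a minimal Python sketch that stores any network as a mapping from each node to the set of nodes it shares an edge with. The function name `build_network` and the station and person names are made up for this example; they are not from the talk.

```python
from collections import defaultdict

def build_network(edges):
    """Build an undirected network: each node maps to the set of its neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        # An edge is just an association, so record it in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return dict(neighbours)

# Hypothetical data: the same representation serves a rail network...
rail = build_network([("Station A", "Station B"), ("Station B", "Station C")])
# ...and a social network.
friends = build_network([("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat")])

print(sorted(rail["Station B"]))   # the stations directly linked to Station B
print(len(friends["Ann"]))         # how many friends Ann has
```

Because both networks end up in the same shape of data, the same questions (and the same code) can be applied to each of them.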
And by answering those questions, we can learn more about the real world situations that those networks describe. Now scientists and mathematicians used to think mainly about two different types of networks. Regular networks have a very uniform structure. So in our example here, all the nodes have two, three, or four links. And they are linked only to their immediate neighbours. And random networks have a very chaotic structure. So nodes have very different numbers of connections and each node has links to a wide variety of other nodes right across the network. And the assumption that people used to make was that networks in the real world looked a lot like the random network. But think about that. Is that really true?
Does your social network look like that kind of random chaotic set of connections? Now in the 1960s, Stanley Milgram, a social psychologist at Yale, set out to investigate real world social networks. And he did this by sending nearly 300 letters from Omaha in Nebraska to Boston in Massachusetts. And all of these letters had to be delivered by hand. And crucially, they could only be passed from one friend to another. Now Milgram was interested in whether the letters would arrive. But perhaps more importantly, how many hops, meaning passes from one person to another, would it take for them to get to their destination? So if the network was regular, we might expect many small hops.
And if the network was random, we might expect a wide variety of hops with some letters bouncing all over the country before they reached their destination. But in the end what Milgram discovered was that 64 of his letters arrived. And most of those that dropped out did so because people simply failed to pass them on. But of the letters that got there, most were close to an average of six hops. Now Milgram’s is a very imaginative experiment, but it’s not completely rigorous. But it does imply that the structure of a network is not what had previously been thought.
And in fact this idea that you can cross a network from one side to the other in a very small number of hops, typically six degrees of separation, is called a small world property. Now in 1997, inspired by Milgram’s work, Duncan Watts was undertaking his PhD at Cornell University. And he looked in detail at a number of well defined networks to find out whether they shared this small world property. He looked at the neural network of worms, he looked at the US power network, and he looked at the network of Hollywood actors where he said that two Hollywood actors were connected if they starred in the same film. Now those are really diverse networks.
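To make the idea of counting hops concrete, here is a small Python sketch that finds the fewest hops between two nodes using a breadth-first search. This is only an illustration under assumed names (`hops`, and the toy network `chain`); it is not how Milgram or Watts did their measurements.

```python
from collections import deque

def hops(neighbours, start, goal):
    """Return the smallest number of hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical toy network: a chain A-B-C-D, plus an isolated node E.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}

print(hops(chain, "A", "D"))  # 3
print(hops(chain, "A", "E"))  # None
```

Averaging this hop count over many pairs of nodes gives one number that says how "small" a network is.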
But Watts showed that they all have the same small world property, this same six degrees of separation. But why? What is it that gives these networks this small world property? And why is it that these real world networks all have this common property together? A few years after Watts had done his work, Laszlo Barabasi, who was then at the University of Notre Dame, realised that the Web offered a real opportunity to explore this question. And that’s because the pages and the links on the Web are able to be analysed by a machine. So he created scripts that would crawl a subset of Web pages and map out the structure.
And he would be able to find whether the Web had this same property and potentially explore why that was the case. Now what Barabasi was expecting to see was a normal distribution of links. So a normal distribution is what you get if, for example, you test students in a class. Some will do very well. Some will do very badly. But most will be in the middle, sort of underneath the major curve. So in this case, what Barabasi was expecting was that some pages would have very many links and some would have very few, but most would have near the average number. But he saw a picture that was really different.
What he found was a power law distribution of links where the great majority of pages have very, very few links. But some, he called these hubs, have many. And Barabasi described this structure as a scale free network because it maintains this property, whatever scale you look at it. So a scale free network is one that is defined as saying the connections follow this power law. And it’s dominated by a small number of powerful hubs. And crucially for our understanding of real world networks, because hubs have connections that range wildly across the network, it means that scale free networks show this small world property.
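One common way to write a power law distribution of links is shown below. The symbols are not from the talk: here k stands for the number of links a page has, P(k) for the fraction of pages with exactly k links, and γ for a positive constant that sets how quickly the curve falls away.

```latex
P(k) \propto k^{-\gamma}, \qquad \gamma > 0
```

Under this form, multiplying k by some factor always scales P(k) by the same ratio, wherever you start on the curve, which is one way to read the phrase "whatever scale you look at it".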
So we are kind of able to say that real world networks are small worlds because they are scale free. Of course, that doesn’t really answer our question. Because the question just changes. What makes them scale free in the first place? So luckily the answer is actually quite simple. And it’s a principle known as preferential attachment. Perhaps summed up by the phrase the rich get richer. So in many real world networks, the more edges a node already has, the more connections, then the more likely it is to attract new edges. So as an example, if we think about our Hollywood film actors, the more films an actor has appeared in, the more likely they are to appear in new films.
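The rich-get-richer idea can be sketched in a few lines of Python. This is a simplified simulation under assumptions of our own (each new node brings exactly one edge, and the names `grow_network` and `degrees` are made up for the example); it is not the procedure used in the research described here.

```python
import random

def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time using preferential attachment."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in `endpoints` once per edge it has, so a uniform
    # draw from this list picks a node with probability proportional to
    # its current number of edges.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

def degrees(edges):
    """Count how many edges touch each node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

edges = grow_network(2000)
deg = degrees(edges)
leaves = sum(1 for d in deg.values() if d == 1)

print("nodes with a single edge:", leaves)
print("largest number of edges on one node:", max(deg.values()))
```

In a run like this you would expect most nodes to end up with a single edge while a handful of early nodes collect many, which is the hub pattern described above.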
And this simple property, this simple idea of preferential attachment, this explains this incredible structure and this amazing small world property. So how can you tell if a network is scale free? What can you analyse? What can you count? How do you get some numbers that enable you to actually characterise a network? So in the next few steps, we’re going to look at a number of useful properties such as the average degree of nodes and things like a clustering coefficient. And then we’re going to see how they can be applied to find different types of networks or to identify important nodes. Now these properties can allow you to compare networks.
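As a taste of what such properties look like in practice, here is a minimal Python sketch of two of them. It assumes the usual definitions: the average degree is the mean number of edges per node, and the (local) clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves connected. The function names and the toy network `net` are hypothetical.

```python
def average_degree(neighbours):
    """Mean number of edges per node."""
    return sum(len(ns) for ns in neighbours.values()) / len(neighbours)

def clustering_coefficient(neighbours, node):
    """Fraction of pairs of a node's neighbours that are linked to each other."""
    ns = list(neighbours[node])
    k = len(ns)
    if k < 2:
        return 0.0  # fewer than two neighbours means there are no pairs to check
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if ns[j] in neighbours[ns[i]]
    )
    return links / (k * (k - 1) / 2)

# Hypothetical toy network: A, B and C form a triangle, and D hangs off A.
net = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

print(average_degree(net))               # 2.0
print(clustering_coefficient(net, "B"))  # 1.0
print(clustering_coefficient(net, "D"))  # 0.0
```

Here node A has three neighbours but only one of the three possible pairs (B and C) is connected, so its coefficient is one third.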
So for example, researchers at the University of Coventry have analysed the social networks of characters in several texts ranging from historical texts like Beowulf to more modern novels like Harry Potter. And they discovered that the historical texts have social networks that are more similar to real life social networks than the fictional texts. So we’re able to see that they are very different in character. And excitingly, they even suggested that mythic texts where we’re not quite sure whether they have some kind of historical basis can sometimes be shown to lie somewhere in between. So the network analysis gives us important evidence that there might be a historical basis behind the stories. So network analysis can be very powerful.
But we always have to be careful. Because networks are just an abstraction. And they don’t necessarily contain all the info you might need to understand the real world situation. So for example, if you consider a map of the London Tube, the map will tell you how the Tube stations are connected. But it won’t tell you how far apart they are. And that means that, for example, someone travelling from Paddington to Bond Street might take the route through Notting Hill Gate when actually it would have just been quicker to pop up to the ground level and walk across town. So we started off by saying that networks are all around you.
And we gave some examples of some of the more obvious networks, transport, power, social networks. So in fact networks are just a way of looking at the world. Any way you can define relationships, you can see a network. They are a very powerful tool for Web Science and modelling and understanding how things are connected. But remember that principles like small worlds and preferential attachment occur for different reasons in different networks. So as Web Scientists, we must always take notice of the context and never forget that networks are just an abstraction even if they are a very useful one.