
This content is taken from the University of Glasgow's online course, Getting Started with Teaching Data Science in Schools.

## The University of Glasgow

Skip to 0 minutes and 6 seconds JEREMY: Hi, Lovisa. I thought we could talk about some of our favourite data visualisations today. This is a word cloud here. And this is a representation of a piece of text, a corpus of text. And the size of each word indicates how frequently that word appears in the text. So this is actually a story. I quite like this story. I want you to guess what the story is by looking at the text. What do you think? Have a look at the words and see if you know what the story is. So what are the big words?

Skip to 0 minutes and 37 seconds LOVISA: I do notice that there is one big word of “Alice.”

Skip to 0 minutes and 40 seconds JEREMY: “Alice.” Yes. That’s good.

Skip to 0 minutes and 42 seconds LOVISA: Could it be Alice in Wonderland?

Skip to 0 minutes and 44 seconds JEREMY: Yeah. That’s right. It’s Alice in Wonderland. Look. Can you see there’s the Hatter. And, oh, dear, the turtle. There’s the queen. She’s the baddie. There’s the king. He gets a bit confused. But you’re right. It’s Alice in Wonderland. So I generated these from Lewis Carroll’s text from an online copy on Project Gutenberg to generate this word cloud visualisation. So I really like this. It’s a very quick overview of the story, and the main characters, and the main concepts in the story. You’ve got a visualisation here which you sent me. Can you explain this one to me, please?

Skip to 1 minute and 17 seconds LOVISA: I do. Well this one represents the poverty rates in Edinburgh.

Skip to 1 minute and 25 seconds JEREMY: Oh, yes. I can see the shape of the city there. Yes, that’s right.

Skip to 1 minute and 29 seconds LOVISA: And it is almost like a heat map except, instead of using colour to indicate magnitude, so how high the poverty rate is, we use the height.

Skip to 1 minute and 41 seconds JEREMY: Yes. Good. I can see that. So in the middle of the city, the areas are quite low. But then on the outside, they’re much higher. What are you actually measuring here? Sorry.

Skip to 1 minute and 55 seconds LOVISA: Poverty rates.

Skip to 1 minute and 56 seconds JEREMY: Poverty rates. OK. And then towards the edge of the city, in the suburbs, this higher– OK. So higher levels of poverty. That’s really interesting.

Skip to 2 minutes and 3 seconds LOVISA: Yeah. Along the coast they begin to spike.

Skip to 2 minutes and 5 seconds JEREMY: OK. Yes. Yes. Yes. What’s the big bar here?

Skip to 2 minutes and 10 seconds LOVISA: The big bar is almost like an axis. It’s there for scale. So it represents 100%. And anything that is a third of it could be 33% poverty rate.

Skip to 2 minutes and 21 seconds JEREMY: OK. Yeah. Yeah. That’s a really striking visualisation, isn’t it? Yeah. And it’s nice that you get the kind of geographical correspondence there. You kind of see the city. Very good. What do you call this kind of visualisation?

Skip to 2 minutes and 34 seconds LOVISA: Well, I have checked the pronunciation of this. And apparently it’s a “choropleth.”

Skip to 2 minutes and 38 seconds JEREMY: A choropleth.

Skip to 2 minutes and 39 seconds LOVISA: I do believe it’s Greek.

Skip to 2 minutes and 41 seconds JEREMY: Ooh. Very nice. And there’s one more visualisation to look at. I’ll just get it up on the screen here. Wow. So I really like this one. This visualisation here is very historical. I mean, the first one, that was literary I suppose. Alice in Wonderland. And the second one is geographical within the Edinburgh city. This is very much a historical narrative here. So what we’ve got is Napoleon’s French armies starting off over here. It was a very big army there. And as they travelled towards Moscow in 1812, the army gets smaller and smaller and smaller until eventually– how many people got in Moscow? There were 100,000 soldiers, I think.

Skip to 3 minutes and 17 seconds And then they retreat and come back from Moscow in the snow. And the line gets thinner and thinner because there are fewer and fewer soldiers until eventually there were only 10,000.

Skip to 3 minutes and 25 seconds LOVISA: Wow. It’s so clean and stylish. What programming library was used to generate this?

Skip to 3 minutes and 32 seconds JEREMY: Well, you know what? This visualisation was drawn by hand. A French information visualisation expert from the 1800s called Charles Minard generated this drawing himself and didn’t use any software at all. You can use programs these days to draw this kind of dia– it’s called a Sankey diagram nowadays. And there are JavaScript and Python libraries to draw Sankey diagrams. But this was entirely drawn by hand.

Skip to 4 minutes and 5 seconds LOVISA: Wow. That’s incredible.

Skip to 4 minutes and 7 seconds JEREMY: There’s more data on it as well, actually. So you can see as they’re retreating, the army’s retreating here. You’ve got the temperature that was measured. And you see it starts off at zero degrees, freezing point there in Moscow. And it goes down as low as minus 30 here. So they were certainly in extreme conditions as they were retreating through Russia. OK. Great. So quick question. Now that we’ve looked at visualisations, what do you think really is the essence of a good data visualisation?

Skip to 4 minutes and 39 seconds LOVISA: Well, I think as much as we like things that are clean and clear, what also matters is that it’s visually appealing. So it’s pretty.

Skip to 4 minutes and 48 seconds JEREMY: It looks beautiful. Yeah. I think that’s true of my Alice in Wonderland story there. I think that looks really nice. And it’s bright and colourful. I don’t think the colour actually means anything. It’s just the size that means something in terms of the word frequency. But, yes. It does look very visually appealing. Good. What else makes an effective data visualisation?

Skip to 5 minutes and 9 seconds LOVISA: Choosing how to represent your quantities very carefully. So what dimensions are you going to use to represent magnitude, poverty rate. You use height, you use colour. Think carefully about it.

Skip to 5 minutes and 23 seconds JEREMY: OK, great. Anything else? If you were going to choose an appropriate data visualisation technique?

Skip to 5 minutes and 29 seconds LOVISA: I think ideally a visualisation is almost like a narrative in itself. It’s a full story that you can read. Self-contained.

Skip to 5 minutes and 38 seconds JEREMY: And that’s definitely true for this visualisation of the progress of the French armies in Russia here, isn’t it? You can see the story just from the graphic. Yeah. Great.

# These are a few of my favourite visualizations

Data can be visualized in a wide variety of ways. In this video we explore three of the more exotic graphical data presentations.

Jeremy shows a word cloud, which is a graphical summary of a corpus of text. The size of a word indicates its relative frequency in the text, so the most commonly occurring word should be the largest. Stop words like ‘the’ or ‘and’ are often removed before a word cloud is generated.
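The counting step behind a word cloud can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the video: the stop-word list, the linear font-size scaling, the function names and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use much longer lists).
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "it", "was", "she"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def font_sizes(freqs, min_size=10, max_size=60):
    """Scale counts linearly so that the most frequent word gets max_size."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {w: min_size + (max_size - min_size) * c / top for w, c in freqs.items()}


# Made-up sample sentence, not a quotation from the book.
sample = ("Alice was beginning to get very tired. "
          "Alice had nothing to do, and the Hatter said nothing.")
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
```

A drawing tool would then place each word on the canvas at its computed size; as Jeremy notes, the colours in his word cloud carry no meaning.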

Lovisa describes a choropleth, which indicates quantitative data for different geographical areas, in this case different districts of the Edinburgh City region. This is a nicely intuitive visualization, but it requires some quite intricate design.
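Choropleths usually encode each area's value as a shade of colour, whereas the map in the video uses height, with a reference bar standing for 100%. Both encodings can be sketched as below. The district names and rates are invented placeholders, not real Edinburgh data, and the equal-width classes and function names are assumptions for illustration.

```python
def shade_class(rate, n_classes=4, lo=0.0, hi=100.0):
    """Assign a rate to one of n_classes equal-width classes (0 = lightest shade)."""
    width = (hi - lo) / n_classes
    # Clamp so that a rate equal to hi falls into the top class.
    return min(int((rate - lo) / width), n_classes - 1)


def bar_height(rate, reference_height=90.0):
    """Height of an area's block when the reference bar represents 100%."""
    return rate / 100.0 * reference_height


# Made-up placeholder values (percent), purely for illustration.
rates = {"District A": 8.0, "District B": 33.0, "District C": 61.5}

for name, rate in rates.items():
    print(name, shade_class(rate), bar_height(rate))
```

The design decision Lovisa highlights sits in these two functions: the same number can drive colour, height or both, and the choice changes how easily areas can be compared.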

Finally, Jeremy presents a Sankey diagram, in which the width of each band indicates the size of a population subset as the population splinters into smaller groups over time. Minard’s hand-drawn visualization of Napoleon’s military campaign in Russia is a famous early example of this kind of flow diagram.
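The data behind a Sankey diagram is a list of flows, each with a source, a target and a size that sets the band width. The sketch below uses made-up stage names and numbers (not Minard's figures) and checks that every intermediate stage passes on exactly what it receives; the function names are hypothetical.

```python
from collections import defaultdict

# Each flow is (source, target, size); values are invented for illustration.
flows = [
    ("Start", "Continue", 70),
    ("Start", "Lost", 30),
    ("Continue", "Arrive", 50),
    ("Continue", "Lost", 20),
]


def node_totals(flows):
    """Sum the flow entering and leaving each node."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for source, target, size in flows:
        outflow[source] += size
        inflow[target] += size
    return inflow, outflow


def unbalanced_nodes(flows):
    """Return intermediate nodes whose inflow differs from their outflow."""
    inflow, outflow = node_totals(flows)
    return [n for n in inflow if n in outflow and inflow[n] != outflow[n]]


inflow, outflow = node_totals(flows)
print(dict(inflow), unbalanced_nodes(flows))
```

A plotting library would take the same list of flows and draw each one as a band whose width is proportional to its size.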

Online tools and Python libraries are available to generate custom diagrams in these formats. Check out sankeymatic.com or wordclouds.com, for instance.

What is your favourite visualization technique? Please share your ideas in the discussion section.