This content is taken from the University of Glasgow's online course, Getting Started with Teaching Data Science in Schools.

Skip to 0 minutes and 1 second LOVISA: When we represent a quantity visually, whether through length, area, radius or colour, humans tend to process it more quickly and more accurately, which is why visualisation is an essential part of the analytical process. But choosing which visualisation to use is not always an easy task. We have many options at our disposal. So let’s have a little whirlwind tour of the most common ones. For example, say that you have a continuous variable. Then we usually want a sense of the shape of the distribution, for example whether it’s symmetrical or skewed. For that, you have probably seen a histogram.

Skip to 0 minutes and 44 seconds It looks like this, and it’s one of the most common ones. The values, grouped into bins, are on the horizontal axis, and the frequency of each bin is on the vertical axis. But as common as this is, it’s not our only choice. If we want to emphasise the number of different instances, we could opt for a dot plot instead, where each value is simply shown as a dot. Or, if we want to emphasise the central tendency of the distribution, for example the median, then it is conventional to go for a box plot. The box plot has the median as the line in the middle and the lower and upper quartiles as the edges of the box.

Skip to 1 minute and 30 seconds But the box plot doesn’t reveal the shape of the distribution. If we want to combine it with a histogram, we have something called a violin plot, which shows the middle value just like the box plot, but also the shape of the distribution. Now, suppose that we have two continuous variables. Then we usually want a sense of how they relate to each other, for example whether one depends on the other. The most common choice is a scatter diagram, or scatterplot, where each point’s coordinates are simply the x- and y-values. You have probably seen that before. But suppose that we have a categorical variable.

Skip to 2 minutes and 15 seconds Then we have something that looks deceptively like a histogram but actually isn’t. This is a bar chart, where we have the category on the horizontal axis and, again, the value on the vertical one. The bars can be shuffled around because there’s no inherent order to the x-axis. However, it is very common to sort them from small to big. It is not our only choice. We can also go for a lollipop plot, which tends to be less visually cluttered, particularly when we have many different values. Finally, suppose that we have a series of values that are ordered, for example one value per week or one value per year. Then you have probably seen a line graph.

Skip to 3 minutes and 11 seconds This is perhaps one of the most popular ones. But, again, a line graph is not our only choice. We can also go for an area graph, which fills in the area under the line to show the trend over time. And let me tell you, these are not the only ones. There are thousands and thousands of different visualisations out there.
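
The histogram and box plot described in the video each rest on a small calculation: counting values per bin, and finding the median and quartiles. The sketch below is not from the course; it is a minimal illustration using only Python's standard library. The sample data, the bin width and the function names `histogram_counts` and `box_plot_summary` are all assumptions made for the example.

```python
import statistics
from collections import Counter

# Hypothetical sample data, invented purely for illustration.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]


def histogram_counts(data, bin_width):
    """Count how many values fall into each bin of the given width.

    Each bin is identified by its lower edge and covers
    lower edge <= value < lower edge + bin_width.
    """
    counts = Counter((x // bin_width) * bin_width for x in data)
    return dict(sorted(counts.items()))


def box_plot_summary(data):
    """Return (lower quartile, median, upper quartile) for the data."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, median, q3


# Print a rough text histogram: one '#' per value in the bin.
for start, count in histogram_counts(values, bin_width=2).items():
    print(f"{start}-{start + 2}: {'#' * count}")

# Print the three numbers that define the box in a box plot.
print(box_plot_summary(values))
```

The text histogram shows the shape of the distribution, while the three summary numbers show only its centre and spread. That is the trade-off between the two plots that the video points out.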

Variety of visualizations

How do you know which kind of visualization to use when displaying your data? In this video, Lovisa describes a range of techniques, including:

  • histogram
  • box plot
  • violin plot
  • scatter plot
  • bar chart
  • line graph
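
One way to keep the video's tour in mind is as a lookup from the kind of data to the chart types Lovisa mentions for it. The sketch below is only an illustration: the chart names come from the video, but the dictionary keys and the function name `suggest_charts` are hypothetical labels chosen for this example.

```python
# Chart types as named in the video, grouped by the kind of data they suit.
# The key strings are hypothetical labels, not terminology from the course.
CHART_OPTIONS = {
    "one continuous variable": ["histogram", "dot plot", "box plot", "violin plot"],
    "two continuous variables": ["scatter plot"],
    "categorical variable": ["bar chart", "lollipop plot"],
    "ordered series": ["line graph", "area graph"],
}


def suggest_charts(data_kind):
    """Return the chart types suggested for a kind of data, or an empty list."""
    return CHART_OPTIONS.get(data_kind, [])


print(suggest_charts("categorical variable"))
```

A lookup like this is only a starting point. As the video notes, there are many more visualisations than the ones listed here.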

