
This content is taken from the University of California, Berkeley, Center for Effective Global Action (CEGA) & Berkeley Initiative for Transparency in the Social Sciences (BITSS)'s online course, Transparent and Open Social Science Research. Join the course to learn more.

Skip to 0 minutes and 0 seconds Let’s get to the specifics. What should we do? And here there are a number of very specific items that maybe we can take something away from. I do like the book in one sense in that Tufte at a couple of different points says, “Let’s not be dogmatic.” I’m gonna kind of talk about some principles and things you should do. But, if it looks really bad in your case or doesn’t make sense, don’t do it. It’s just that, here are some principles to keep in mind. Data is most effectively presented when you get rid of extraneous stuff, extra stuff. So you know a summary way of thinking about things is, we want to minimize chart junk. Just like crap that’s in the figure.

Skip to 0 minutes and 37 seconds So what are some examples? Gridlines are an example. A lot of people’s figures have tons of gridlines. So here he illustrates it with Playfair. So remember Playfair, this famous British economist who kind of innovated in the space of data visualization. 1785, the year before his great commercial atlas that we talked about before. This is how he was presenting the trade balance and you can see it’s pretty good. You see the lines, it’s pretty clear what’s going on. But what you’ll see immediately, there are tons of these really dark gridlines all over the figure and they’re pretty prominent. This line is thicker than the line we care about, actually here.

Skip to 1 minute and 19 seconds So, the data in some ways is obscured by the grid. Now, there’s probably a good reason that he did this. People are drawing these things by hand at this point and he wants to have accurate data. So he has a grid and he’s like tracing out the points and then connecting them, so there may have been some rationale for this. But, even by 1786 the next year, he’s presenting something very similar. Again a trade balance figure, and it’s so much better a figure. Just aesthetically. Okay, there are still some gridlines but there are fewer gridlines. They’re much lighter, they’re much thinner and the things that are prominent here are the data. The data is what’s prominent in the figure.

Skip to 2 minutes and 0 seconds So you know, the same person over time, he kind of learned and iterated and revised to come up with a much more effective, clear presentation of data. And this is something we can all learn from. Like if you need gridlines, one of the things that Tufte says is, “Make them really light and grey and don’t let them swamp the data.” The data has to be front and center in any of our presentations of data.
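
The gridline advice can be tried out in a few lines. This is only a minimal sketch in Python with matplotlib, which is an assumption here since the lecture does not name any software; the trade numbers are made up for illustration and are not Playfair's data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical series, invented purely for illustration.
years = [1700, 1720, 1740, 1760, 1780]
exports = [30, 45, 60, 80, 95]
imports = [40, 50, 55, 60, 70]

fig, ax = plt.subplots(figsize=(6, 3.5))

# Gridlines: thin, light grey, and drawn behind the data.
ax.grid(True, color="0.85", linewidth=0.5)
ax.set_axisbelow(True)

# Data lines: darker and thicker than anything else in the figure.
ax.plot(years, exports, color="black", linewidth=2, label="Exports")
ax.plot(years, imports, color="0.4", linewidth=2, linestyle="--", label="Imports")

ax.set_xlabel("Year")
ax.set_ylabel("Value (hypothetical units)")
ax.legend(frameon=False)
fig.canvas.draw()  # call fig.savefig("some_name.png") instead to keep a copy
```

The point of the sketch is only the ordering of visual weight: the grid is the faintest thing on the page and the data lines are the heaviest.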

Skip to 2 minutes and 24 seconds That’s one thing. Another thing that you see and you see this a lot in people’s figures is, people use excessive decoration and color and all these really annoying patterns. In Excel or in a lot of statistical programs there are these, sort of like, defaults. So if I have 5 categories of a variable or something, the default is, one is checkered and one is stripes and one is crosshatch and whatever. And people just go with the default, because it’s kind of useful. But a lot of those patterns are just terrible. They’re visually terrible and they really distract from the data. So he gives an example from a long time ago.

Skip to 3 minutes and 0 seconds This is from the 1920’s, but it was just the ultimate example. You can’t blame computer software for this, this is 1927.

Skip to 3 minutes and 11 seconds Do you really need these crosshatches here? What is this, like a Formula 1 race? Like, why do I need this starting flag here? All these colors, there are labels and colors and crosshatching. And then, of course, there’s the pie chart, which you can’t visually compare these quantities to each other but it kind of looks pretty and it’s circular. So there is all this, potentially a lot of data here, but it’s very hard to absorb. Another thing to avoid, avoid computer abbreviations or other unintelligible jargon in figures. Sometimes people are too lazy and they just leave the variable name from their raw data set in a figure. Or, they use some other abbreviation that means something to them.

Skip to 3 minutes and 53 seconds It’s like shockingly common! Make your figures intelligible to people, don’t do that. That’s chart junk, according to Tufte. The other thing that chart junk does is it fills in space when there is not much data. So, it’s sort of like, when there isn’t that much data the space is filled in with crap. And in this case it’s better to just have a simple table. If you only have a limited number of things – observations, data points. Better just to put it in a simple table. People can digest 5 or 10 data points. So this is an example, and again, this is some of Tufte’s words on top.

Skip to 4 minutes and 24 seconds He says, “A series of weird 3-dimensional displays appeared in the magazine American Education in the 70’s, delighting connoisseurs of the graphically preposterous.” Here 5 colors report, almost by happenstance, only 5 pieces of data since the top and the bottom are just mirror images of each other. One minus the other. So there’s absolutely no new data. You could cut this in half and not lose any data. He says, “This may well be the worst graphic ever to find its way into print.” So again, I had to put what he said was the best one and the worst one. So here you could have a simple table, like in 1973 it was this and now it’s that.

Skip to 5 minutes and 0 seconds You could probably summarize this pretty well with 2 numbers. Like, that number in 1973 and that number in 1976. So this is truly terrible, don’t do this.

Skip to 5 minutes and 14 seconds Reduce the use of colors when possible. 5-10% of the population is either color blind or color deficient and you don’t really need the color differences most of the time. You can use line thickness, you could use greyscale, you could use other things. The other thing is, with some exceptions, and Tufte talks about this, but most colors lack a natural hierarchy. Some do, there are these kind of heat map type plots from blue to green to yellow to red. Where your eye’s immediately drawn to some kind of hierarchy. You know, these pollution maps you kind of know the red areas aren’t that good. There’s a lot of pollution there, and your eye is drawn to that.

Skip to 5 minutes and 51 seconds But if it’s yellow versus blue versus red versus green in no ordered way, what’s the point? And in this case, it may be much better to use shades of grey or shades of blue or something that’s pretty easy to see in black and white. Even people who are color deficient can kind of see. Okay, add more data into your graphic. Make your graphic more data rich. There are a lot of ways of doing this. You know, sometimes we have a scatterplot. And we could easily put a symbol or an abbreviation denoting different types of observations, and that adds richness to the figure. You know, sometimes people put country observations. Sometimes you might be able to put symbols.

Skip to 6 minutes and 29 seconds Let’s say you had 500 people and they were different ages or different genders. You might be able to incorporate that into the figure. And if you abstract away from that you’ll still get the scatterplot you had before. But, you might see some interesting patterns. Others may notice patterns for some groups. It’s pretty easy to do, it makes your graphic much more data rich. So we should try to do that when possible.
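
A rough sketch of that idea, again assuming Python with matplotlib (not something the lecture specifies): 500 simulated people, with an invented age-group label encoded by marker shape and by ordered shades of grey rather than by unordered colors. The variables and group names are hypothetical.

```python
import random
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

random.seed(0)

# Hypothetical data: 500 people, two made-up variables, one age-group label.
groups = ["under 30", "30 to 59", "60 and over"]
people = [
    (random.gauss(50, 10), random.gauss(50, 10), random.choice(groups))
    for _ in range(500)
]

# One marker shape and one shade of grey per group. The shades run from
# light to dark in the same order as the age groups, so the ordering is
# still readable in black and white.
styles = {
    "under 30": ("o", "0.7"),
    "30 to 59": ("s", "0.4"),
    "60 and over": ("^", "0.0"),
}

fig, ax = plt.subplots(figsize=(5, 4))
for group in groups:
    xs = [x for x, y, g in people if g == group]
    ys = [y for x, y, g in people if g == group]
    marker, shade = styles[group]
    ax.scatter(xs, ys, marker=marker, color=shade, s=14, label=group)

ax.set_xlabel("Variable x (hypothetical)")
ax.set_ylabel("Variable y (hypothetical)")
ax.legend(frameon=False, title="Age group")
fig.canvas.draw()
```

Ignoring the markers gives back the plain scatterplot, so nothing is lost by adding the group information.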

Skip to 6 minutes and 52 seconds In some ways, as we bring more and more data, and data points have classifiers and other things, we create something that may look a little bit like a hybrid figure-table. And maybe that’s kind of the ideal. So this is an example. A famous 1919 plot, where each data point here, every entry is the number of an army unit for the U.S. Army that went to Europe during WW1. Month by month. You can immediately see a bunch of things. You can see the number of units that were in Europe in each month, that’s the height. You can see how long each unit was in Europe, that’s the sort of width of the figure.

Skip to 7 minutes and 32 seconds So, between when they show up and the end of the war. And you know which ones were there. So this is like an incredibly simple figure. It takes up very little space, it’s incredibly data rich and basically every item here is data.

Skip to 7 minutes and 47 seconds Replace the full axis going to the origin and back out with the data range. So, let’s say the axis goes from 0-100 but all my data is between 20-70, only plot the axis from 20-70. Now, all of a sudden, you’ve taken something that had no data, and now it is data, it tells you the max and the min. Like, how much did you lose by doing that? Actually, you’ve gained. Now we can see the max and the min in the figure, everybody knows where they would intersect at 0. But, who cares? Like, that isn’t data, there’s nothing new there. That’s one way to do it.
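
One way this could look in practice, as a sketch only: the snippet below assumes Python with matplotlib and uses invented numbers. It draws each axis line only across the range of the data and labels just the minimum and maximum, so the axis itself reports those two values.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data lying between roughly 20 and 70.
x = [22, 31, 38, 45, 52, 60, 70]
y = [20, 34, 29, 48, 55, 51, 68]

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(x, y, color="black", s=20)

# Drop the box around the plot.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Draw each remaining axis line only from the data minimum to the maximum.
ax.spines["bottom"].set_bounds(min(x), max(x))
ax.spines["left"].set_bounds(min(y), max(y))

# Label just the endpoints, so the axis ends state the min and the max.
ax.set_xticks([min(x), max(x)])
ax.set_yticks([min(y), max(y)])
fig.canvas.draw()
```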

Skip to 8 minutes and 22 seconds The other thing is, there may be simple ways of portraying the distribution of the data on the axes. Here’s a scatterplot, and this would be “x” and “y”. Whatever, we may care about that for some reason. But what he has here are tick marks wherever there’s data. So now I have the univariate distribution of “x” and the univariate distribution of “y”, and I can just see it visually. Kind of a very similar point, a related point, is to try to integrate graphics, data and text together. Kind of more broadly. You know, famous early scientists like Leonardo da Vinci, or artist-scientists like Leonardo da Vinci, Galileo, etc. All their notes were littered with graphics, and they integrated them throughout.
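
The tick-marks-wherever-there-is-data idea from the scatterplot example can be sketched as follows. This assumes Python with matplotlib and simulated data; it is one possible way to draw such ticks, not the method used for the figure in the lecture.

```python
import random
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

random.seed(1)

# Hypothetical data: 60 simulated (x, y) pairs.
x = [random.gauss(0, 1) for _ in range(60)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(x, y, color="0.3", s=12)

# Small marks along the bottom edge, one per x value. The blended transform
# reads x in data units and y in axes units, where 0 is the bottom edge.
ax.plot(x, [0] * len(x), "|", color="black", markersize=10,
        transform=ax.get_xaxis_transform(), clip_on=False)

# The same along the left edge, one per y value.
ax.plot([0] * len(y), y, "_", color="black", markersize=10,
        transform=ax.get_yaxis_transform(), clip_on=False)

ax.set_xlabel("x")
ax.set_ylabel("y")
fig.canvas.draw()
```

Dense stretches of marks show where each variable piles up, which gives the two univariate distributions without adding a separate panel.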

Skip to 9 minutes and 5 seconds There would be writing, and they would integrate it with drawings of what they were talking about, together. And there was no distinction between, here’s the text and here’s the figure, the way we often write our papers. The early scientists integrated these things and that was the best way to convey their ideas. And the question is, can we do this again? So here’s a couple of examples, and this is where I’m gonna get to some recent figures. So one thing that Tufte advances in this sort of second edition of his book is what he calls “sparklines.” But you know it doesn’t have to be called that or anything else necessarily. But this is sort of word-size bits of data.

Skip to 9 minutes and 43 seconds He’s basically pushing for integrating little bits of data and text, or little bits of data that are sort of the same scale as a word. So very, kind of simple idea. But if I’m a physician, let’s say this is a screen, I get a lot of summary information here. I care about this patient’s glucose, breathing, their temperature. I get their current value, I get their last 12 hours here, and if the shaded area is the normal range, I get data that they’re out of the normal range. So this is incredibly tight, there’s a lot of data here in a very small amount of space. So this is a pretty effective display of data.
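
A word-size graphic like the one described can be approximated with very little code. The sketch below assumes Python with matplotlib; the twelve readings and the shaded "normal" band are invented for illustration and are not clinical reference values.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical readings, one per hour for the last 12 hours.
readings = [98, 105, 112, 131, 150, 162, 149, 138, 127, 120, 116, 121]
normal_low, normal_high = 70, 140  # illustrative band only

# Roughly the size of a word on a page.
fig, ax = plt.subplots(figsize=(1.6, 0.35))

# Shaded band for the normal range, then the series itself.
ax.axhspan(normal_low, normal_high, color="0.88", linewidth=0)
ax.plot(range(len(readings)), readings, color="black", linewidth=0.8)

# Mark and label the current (most recent) value.
last = len(readings) - 1
ax.plot(last, readings[-1], "o", color="black", markersize=2)
ax.text(last + 0.5, readings[-1], str(readings[-1]), fontsize=7, va="center")

# No axes, ticks, or frame: everything that remains is data.
ax.axis("off")
fig.canvas.draw()

# The values that fall outside the shaded band.
out_of_range = [r for r in readings if not normal_low <= r <= normal_high]
```

With these made-up numbers, `out_of_range` holds the three readings that sit above the band, which is the part of the line that pokes out of the shaded area.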

Skip to 10 minutes and 20 seconds Now, this may not be exactly what we put in our research papers all the time, but maybe it is? And you see some of this in some papers. You know, sometimes when people use little bite-size bits of data on a page, 50 plots together, that’s kind of in the same spirit as this. It’s really a way to show the data in a very transparent way and you may see patterns there that you didn’t realize existed. What too few of us do with our figures and our tables, is sit down critically with them and edit them the way we edit text. And that’s really weird because there are a lot of people who look at our tables and our figures.

Skip to 10 minutes and 56 seconds And a lot fewer people who actually carefully read every word of an article, at the end of the day. Especially with the rise of graphical abstracts, and with summary figures being the things that get circulated and that people look at. Summary tables and summary figures. We should be spending as much time iterating on and revising our tables and our figures as we do our text, but we don’t. We obsess over words in section 5.2 and spend an hour reworking a paragraph, but rarely, unfortunately, do many social scientists sit down for hours and obsess over every detail of their figures. Some people do, but a lot of people don’t.

Skip to 11 minutes and 32 seconds And the evidence that a lot of people don’t is, that in almost any seminar you’re in you can look at a figure and immediately think of 3 or 4 ways to improve it.

What should we do to improve our graphics and figures?

In this video I give six practical tips for improving graphics and figures. We’ll see some examples of figures – good, bad, and ugly – and also learn about some innovative ways to present data in ways that are useful and aesthetically pleasing to the reader, while maintaining the integrity and truth of the data represented. Even if you can’t access Tufte’s book, you can check out his website to explore different graphics and find ways to improve your own visualizations.
