
Graphical summaries of data

What is Exploratory Data Analysis (EDA)?

In the next two steps, we briefly review some basic concepts and tools for summarising the data both graphically and numerically. Such summaries, referred to as descriptive statistics, belong to the toolkit of Exploratory Data Analysis (EDA).

Graphical and numerical summaries are illustrated using a variety of data examples. We use R for statistical computation, but technical details (including the R code) are omitted here for better flow and clarity, and deferred to the subsequent reading titled "Using RStudio for numerical and graphical summaries".

In this first step, we focus on graphical summaries, which help visualise the data. The step is divided into two parts centred around categorical and quantitative types of data, respectively. 

1. Categorical data

1.1. Data example

In this section, we re-use the shark attack example described in the previous reading. Note that in all the images below, the designation of colours is consistent across the different types of plots and charts.

Example 1: Shark attacks 

Source: Agresti, A., Franklin, C., Klingenberg, B. 2023. Statistics: The Art and Science of Learning from Data, Pearson. p. 65.

Table 1: Shark attacks in the USA in 2004-2013

US State          Frequency   Percentage
California               33         8.5%
Florida                 203        52.5%
Hawaii                   51        13.2%
North Carolina           23         5.9%
South Carolina           34         8.8%
Texas                    16         4.1%
Other                    27         7.0%
Total                   387         100%
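
The percentage column of Table 1 is each frequency divided by the total number of attacks. The following is a minimal Python sketch of that arithmetic (an illustration only, not the course's R code, which appears in a later reading); the variable names are our own.

```python
# Frequencies copied from Table 1 (shark attacks in the USA, 2004-2013).
attacks = {
    "California": 33,
    "Florida": 203,
    "Hawaii": 51,
    "North Carolina": 23,
    "South Carolina": 34,
    "Texas": 16,
    "Other": 27,
}

total = sum(attacks.values())

# Percentage of all attacks per state, rounded to one decimal place as in the table.
percentages = {
    state: round(100 * count / total, 1) for state, count in attacks.items()
}

for state, pct in percentages.items():
    print(f"{state:<16}{attacks[state]:>5}{pct:>8.1f}%")
print(f"{'Total':<16}{total:>5}")
```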

We now demonstrate how graphical tools can be used to visualise these data.

1.2. Bar plots

Categorical data with a relatively small number of categories (like our dataset) can be conveniently visualised using a bar plot (also called a bar chart or bar graph), with the heights of the bars indicating the observed frequencies of different categories.

A bar plot for our data would look like the following image. Note that the highest bar corresponds to Florida, which has the largest number of attacks: 203, or 52.5% of the total.

A bar chart visualisation of the shark attack data. A total of 387 shark attacks were reported in the USA between 2004 and 2013. Table 1 shows the breakdown by state, together with percentages.

1.3. Pareto charts

The same data on shark attacks can be even easier to interpret using a Pareto chart, which is similar to a bar chart except that the bars are arranged in decreasing order of height, as shown in the next image.

A Pareto chart visualisation of the shark attack data.
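
The ordering behind a Pareto chart amounts to sorting the categories by frequency, largest first. Here is a small Python sketch of that step using the Table 1 frequencies (illustrative only, not the course's R code). The running cumulative percentage printed alongside is an extra that many Pareto charts overlay as a line; it is not part of the description above.

```python
from itertools import accumulate

# Frequencies copied from Table 1.
attacks = {
    "California": 33,
    "Florida": 203,
    "Hawaii": 51,
    "North Carolina": 23,
    "South Carolina": 34,
    "Texas": 16,
    "Other": 27,
}

# Sort categories by frequency, largest first: the bar order of a Pareto chart.
pareto = sorted(attacks.items(), key=lambda item: item[1], reverse=True)

total = sum(attacks.values())
# Running total of attacks in Pareto order (optional extra, see the note above).
running = list(accumulate(count for _, count in pareto))

for (state, count), cum in zip(pareto, running):
    print(f"{state:<16}{count:>5}{100 * cum / total:>8.1f}%")
```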

1.4. Pie charts

An alternative graphical tool is the pie chart, where the frequencies of the categories are shown using sectors of proportional area (see the following image). Again, the largest sector corresponds to Florida; moreover, we clearly see that it represents more than half of all shark attacks.

The advantage of pie charts is that they clearly visualise the proportions of different categories, which may be harder to apprehend from the bar plots.
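
To see why Florida's sector covers more than half of the pie, note that a sector's area is proportional to its central angle, so each category receives its share of the full 360 degrees. A brief Python sketch of this calculation (our own illustration, not the course's R code):

```python
# Frequencies copied from Table 1.
attacks = {
    "California": 33,
    "Florida": 203,
    "Hawaii": 51,
    "North Carolina": 23,
    "South Carolina": 34,
    "Texas": 16,
    "Other": 27,
}

total = sum(attacks.values())

# Central angle (in degrees) of each sector: the category's share of 360.
angles = {state: 360 * count / total for state, count in attacks.items()}

for state, angle in angles.items():
    print(f"{state:<16}{angle:>7.1f} degrees")
```

An angle above 180 degrees corresponds to more than half of all observations, which is the case for Florida only.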

2. Quantitative data

2.1. Data example

To illustrate graphical summaries for quantitative data, we introduce a new dataset.

Example 2: Salaries

Source: Ross, S. 2021. Introduction to Probability and Statistics for Engineers and Scientists, 6th ed., Elsevier/Academic Press. p. 12.

The following frequency table shows the annual salaries (to the nearest thousand dollars) of 42 recently graduated students with BSc degrees in electrical engineering.

Table 2: Annual salaries of BSc graduates 

Salary ($1,000s)   Frequency   Cumulative Frequency
57                         4                      4
58                         1                      5
59                         3                      8
60                         5                     13
61                         8                     21
62                        10                     31
64                         5                     36
66                         2                     38
67                         3                     41
70                         1                     42

2.2. Dot plots

Numerical data can be represented using dot plots. These are constructed by assigning a graphical dot to each observed value; ties are shown by stacking the dots.

The dot plot for the data in Example 2 is shown in the following image.

A dot plot visualising the graduate salaries data. Please refer to the data in Table 2: Annual salaries of BSc graduates.
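
The construction of a dot plot can be mimicked in plain text: one symbol per observation, with ties stacked vertically. The Python sketch below does this for the Table 2 frequencies. The function name `dot_plot_rows` and the text layout are our own choices for illustration; this is not the course's R code and not how the image above was produced.

```python
# Salary (in $1,000s) -> frequency, copied from Table 2.
salaries = {57: 4, 58: 1, 59: 3, 60: 5, 61: 8, 62: 10, 64: 5, 66: 2, 67: 3, 70: 1}


def dot_plot_rows(freq):
    """Return the rows of a text dot plot, top row first, axis labels last."""
    # Include every integer on the axis, even values with no observations.
    values = range(min(freq), max(freq) + 1)
    tallest = max(freq.values())
    rows = []
    for level in range(tallest, 0, -1):
        # A value gets a dot on this level if it occurs at least `level` times.
        cells = (" * " if freq.get(v, 0) >= level else "   " for v in values)
        rows.append(" ".join(cells))
    rows.append(" ".join(f"{v:>3}" for v in values))
    return rows


rows = dot_plot_rows(salaries)
print("\n".join(rows))
```

Each column of stacked symbols has as many dots as the corresponding frequency in Table 2, so the tallest stack sits above 62.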

If the stacks of dots are given thickness, we obtain a bar plot, which you should be familiar with from the first section of this reading. Note that values with no observations, such as 63 and 65, are skipped in the bar plot.

A bar chart visualising the graduate salaries data. Please refer to the data in Table 2: Annual salaries of BSc graduates.

2.3. Histograms

A common and flexible way of visualising quantitative (especially continuous) data is the histogram. The following image is a histogram of the salary data in Example 2.

A histogram visualising the graduate salaries data. Please refer to Table 2: Annual salaries of BSc graduates above.

The rectangular bar over an interval (‘bin’) has a height equal to the number of observations in that interval; the ‘frequencies’ here are the actual counts. The bar heights therefore add up to the size of the sample, 42, which is also the total area of the bars when each bin has width 1.

Alternatively, relative frequencies may be used (also referred to as density), and the resulting plot is shown in the next image.

A histogram visualising the graduate salaries data using relative frequencies. The y-labels differ from the previous histogram.

Here, the height of each bar over a bin equals the proportion of the observations falling in the bin, out of the total of 42 observations. Hence, the bar heights add up to 1, which is also the total area of the bars when each bin has width 1. The shape of the histogram is the same as before; only the y-labels have changed.
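
The link between counts, relative frequencies and areas can be checked numerically. The Python sketch below bins the Table 2 data under an assumed binning (width 2, starting at 57, left-closed intervals), which is our own choice and not necessarily the one used in the images. When the bin width is not 1, a density is obtained by additionally dividing each relative frequency by the bin width, so that the total area is 1.

```python
# Salary (in $1,000s) -> frequency, copied from Table 2.
salaries = {57: 4, 58: 1, 59: 3, 60: 5, 61: 8, 62: 10, 64: 5, 66: 2, 67: 3, 70: 1}

n = sum(salaries.values())  # sample size, 42

# Assumed binning (not taken from the source): [57,59), [59,61), ..., [69,71).
start, width, n_bins = 57, 2, 7

counts = [0] * n_bins
for value, freq in salaries.items():
    counts[(value - start) // width] += freq

rel_freq = [c / n for c in counts]       # proportions; these sum to 1
density = [r / width for r in rel_freq]  # heights whose total area is 1

freq_area = sum(c * width for c in counts)      # n times the bin width
density_area = sum(d * width for d in density)  # 1, up to rounding error

print("counts:", counts)
print("area of frequency histogram:", freq_area)
print("area of density histogram:", round(density_area, 10))
```

With unit-width bins, `density` and `rel_freq` coincide and the frequency histogram has area equal to the sample size.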

2.4. Cumulative frequency plots

For quantitative data, cumulative frequency plots (also called ogives) are useful to appreciate the proportions of the data below (or above) chosen values. The next image shows the salary data as a cumulative frequency plot. From this plot, we see that 31 out of 42 graduates have a salary of $62,000 or less.

A cumulative frequency plot visualising the graduate salaries data. Please refer to the data in Table 2 column 3 above.

The following image shows a similar plot for the relative frequencies, called a cumulative density plot. Note the change in the vertical labels in this plot. This plot shows that about 73.8% of the graduates in this sample have a salary of $62,000 or less.
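
The values read off both cumulative plots can be reproduced by a running sum over the frequencies in Table 2. A minimal Python sketch follows (illustrative only, not the course's R code; the variable names are ours):

```python
from itertools import accumulate

# Salary (in $1,000s) -> frequency, copied from Table 2.
salaries = {57: 4, 58: 1, 59: 3, 60: 5, 61: 8, 62: 10, 64: 5, 66: 2, 67: 3, 70: 1}

values = sorted(salaries)

# Cumulative frequency: number of graduates earning at most each value.
cumulative = list(accumulate(salaries[v] for v in values))
n = cumulative[-1]

# Cumulative relative frequency: the same counts as proportions of the sample.
cum_rel = [c / n for c in cumulative]

at_or_below_62 = cumulative[values.index(62)]
print(f"{at_or_below_62} of {n} graduates earn $62,000 or less "
      f"({100 * at_or_below_62 / n:.1f}%)")
```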

Next steps

This step reviewed graphical summaries (charts and plots) for two example datasets. In the next step, you move on to common numerical summaries. Again, the focus is on the statistical methods for now; the corresponding R commands for producing data summaries are covered in a subsequent step.

This article is from the free online

Statistical Methods

Created by
FutureLearn - Learning For Life
