R code for graphical summaries
This step continues from the content of the video Demonstration: Basics of RStudio for data summaries. In it, you will learn the R code for creating graphical summaries.
This section covers bar plots, Pareto charts, pie charts, dot plots, cumulative plots, histograms and normal approximation.
The R code blocks shown below are not interactive. You can copy and paste them into RStudio to execute them and see the results.
Download the zip folder ‘Using RStudio for graphical and numerical summaries’ (the R file version of the reading) from the ‘Downloads’ section of the previous step.
First, consider visualisation tools for categorical data, taking, as a primary example, the women’s clothing size dataset.
1 Bar plots
The R command to produce a bar plot is just barplot().
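As a minimal sketch of a barplot() call for this kind of data, the code below builds a small frequency table and plots it. The data frame name f and its columns size and freq follow the names used later in this step, but the frequencies here are hypothetical placeholders, not the real women's clothing size data from the previous step.

```r
# Hypothetical frequency table: sizes 6 to 26 with made-up counts
# (placeholders only; substitute the real dataset from the previous step)
f <- data.frame(size = seq(6, 26, by = 2),
                freq = c(2, 5, 9, 16, 20, 17, 11, 7, 4, 2, 1))

# Bar plot: one bar per size, bar height = frequency
bp <- barplot(f$freq, names.arg = f$size, col = rainbow(11),
              xlab = "Size", ylab = "Frequency",
              main = "Bar plot of clothing size data (hypothetical counts)")
```

barplot() also returns the horizontal midpoints of the bars (stored in bp here), which is handy if you later want to add text or points above each bar.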

2 Pareto charts
Pareto charts are bar plots with bars arranged in decreasing order of height.
The following R code (based on the order() command) can be used to produce a Pareto chart:
# Create Pareto chart for women's clothing size data
# First, define colours of the size bars:
colour <- rainbow(11)
# Create a new data frame to include colours:
fdf <- data.frame(f, colour)
# Re-order in decreasing frequencies:
freqed <- order(fdf$freq, decreasing=TRUE)
# Now plot Pareto chart:
barplot(fdf$freq[freqed], col=fdf$colour[freqed], names.arg=f$size[freqed], cex.names=0.95,
        main="Pareto chart of women's clothing size data")
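The key step in the code above is order(), which returns the positions of the elements in sorted order rather than the sorted values themselves. The same index vector can then be used to rearrange frequencies, colours and labels consistently. A small illustration with a made-up vector (the values and labels are hypothetical):

```r
# Hypothetical frequencies and labels, for illustration only
freq   <- c(12, 30, 7, 21)
labels <- c("A", "B", "C", "D")

# Positions of the elements, from largest to smallest frequency
idx <- order(freq, decreasing = TRUE)
idx          # 2 4 1 3

# Apply the same index to every vector so that they stay aligned
freq[idx]    # 30 21 12 7
labels[idx]  # "B" "D" "A" "C"
```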
Simpler code can be written using the package dplyr.
If dplyr is not yet installed, the easiest way to get it is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just dplyr:
install.packages("dplyr")
To find out more, go to https://dplyr.tidyverse.org/, which has more instructions.
With dplyr installed, the Pareto chart can be produced as follows:
library(dplyr) # load the package "dplyr"
fdfed <- arrange(fdf, desc(freq)) # re-order in descending frequencies
# Now plot Pareto chart:
barplot(fdfed$freq, col=fdfed$colour, names.arg=fdfed$size, cex.names=0.95,
        main="Pareto chart of women's clothing size data")
Try it and you’ll see that this code produces the same Pareto chart as before!
3 Pie charts
The R command to produce a pie chart is simply pie():
# Create pie chart for the women's clothing size data:
pie(f$freq, labels=f$size, col=rainbow(11), radius=1.0,
main="Pie chart of women's clothing size data")
The pie chart shows proportions; for example, we see that sizes 12, 14 and 16, shown in shades of green, account for more than half of this sample.
4 Dot plots
Dot plots are produced using the R command stripchart():
# Create dot plot for the women's clothing size data:
stripchart(f1, col="blue", method = "stack", pch=16, ylim=c(1,30), xlim=c(6,26),
offset=0.5, main = "Dot plot of women's clothing size data",
xlab="Size", ylab="Frequency", axes=FALSE)
# Add axes ticks and labels
axis(1, at=2*c(3:13)) # to add new labels
axis(2, at=5*c(0:7))
Let us now consider R coding to produce graphical summaries for quantitative data.
Some of the previous tools are applicable here as well. For instance, for the salary dataset, we can construct the following graphical summaries:
5 Bar plot:
# Recall the data set:
salary <- c(57:62,64,66,67,70)
frequency <- c(4,1,3,5,8,10,5,2,3,1)
# Bar plot of the salary data:
barplot(frequency, names.arg=salary, col="blue", xlab="Salary ($1000s)", ylab="Frequency", main="Bar plot of the salary data")
6 Dot plot:
sal_list <- rep(salary, frequency)
# Create dot plot of the salary data:
stripchart(sal_list, method="stack", pch=16, offset=1.45, xlim=c(56,70), ylim=c(1,11), col="blue", xlab="Salary ($1000s)", ylab="Frequency", main="Dot plot of the salary data")
axis(2, at=2*c(0:5))
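The call rep(salary, frequency) used above expands the frequency table into one value per observation, which is the form that stripchart() and hist() expect. A quick check, using only the salary data defined in this step, confirms that tabulating the expanded list recovers the original frequencies:

```r
salary    <- c(57:62, 64, 66, 67, 70)
frequency <- c(4, 1, 3, 5, 8, 10, 5, 2, 3, 1)

# Repeat each salary value as many times as it was observed
sal_list <- rep(salary, frequency)
length(sal_list)   # 42 observations in total

# Tabulating the raw list gives back the frequency table
tab <- table(sal_list)
tab
```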
7 Cumulative frequency plots
Cumulative frequency plots (also called ogives) are produced using the command cumsum() to calculate partial sums of frequencies, and the plotting commands plot() and points():
# Create cumulative data:
cumfreq <- cumsum(frequency)
# Create cumulative frequency plot:
plot(salary, cumfreq, type="l", col="blue", xlim=c(56,70), xlab="Salary ($1000s)",
ylab="Frequency", main="Cumulative frequency plot of the salary data")
points(salary,cumfreq, pch=19, col ="blue")
# Drawing dashed lines:
f62 <- cumfreq[salary==62]
segments(x0=0, x1=62, y0=f62, y1=f62, lty=2, lwd=2, col="red")
segments(x0=62, x1=62, y0=0, y1=f62, lty=2, lwd=2, col="red")
A similar plot for the relative frequencies (called a cumulative density plot) is produced as follows:
cumfreq <- cumsum(frequency)
total <- sum(frequency)
# Create cumulative density plot:
plot(salary, cumfreq/total, type="l", col="blue", xlim=c(56,70),
xlab="Salary ($1000s)", ylab="Density",
main="Cumulative density plot of the salary data")
points(salary,cumfreq/total, pch=19, col ="blue")
# Drawing dashed lines:
segments(x0=0, x1=62, y0=f62/total, y1=f62/total, lty=2, lwd=2, col="red")
segments(x0=62, x1=62, y0=0, y1=f62/total, lty=2, lwd=2, col="red")
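To make the dashed reference lines concrete, the short sketch below prints the numbers behind them for the salary data defined in this step. It only repeats the cumsum() arithmetic used above, with no plotting.

```r
salary    <- c(57:62, 64, 66, 67, 70)
frequency <- c(4, 1, 3, 5, 8, 10, 5, 2, 3, 1)

# Running totals of the frequencies
cumfreq <- cumsum(frequency)
cumfreq            # 4 5 8 13 21 31 36 38 41 42

# Number and proportion of salaries at or below 62 ($1000s)
f62   <- cumfreq[salary == 62]
total <- sum(frequency)
f62                # 31
f62 / total        # about 0.738
```

So the horizontal dashed line sits at 31 on the frequency scale and at roughly 0.74 on the relative scale: about 74% of the salaries in the sample are at most 62.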
8 Histograms
To plot a histogram of a quantitative dataset, we use the R command hist():
# Create histogram of the salary data:
hist(sal_list, xlab="Salary ($1000s)", ylab="Frequency", col="grey",
main="Histogram of the salary data")
In this example, the bins have width 2 and run from 56 to 70.
By default, the right boundary of each bin is included, while the left one is not, for example, (56,58], (58,60], and so on. This can be reversed if desired by using the option right=FALSE:
hist(sal_list, col="grey", xlim=c(56,70), ylim=c(0,15), xlab="Salary ($1000s)",
ylab="Frequency", right=FALSE,
main="Histogram of the salary data with redefined bins")
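To see how the bin convention changes the counts, hist() can be called with plot=FALSE, which returns the bin counts without drawing anything. In this sketch the breaks are passed explicitly as seq(56, 70, by=2), an assumption chosen to match the bins described above, so that the comparison does not depend on the automatic choice of breaks.

```r
salary    <- c(57:62, 64, 66, 67, 70)
frequency <- c(4, 1, 3, 5, 8, 10, 5, 2, 3, 1)
sal_list  <- rep(salary, frequency)

bins <- seq(56, 70, by = 2)

# Default: bins of the form (a, b], right boundary included
right_closed <- hist(sal_list, breaks = bins, plot = FALSE)$counts
# right=FALSE: bins of the form [a, b), left boundary included
left_closed  <- hist(sal_list, breaks = bins, right = FALSE, plot = FALSE)$counts

right_closed   # 5 8 18 5 2 3 1
left_closed    # 4 4 13 10 5 5 1
```

Both versions account for all 42 observations; only the allocation of values that fall exactly on a bin boundary (58, 60, 62, ...) changes.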
If a more granular representation is needed, the breaks option can be used either to suggest the number of bins (a single number) or to specify the break locations explicitly (a vector), for example:
# Plot histogram with more breaks:
hist(sal_list, col="grey", breaks=10, xlim=c(56,70), xlab="Salary ($1000s)",
ylab="Frequency", main="Refined histogram of the salary data")
This histogram shows clearly the values observed (or not observed) in the sample.
On the other hand, it is a bit more fragmented, whereas the previous histogram (with fewer bins) seems to capture better the overall shape of the salary distribution.
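When breaks is a single number, hist() treats it as a suggestion and rounds the break points to convenient values. If exact control is needed, a vector of break locations can be supplied instead. The sketch below, with break points chosen here purely for illustration, puts each whole-number salary in its own bin by placing the breaks at the half-integers:

```r
salary    <- c(57:62, 64, 66, 67, 70)
frequency <- c(4, 1, 3, 5, 8, 10, 5, 2, 3, 1)
sal_list  <- rep(salary, frequency)

# One bin per integer salary value from 57 to 70
h <- hist(sal_list, breaks = seq(56.5, 70.5, by = 1), col = "grey",
          xlab = "Salary ($1000s)", ylab = "Frequency",
          main = "Histogram with one bin per salary value")

h$counts   # 4 1 3 5 8 10 0 5 0 2 3 0 0 1
```

The zero counts correspond to the salary values 63, 65, 68 and 69, which were not observed in the sample.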
Histograms may also be plotted on the density scale, where the height of each bar is its relative frequency divided by the bin width, so that the areas of the bars sum to 1; this is done by using the option freq=FALSE:
# Plot histogram on the density scale:
hist(sal_list, freq=FALSE, col="grey", xlab="Salary ($1000s)",
ylab="Density", xlim=c(56,70), ylim=c(0,0.24),
main="Density histogram of the salary data")
The shape of the density histogram is exactly the same as before; only the scale of the y-axis has changed.
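The relationship between counts and densities can be checked numerically. In this sketch, hist() is called with plot=FALSE and with explicit breaks of width 2 (an assumption matching the bins used earlier); the returned density values equal the counts divided by the sample size times the bin width, and the bar areas add up to 1.

```r
salary    <- c(57:62, 64, 66, 67, 70)
frequency <- c(4, 1, 3, 5, 8, 10, 5, 2, 3, 1)
sal_list  <- rep(salary, frequency)

h <- hist(sal_list, breaks = seq(56, 70, by = 2), plot = FALSE)

# Density = relative frequency / bin width
width     <- diff(h$breaks)                 # all equal to 2 here
by_hand   <- h$counts / sum(h$counts) / width
max_dens  <- max(h$density)                 # tallest bar: 18/42/2, about 0.214

# Total area of the bars
area <- sum(h$density * width)              # 1
```

Because all bins have the same width, dividing by it rescales every bar by the same factor, which is why the shape of the histogram does not change.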
9 Normal approximation
Normal approximation of nearly symmetric, bell-shaped data is based on a theoretical curve, referred to as the ‘normal probability density function’ or just the ‘normal density’.
It has two parameters, mean and sd, and can be added on top of a histogram plot using the commands lines(x<-seq(), y=...) and dnorm(x,mean,sd).
To illustrate, recall the data about student heights already used previously. Let us focus on heights of females (again, excluding an outlier of 92):
height <- read.csv("https://raw.githubusercontent.com/artofstat/data/master/Chapter2/heights.csv")
# Select the HEIGHT values for females, excluding the outlier of 92:
NewF <- subset(height$HEIGHT, height$GENDER=="Female" & height$HEIGHT != 92)
This is the histogram of the data, together with a normal approximation (already plotted in ‘2.5 Shape of data distribution’):
hist(NewF, freq=FALSE, xlab="Height (inches)",
main="Histogram of female student height data", breaks=17, xlim=c(55.5,79.1))
lines(x<-seq(55,80,0.01), y=dnorm(x, mean(NewF), sd(NewF)), col="red", lwd=2)
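If the data file is not available, the same technique can be tried on simulated data. The sketch below is an illustration under stated assumptions: the sample h_sim is generated with rnorm() (hypothetical values, not the student height data), and curve() is swapped in as an alternative to the lines(x<-seq(), ...) idiom used above, since it evaluates dnorm() over the plotting range without needing a separate grid of x values.

```r
# Simulated, hypothetical "heights" for illustration only
set.seed(1)
h_sim <- rnorm(200, mean = 65, sd = 3)

# Estimate the two parameters of the normal density from the sample
m <- mean(h_sim)
s <- sd(h_sim)

# Density histogram with the fitted normal curve on top
hist(h_sim, freq = FALSE, xlab = "Height (inches)",
     main = "Histogram of simulated heights with normal approximation")
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
```

Note that the histogram must be drawn with freq=FALSE; otherwise the bars are on the frequency scale and the density curve would appear almost flat along the x-axis.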
Next steps
This step gave you a detailed explanation of the R commands you need to create data summaries in RStudio. This completes Activity 2 in Week 2. When you are ready, move on to Activity 3 and read the overview to understand the activities you will complete.