
Making data visualisations in RStudio

This article will guide you through data visualisation with ggplot2: setting data, aesthetics and geometries.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences

Data Visualisation with ggplot2

Introduction
Let’s load our data set of interest, install and load all the packages we need, and start making data visualizations in RStudio.
Good practice
It is good practice to load the packages you need before starting your analysis, and to list them at the start of the script you prepare for a project. Here is a list of convenient packages to use with ggplot2:
> library(ggplot2)
> library(RColorBrewer)
> install.packages("viridis")  # only needed once, if viridis is not installed yet
> library(viridis)
 
Note. Another widely used package in data science is tidyverse, a collection of packages that includes ggplot2, dplyr and many other helpful tools. It is worth trying on your own after this course.


Setting your working directory in RStudio

 
Step 1. We recommend that you work in the project folder Project_Test that we created previously, either by opening Project_Test directly or by using the following commands:
 
> setwd("/Users/imac/Desktop/exerciseR/Project_Test")  # adapt this path to your own computer
> getwd()
[1] "/Users/imac/Desktop/exerciseR/Project_Test"
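If you only need to switch folders temporarily, note that setwd() invisibly returns the previous working directory, so you can store it and go back afterwards. A minimal sketch, using R's temporary folder (tempdir()) as a stand-in for your own project folder; the name old_wd is only illustrative:

```r
# setwd() changes the working directory and invisibly returns the old one
old_wd <- setwd(tempdir())
getwd()   # now the temporary folder

# file.path() builds a path with the correct separator for your system
file.path(getwd(), "iris.txt")

# Return to the directory we started from
setwd(old_wd)
getwd()
```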
 
Step 2. As a reminder, you can create a dedicated script file in which to write your commands and related comments.
 

Setting your data

 
Step 1. Import or load the iris data set into RStudio. You can use any of the five options below: the first three work the same way in any R session, whereas the last two are specific to RStudio.
 
 
    • From your computer, if you placed the iris dataset file in your working directory
 
 
> Iris <- read.table("iris.txt")
 
 
    • From your computer, if the iris dataset file is in the parent folder exerciseR
 
 
> Iris <- read.table("/Users/imac/Desktop/exerciseR/iris.txt")
 
 
    • From the available data sets in R
 
 
> data(iris)
 
 
    • From the “Import Dataset” button in the Environment pane, by selecting the file, its type and its parent folder.
       
 
    • From the File menu, by choosing the “Import Dataset” option.
       
 
 
Note 1. Be careful to use the built-in “iris” data set (lower case): here “Iris” refers to the imported copy of the same data, which contains changes that could break the rest of the commands.
 
Note 2. Like other R functions, “read.table()” has several options, which are described at the following link. The same page also covers related functions for importing other file formats (for example “read.csv()” for “.csv” files): https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/read.table
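To see how the defaults of read.csv() and read.table() differ, the sketch below writes the built-in iris data to a temporary .csv file (the file comes from tempfile(), not from the course materials) and reads it back both ways. read.csv() assumes a header line and comma separators, whereas read.table() needs header = TRUE and sep = "," to be given explicitly:

```r
# Write the built-in iris data set to a temporary .csv file
csv_file <- tempfile(fileext = ".csv")
write.csv(iris, csv_file, row.names = FALSE)

# read.csv() expects a header line and commas by default
iris_csv <- read.csv(csv_file)

# read.table() defaults to header = FALSE and whitespace separators,
# so both options must be set to read the same file
iris_table <- read.table(csv_file, header = TRUE, sep = ",")

str(iris_csv)
```

In R 4.0.0 and later, text columns such as Species are read as character rather than factor; add stringsAsFactors = TRUE if you need a factor.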
 
Step 2. You can also select, display and work on specific columns of the iris data set:
 
> library(dplyr)  # provides select() and the %>% pipe
> iris_length <- iris %>% select(Sepal.Length, Petal.Length)
> head(iris_length)
  Sepal.Length Petal.Length
1          5.1          1.4
2          4.9          1.4
3          4.7          1.3
4          4.6          1.5
5          5.0          1.4
6          5.4          1.7
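If dplyr is not available, the same selection can be made in base R. A sketch with two built-in alternatives, square-bracket indexing and subset(); the object names iris_length_base and setosa_length are only illustrative:

```r
# Select two columns by name with square brackets (no extra packages needed)
iris_length_base <- iris[, c("Sepal.Length", "Petal.Length")]
head(iris_length_base)

# subset() can filter rows and select columns in a single call
setosa_length <- subset(iris, Species == "setosa",
                        select = c(Sepal.Length, Petal.Length))
nrow(setosa_length)  # 50 setosa flowers
```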
 

Setting Aesthetics and Geometries

 
Step 1. Let’s use basic layers to plot Petal.Length against Sepal.Length. In ggplot2, “aes()” maps variables to aesthetics such as the x and y axes, and “geom_point()” draws a scatterplot:
 
> ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point()
 
Note 1. The output should appear in the “Plots” pane of RStudio. You can save the plot with the “Export” button, which offers several file formats; the other R options we saw for saving plots also still work.
 
Note 2. We will not show the whole area again, but remember that the plots you generate will appear here.
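From a script, ggplot2's ggsave() is an alternative to the “Export” button: it chooses the file format from the file extension, and width and height are in inches by default. A minimal sketch, where the file name iris_scatter.pdf and the 6 x 4 size are only examples:

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# Save the plot as a PDF in the temporary folder
out_file <- file.path(tempdir(), "iris_scatter.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)

file.exists(out_file)
```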
 
Step 2. With the same plot options, let’s colour the points by Species:
 
> ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point()
 
scattergraph with different species in different colours
 
Step 3. The same output can also be generated with shorter code:
 
> ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) + geom_point()
 
Note. For clarity, however, we will mostly keep the argument names explicit (data, x and y) to make the code easier to follow.
 
scattergraph mono colour species
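The two forms are interchangeable because the first argument of ggplot() is data and the first two arguments of aes() are x and y. One way to check this is to compare the data that ggplot2 computes for each layer with layer_data(); the names long_form and short_form are only illustrative:

```r
library(ggplot2)

long_form <- ggplot(data = iris,
                    aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()
short_form <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point()

# layer_data() returns the data ggplot2 actually draws for a layer
all.equal(layer_data(long_form), layer_data(short_form))
```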
 
Step 4. You can store the base plot, with its data and aesthetics, in a variable and then add other layers to it. The following creates the same output as the previous graph:
 
> key <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species))
> key + geom_point()
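Because each “+” returns a new plot object, key itself is never modified and can be reused as the starting point for several plots. A sketch, where the title and axis labels added with labs() are only illustrative (iris measurements are in centimetres):

```r
library(ggplot2)

# Store only the data and the aesthetic mappings
key <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species))

# Two different plots built from the same base; key stays unchanged
scatter <- key + geom_point()
scatter_titled <- key + geom_point() +
  labs(title = "Iris petal vs sepal length",
       x = "Sepal length (cm)", y = "Petal length (cm)")

scatter_titled
```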
 
Step 5. Geometries can also be combined. Here “geom_smooth()” adds a smoothed trend line, with a shaded confidence band, to the points:
 
> key + geom_point() + geom_smooth()
 
scattergraph with trend line
 
Step 6. So far, the geometries have been added with their default options. Each geometry has its own set of options; for example, se = FALSE removes the shaded confidence band:
 
> key + geom_point() + geom_smooth(se=FALSE)
 
scattergraph with smooth trends
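Another geom_smooth() option is method. For data sets with fewer than 1,000 observations, such as iris, geom_smooth() draws a loess curve by default; method = "lm" fits a straight regression line for each group instead. A sketch (stating formula = y ~ x explicitly also avoids the message ggplot2 prints about the formula it chose):

```r
library(ggplot2)

key <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species))

# One straight line per Species, without the confidence band
linear_trends <- key + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

linear_trends
```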
 
Step 7. You can change the size, shape and colour of the points through the options of “geom_point()”. Note how this affects the display: if you set a single fixed colour, the points are no longer coloured by Species, even though the key variable maps colour to Species.
 
> key + geom_point(size=4, shape=15, color="red3")
 
scatter graph with red dots
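The rule behind Step 7 is the difference between mapping and setting: a colour inside aes() is mapped from a variable, whereas a colour passed directly to geom_point() is a fixed setting that replaces the mapping for that layer. The sketch below uses layer_data() to show which colours ggplot2 actually draws; the object names are illustrative:

```r
library(ggplot2)

key <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species))

# Setting: a constant colour outside aes() overrides the mapping from key
fixed_colour <- key + geom_point(color = "red3")

# Mapping: colour is inherited from aes(), one colour per Species,
# while size and shape are still set as constants
mapped_colour <- key + geom_point(size = 4, shape = 15)

unique(layer_data(fixed_colour)$colour)   # a single colour
unique(layer_data(mapped_colour)$colour)  # three colours
```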
 
Step 8. Alternatively, you can make the size and colour of the points depend on the Sepal.Length values by mapping them in aes():
 
> ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Sepal.Length, size = Sepal.Length)) + geom_point()
 
scatter graph with blue dots
 
Note. Here we used the default ggplot2 colours; we will see later how to use other colour palettes.
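As a short preview, ggplot2 (version 3.0.0 and later) includes scale_color_viridis_c(), which replaces the default blue gradient of a continuous colour mapping with the viridis palette. A sketch reusing the mapping from Step 8:

```r
library(ggplot2)

# Same aesthetics as Step 8, with the continuous viridis colour scale
viridis_plot <- ggplot(data = iris,
                       aes(x = Sepal.Length, y = Petal.Length,
                           color = Sepal.Length, size = Sepal.Length)) +
  geom_point() +
  scale_color_viridis_c()

viridis_plot
```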
 

Other Functions and Plots

 
Step 1. Here we only cover the “ggplot()” function, but other functions can generate the same output as in Step 6 of this article, such as “qplot()”, which produces quick plots with ggplot2. Note that “qplot()” has been deprecated since ggplot2 3.4.0, so “ggplot()” is preferred in new code.
 
> qplot(Sepal.Length, Petal.Length, data = iris, color = Species) + geom_smooth(se = FALSE)  # qplot() already draws the points
 
Step 2. Different types of plot require different geometries:
 
 
    • Boxplot with default options
 
 
> ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_boxplot()
 
box plot with default options
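Boxplots are most often drawn with a categorical variable on the x axis. As an alternative to the plot above, a sketch with one box of Petal.Length per Species:

```r
library(ggplot2)

# Categorical x (Species), numeric y (Petal.Length): one box per species
species_box <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_boxplot()

species_box
```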
 
 
    • Bar plot with default options
 
 
> ggplot(data = iris, aes(x = Sepal.Length)) + geom_bar()
 
bar plot with default options
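geom_bar() does not need a y aesthetic: it counts how many rows share each x value, so with Sepal.Length each bar is the number of flowers with that exact measurement. The counts are easier to read for a categorical variable; a sketch with Species:

```r
library(ggplot2)

# geom_bar() counts the rows for each value of x
species_bar <- ggplot(data = iris, aes(x = Species)) + geom_bar()

layer_data(species_bar)$count  # 50 flowers per species
```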
 
 
    • More complex plots, such as a 2D density plot, also work with default options
 
 
> ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_density_2d_filled()
 
density plot with default options
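geom_density_2d_filled() estimates the joint density of two variables. For a single variable, geom_density() draws a one-dimensional density curve. A sketch with one curve per Species, where alpha = 0.5 is an arbitrary transparency chosen so overlapping curves stay visible:

```r
library(ggplot2)

# Smoothed distribution of Sepal.Length, one semi-transparent curve per Species
sepal_density <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5)

sepal_density
```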
 
Step 3. Remember that each plotting function comes with its own set of options, which might not work with other functions. Let’s see how to generate and modify histograms:
 
 
    • Default options
 
 
> ggplot(data = iris, aes(x = Sepal.Length)) + geom_histogram()
 
black and white histogram
 
 
    • Filling the histogram bars by Species. Note that the fill aesthetic is used here rather than color, which would only colour the outlines of the bars
 
 
> ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) + geom_histogram()
 
colourful histogram
 
 
    • Setting the width of the bins with the binwidth option
 
 
> ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) + geom_histogram(binwidth = 0.05)
binwidth histogram in colours
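By default, geom_histogram() uses 30 bins and prints a message suggesting a better value. You can pass either bins (a number of bins) or binwidth (a width in the units of x, here centimetres). A sketch comparing both, where 15 bins and a width of 0.25 are chosen only for illustration:

```r
library(ggplot2)

# bins sets the number of bins
hist_bins <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 15)

# binwidth sets the width of each bin instead
hist_width <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.25)

# Whatever the binning, the counts add up to the 150 flowers
sum(layer_data(hist_bins)$count)
sum(layer_data(hist_width)$count)
```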
Note. ggplot2 can generate a wide range of plots, such as bar plots, boxplots, violin plots, density plots, area charts, correlograms and many more!
This article is from the free online course Bioinformatics for Biologists: An Introduction to Linux, Bash Scripting, and R.

Created by
FutureLearn - Learning For Life
