
Plotting transformation using ggplot2

Advanced plotting options

Advanced plotting options: axis transformation

Have you noticed how the previous plots looked squashed? By default, ggplot2 sets the axis limits so that they cover all the data, so a few extreme values can compress everything else. If you consider those extreme values outliers and would like to leave them out of the plot, you can set the axis limits yourself, using xlim() for the x-axis, ylim() for the y-axis, or both. Values outside the limits are dropped from the plot, which is why R prints a warning.

# Points (left-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + ylim(0,10000)

Warning message:
Removed 8 rows containing missing values (`geom_point()`).

# Boxplot (right-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + ylim(0,10000)

Warning message:
Removed 8 rows containing non-finite values (`stat_boxplot()`).

screenshot of two plots, one with points and the other with box plots, both with the y-axis limits set so the data no longer look squashed
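
One detail worth keeping in mind: ylim() does not simply crop the picture. It discards the values outside the range (hence the warning), so box plot statistics are computed on what remains. The sketch below illustrates the difference under stated assumptions: `toy_tb` is a small made-up table standing in for var_tb, and coord_cartesian() is the ggplot2 function that zooms the view without dropping data.

```r
library(ggplot2)

# Hypothetical stand-in for var_tb: two samples, each with one extreme DP value
toy_tb <- data.frame(
  SAMPLE = rep(c("S1", "S2"), each = 5),
  DP     = c(10, 20, 30, 40, 20000, 15, 25, 35, 45, 30000)
)

# ylim() drops the two values above 10000 before the box plot statistics are computed
p_ylim <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_boxplot() +
  ylim(0, 10000)

# coord_cartesian() only zooms the view; all ten values still feed the statistics
p_zoom <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 10000))

# Extract the medians that each plot actually draws
median_ylim <- suppressWarnings(layer_data(p_ylim))$middle
median_zoom <- layer_data(p_zoom)$middle
```

Comparing `median_ylim` and `median_zoom` shows that the two approaches draw different boxes from the same data, so the choice depends on whether the extreme values should count towards the summary.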

Another way to obtain this output is to use the functions scale_x_continuous() and scale_y_continuous(), which also let you set the axis title. You can try this option, adding a new y-axis title. Note the change in the y-axis title from “DP” to “dp”.

# Points (left-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + scale_y_continuous(name="dp", limits=c(0, 10000))

# Boxplot (right-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + scale_y_continuous(name="dp", limits=c(0, 10000))

screenshot of two plots, one with points and the other with box plots, both with the y-axis limits set and the y-axis title changed to “dp”

To optimize axis display, another possible transformation is the log transformation. Here are two equivalent ways to apply it, using functions built into ggplot2.

# Points (left-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + scale_y_continuous(trans='log10')

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + scale_y_log10()

# Boxplot (right-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + scale_y_continuous(trans='log10')

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + scale_y_log10()

screenshot of two plots, one with points and the other with box plots, both with a log-transformed y-axis
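
To see what the log scale does to the data, the following sketch builds a plot and looks at the values ggplot2 actually positions. It assumes a small made-up table `toy_tb` in place of var_tb. Note that a value of 0 has no finite log10, so such rows cannot be drawn on a log axis.

```r
library(ggplot2)

# Hypothetical stand-in for var_tb
toy_tb <- data.frame(
  SAMPLE = c("S1", "S1", "S2", "S2"),
  DP     = c(10, 1000, 100, 10000)
)

p_log <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_point() +
  scale_y_log10()

# The built layer stores log10(DP); the axis labels are still shown in the original units
plotted_y <- layer_data(p_log)$y
print(plotted_y)
```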

Additional axis formatting options (breaks, labels, etc.) are available through the “scales” package, whose functions can easily be combined with ggplot2 formatting.
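
As a minimal sketch of combining “scales” with ggplot2, the example below sets explicit tick positions on a log axis and formats the labels with thousands separators. The table `toy_tb` and the chosen break values are assumptions for illustration only; label_comma() is a formatter from the scales package.

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in for var_tb
toy_tb <- data.frame(
  SAMPLE = c("S1", "S1", "S2", "S2"),
  DP     = c(10, 1000, 100, 10000)
)

p_breaks <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_point() +
  scale_y_log10(
    breaks = c(10, 100, 1000, 10000),  # where tick marks are placed
    labels = label_comma()             # writes 10000 as "10,000"
  )

p_breaks
```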

Advanced plotting options: colors, shapes, legend

1. Change colors

If you want to highlight the differences between each individual sample, colouring them can help.

# Colours of shapes
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, colour = SAMPLE)) + geom_boxplot() + ylim(0,10000)

# Colours for filling options
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000)

screenshot of two box plots with the y-axis limits set, one with coloured outlines and the other with coloured fills

You probably noticed that so far the colours were chosen by default. You can change them manually, or use the predefined palettes from the “RColorBrewer” package. Please make sure you install and load the package before using it.

# Colours for filling options with manual colors
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_manual(values=c("#cb6015", "#e1ad01", "#6d0016", "#808000", "#4e3524"))

# Colours for filling options with preset palettes
> install.packages("RColorBrewer")
> library(RColorBrewer)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu")

screenshot of two box plots with the y-axis limits set, one filled with manually chosen colours and the other with a preset palette
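
When colours are set manually, an unnamed vector assigns them in the order of the samples. If a specific colour should always go with a specific sample, the values can be named instead. The sketch below assumes a made-up table `toy_tb` with hypothetical sample names “S1” and “S2”.

```r
library(ggplot2)

# Hypothetical stand-in for var_tb with two samples
toy_tb <- data.frame(
  SAMPLE = rep(c("S1", "S2"), each = 4),
  DP     = c(10, 20, 30, 40, 15, 25, 35, 45)
)

# Names tie each colour to a sample, whatever the order of the vector
sample_cols <- c(S2 = "#e1ad01", S1 = "#cb6015")

p_named <- ggplot(toy_tb, aes(x = SAMPLE, y = DP, fill = SAMPLE)) +
  geom_boxplot() +
  scale_fill_manual(values = sample_cols)

# Fill colour assigned to each box, in sample order
box_fills <- layer_data(p_named)$fill
```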

All possible palettes can be displayed using:

> display.brewer.all()

RStudio colour palette
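
Before choosing a palette, it can help to check that it offers enough colours for the number of samples. A small sketch, assuming five samples (the same number as the manual colours used earlier):

```r
library(RColorBrewer)

n_samples <- 5  # assumed number of samples

# Maximum number of colours the "RdYlBu" palette provides
max_cols <- brewer.pal.info["RdYlBu", "maxcolors"]

# Hex codes of the colours that would be used for n_samples groups
pal <- brewer.pal(n_samples, "RdYlBu")
print(pal)
```

The hex codes in `pal` can also be passed to scale_fill_manual() if you want to start from a preset palette and adjust individual colours.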

2. Change legend position

The legend position can be specified using the legend.position argument of theme(), with options “right”, “left”, “top” or “bottom”. To remove the legend, use “none” instead.

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="top")

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="none")

screenshot of two box plots with the y-axis limits set and a preset colour palette, with a legend on the top of the plot
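
Because ggplot2 plots are built by adding layers, the common part of these long commands can be stored once and reused. This is only a convenience sketch, using a made-up table `toy_tb` in place of var_tb and a hypothetical object name `base_plot`.

```r
library(ggplot2)

# Hypothetical stand-in for var_tb with three samples
toy_tb <- data.frame(
  SAMPLE = rep(c("S1", "S2", "S3"), each = 4),
  DP     = c(10, 20, 30, 40, 15, 25, 35, 45, 12, 22, 32, 42)
)

# Shared part of the plot, stored once
base_plot <- ggplot(toy_tb, aes(x = SAMPLE, y = DP, fill = SAMPLE)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "RdYlBu")

# Variants that differ only in legend position
p_top  <- base_plot + theme(legend.position = "top")
p_none <- base_plot + theme(legend.position = "none")
```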

3. Change plot and axis titles

Several options are available. The labs() function, or a combination of ggtitle(), xlab() and ylab(), gives the same result with similarly concise code.

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="bottom") + labs(title="DP_per_Sample", x="SampleID", y = "DP")

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="bottom") + ggtitle("DP per Sample") + xlab("Sample") + ylab("DP")

screenshot of two box plots with the y-axis limits set and a preset colour palette, with a legend and changed plot and axis titles

4. Change shapes, colors, and sizes

Several properties of the points (shape, colour, fill, size and transparency) can be changed at once in the same plot.

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point(shape = 21, fill = "#e4dbc1", color = "#b92e17", size = 6) + ylim(0,10000)

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point(shape = 23, color = "#e4dbc1", fill = "#b92e17", size = 5, alpha=0.5) + ylim(0,10000)

screenshot of two plots with points, comparing how several changes can be applied at once to the same dataset
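
Note that in ggplot2 the fill argument only affects point shapes 21 to 25, which have both an outline (color) and an interior (fill). Other shapes use color alone. A short sketch of the contrast, with a made-up table `toy_tb` in place of var_tb:

```r
library(ggplot2)

# Hypothetical stand-in for var_tb
toy_tb <- data.frame(
  SAMPLE = c("S1", "S1", "S2", "S2"),
  DP     = c(10, 1000, 100, 5000)
)

# Shape 21: outline drawn with color, interior drawn with fill
p_filled <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_point(shape = 21, fill = "#e4dbc1", color = "#b92e17", size = 6)

# Shape 16: a solid circle drawn with color only; fill is ignored
p_solid <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_point(shape = 16, fill = "#e4dbc1", color = "#b92e17", size = 6)

# Values ggplot2 will use when drawing each layer
filled_shape <- unique(layer_data(p_filled)$shape)
solid_colour <- unique(layer_data(p_solid)$colour)
```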

All available point shapes can be displayed using the “ggpubr” package (install it first if needed):

> ggpubr::show_point_shapes()

plot of all possible point shapes in RStudio
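
If ggpubr is not installed, a similar chart can be drawn with ggplot2 alone. The sketch below is one possible layout, not the ggpubr implementation; the table `shapes_tb` and its grid coordinates are made up for the purpose.

```r
library(ggplot2)

# One row per shape code (0 to 25), arranged on a grid of 7 columns
shapes_tb <- data.frame(
  shape = 0:25,
  x     = (0:25) %% 7,
  y     = -((0:25) %/% 7)
)

p_shapes <- ggplot(shapes_tb, aes(x = x, y = y)) +
  geom_point(aes(shape = shape), size = 5, fill = "#b92e17") +  # fill shows on shapes 21-25 only
  geom_text(aes(label = shape), nudge_y = -0.3) +               # print the code under each shape
  scale_shape_identity() +                                      # use the numbers directly as shape codes
  theme_void()

p_shapes
```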

Great job reaching this stage! So far, we have covered functions and options for data visualization, using two variables from the var_tb data, namely SAMPLE and DP, as examples. To proceed further with data exploration of the variants, let’s now consider other variables. In doing so, we will also cover other interesting plotting options.

© Wellcome Connecting Science
This article is from the free online course Bioinformatics for Biologists: Analysing and Interpreting Genomics Datasets

Created by
FutureLearn - Learning For Life
