# Plotting transformation using ggplot2

Have you noticed how the data in the previous plots looked squashed? By default, ggplot2 sets the axis limits so that all the data fit in the plot, so a few extreme values compress everything else. If you consider these extreme values to be outliers and want to leave them out of the plot, you can set the axis limits yourself with xlim() for the x axis, ylim() for the y axis, or both. Note that observations falling outside the limits are removed before plotting, which is why R prints a warning.

# Points (left-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + ylim(0,10000)

Warning message:
Removed 8 rows containing missing values (geom_point()).

# Boxplot (right-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + ylim(0,10000)

Warning message:
Removed 8 rows containing non-finite values (stat_boxplot()).
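
Because ylim() discards out-of-range rows before any statistics are computed, a boxplot drawn with it summarises only the remaining values. If the goal is simply to zoom in, coord_cartesian() is an alternative that keeps every row. The sketch below assumes a small made-up table, toy_tb, standing in for var_tb; its values are illustrative only.

```r
library(ggplot2)

# Hypothetical stand-in for var_tb: two samples, one extreme DP value
toy_tb <- data.frame(
  SAMPLE = rep(c("S1", "S2"), each = 5),
  DP = c(10, 20, 30, 40, 50000, 15, 25, 35, 45, 55)
)

# Number of rows outside the limits, i.e. rows that ylim() would drop
n_outside <- sum(toy_tb$DP < 0 | toy_tb$DP > 10000)

# ylim(): out-of-range rows are removed before the boxplot statistics are computed
p_ylim <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_boxplot() +
  ylim(0, 10000)

# coord_cartesian(): all rows are kept, only the visible window changes
p_zoom <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 10000))
```

Printing p_ylim and p_zoom side by side should show a different box for S1, since only the second plot still takes the extreme value into account.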

The same result can be obtained with scale_x_continuous() and scale_y_continuous(), which also accept other axis settings. The examples below set the same limits and, in addition, a new y-axis title: note the change from “DP” to “dp”.

# Points (left-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + scale_y_continuous(name="dp", limits=c(0, 10000))

# Boxplot (right-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + scale_y_continuous(name="dp", limits=c(0, 10000))

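Another way to improve the axis display is a log transformation. It can be applied in two equivalent ways, both provided by ggplot2: passing trans='log10' to scale_y_continuous() (the argument is named transform from ggplot2 3.5.0 onwards, with trans still accepted), or using the shortcut scale_y_log10().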

# Points (left-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + scale_y_continuous(trans='log10')

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point() + scale_y_log10()

# Boxplot (right-hand plot)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + scale_y_continuous(trans='log10')

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_boxplot() + scale_y_log10()
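
One thing to watch for with a log scale is a value of zero, which has no finite logarithm and is therefore dropped from the plot. The sketch below uses made-up depths (toy_dp is not taken from var_tb) to show the problem and one possible workaround, the "log1p" transformation from the scales package.

```r
library(ggplot2)

# Hypothetical depths, including a zero (not taken from var_tb)
toy_dp <- c(0, 9, 99, 999)

log10(toy_dp)  # the zero becomes -Inf, which cannot be drawn
log1p(toy_dp)  # log(1 + x) keeps the zero at 0

toy_tb <- data.frame(SAMPLE = "S1", DP = toy_dp)

# "log1p" is one of the transformations provided by the scales package
p_log1p <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_point() +
  scale_y_continuous(trans = "log1p")
```

Whether this is appropriate depends on the data; if zeros are not expected in DP, scale_y_log10() is sufficient.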

Additional axis formatting options (breaks, labels, etc.) are available through the “scales” package, whose functions can easily be combined with ggplot2 scales.
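
As a minimal sketch of this combination, the example below sets explicit breaks on a log axis and formats the tick labels with label_comma() from scales. The table toy_tb and its values are made up for illustration and stand in for var_tb.

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in for var_tb
toy_tb <- data.frame(
  SAMPLE = rep(c("S1", "S2"), each = 4),
  DP = c(12, 150, 2400, 31000, 8, 95, 1100, 18000)
)

# label_comma() returns a function that formats numbers with thousands separators
fmt <- label_comma()
fmt(c(1000, 10000))

# Use chosen breaks and the formatter on a log10 y axis
p_scales <- ggplot(toy_tb, aes(x = SAMPLE, y = DP)) +
  geom_point() +
  scale_y_log10(breaks = c(10, 100, 1000, 10000), labels = fmt)
```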

### Advanced plotting options: colors, shapes, legend

1. Change colors

If you want to highlight the differences between each individual sample, colouring them can help.

# Colour the box outlines by sample
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, colour = SAMPLE)) + geom_boxplot() + ylim(0,10000)

# Fill the boxes by sample
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000)

You probably noticed that the colours used so far were the ggplot2 defaults. You can change them either manually, by listing the colours yourself, or with predefined palettes from the “RColorBrewer” package. Please make sure the package is installed and loaded before using it.

# Colours for filling options with manual colors
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_manual(values=c("#cb6015", "#e1ad01", "#6d0016", "#808000", "#4e3524"))

# Colours for filling options with preset palettes
> install.packages("RColorBrewer")
> library(RColorBrewer)
> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu")

All possible palettes can be displayed using:

> display.brewer.all()
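
To see what a palette contains without plotting it, the colours can also be extracted as hex codes. The sketch below assumes five colours are wanted, matching the five values used in the manual example; adjust n to the number of samples.

```r
library(RColorBrewer)

# Five colours from the "RdYlBu" palette, returned as hex codes
pal <- brewer.pal(n = 5, name = "RdYlBu")
pal

# Summary table of all palettes: maximum number of colours, category,
# and whether the palette is colour-blind friendly
head(brewer.pal.info)
```

The resulting vector can be passed to scale_fill_manual(values = pal), which gives the flexibility of the manual approach with colours taken from a preset palette.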

2. Change legend position

The legend position is set with theme(), using legend.position with the values “right”, “left”, “top” or “bottom”. To remove the legend, use “none” instead.

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="top")

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="none")

3. Change plot and axis titles

There are several ways to do so. The labs() function, or a combination of ggtitle(), xlab() and ylab(), gives the same result with comparable code.

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="bottom") + labs(title="DP_per_Sample", x="SampleID", y = "DP")

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP, fill= SAMPLE)) + geom_boxplot() + ylim(0,10000) + scale_fill_brewer(palette="RdYlBu") + theme(legend.position="bottom") + ggtitle("DP per Sample") + xlab("Sample") + ylab("DP")

4. Change shapes, colors, and sizes

Several point properties can be changed at once. For shapes 21 to 25, color sets the outline and fill sets the interior; alpha controls transparency.

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point(shape = 21, fill = "#e4dbc1", color = "#b92e17", size = 6) + ylim(0,10000)

> ggplot(data = var_tb, aes(x=SAMPLE, y=DP)) + geom_point(shape = 23, color = "#e4dbc1", fill = "#b92e17", size = 5, alpha=0.5) + ylim(0,10000)

All available point shapes can be displayed with the “ggpubr” package (which must be installed):

> ggpubr::show_point_shapes()
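
If ggpubr is not available, a similar chart can be sketched with ggplot2 alone. The example below is one possible layout, not the ggpubr implementation; the table shapes_tb and its grid coordinates are made up for display purposes.

```r
library(ggplot2)

# One row per shape code (0 to 25), laid out on a grid of 7 columns
shapes_tb <- data.frame(
  shape = 0:25,
  x = 0:25 %% 7,
  y = -(0:25 %/% 7)
)

# scale_shape_identity() uses the numbers in `shape` directly as shape codes
p_shapes <- ggplot(shapes_tb, aes(x = x, y = y)) +
  geom_point(aes(shape = shape), size = 5, color = "#b92e17", fill = "#e4dbc1") +
  geom_text(aes(label = shape), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +
  theme_void()
```

Printing p_shapes draws the chart; only shapes 21 to 25 should show the fill colour.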

Great job reaching this stage! So far, we have covered functions and options for data visualization using two variables of the var_tb data as examples, namely SAMPLE and DP. To continue exploring the variants, let’s now consider other variables. In doing so, we will also cover other useful plotting options.