
Install and load the packages to use

A guide to installing the R packages used in this week's learning.

To use RStudio for data exploration, manipulation, and visualization, you need to choose appropriate packages that will help you do so.

Several packages can do this in RStudio. Let’s first see how a package is installed and loaded in RStudio, then apply this to the packages we have selected for the course.

How to install and/or load a package in RStudio

1. Installing a package

If you want to use a package for the first time, you need to install it first.

To install a package in your RStudio session, you can either:

  • Option 1: via the RStudio Console, use the install.packages() function, with the name of the package in double quotes inside the parentheses.
  • Option 2: via the GUI, go to the Packages Tab, click on the Install option, type the name of the package you want to install and wait for the installation to complete.

Option 2 simply runs the install.packages() function from Option 1 on your behalf. For the rest of this course, we will always install packages using Option 1.

NB: You only need to do this once. An installed package stays available across R sessions.

NB: Once installed, the package will appear in the list of packages under the Packages Tab.
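
As a minimal sketch of Option 1, the snippet below installs a package only if it is not already present. The helper name install_if_missing is hypothetical and not part of R or of this course; it relies only on the base functions requireNamespace() and install.packages(). The call shown uses "stats", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper (not part of R): install a package only when it is missing
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE if the package is installed and can be loaded
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from your configured CRAN mirror
  }
  # Report whether the package is available after the attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call performs no download
install_if_missing("stats")

# For a course package you would call, for example:
# install_if_missing("tidyverse")
```

Checking first avoids reinstalling a package each time a script is run.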

2. Loading a package

Installing a package makes it available on your computer, but it does not make its functions available in your current R session. For that, you need to tell R to load the package.

To load a package in your RStudio session, you can either:

  • Option 1: via the RStudio Console, use the library() function, with the name of the package inside the parentheses (quotes are optional).
  • Option 2: via the GUI, go to the Packages Tab, search for the installed package you want to load, and select it.

Again, Option 2 simply runs the library() function from Option 1 on your behalf. For the rest of this course, we will always load packages using Option 1.

NB: You will need to do this each time you open a new R session and want to use this package.

NB: Once loaded, the package will appear as selected in the list of packages under the Packages Tab.
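
The difference between an installed and a loaded package can be sketched with "tools", a package that ships with R but is not loaded by default. The file name used here is made up for illustration.

```r
# An installed package can be used without loading it via the pkg::function form
tools::file_ext("reads.fastq")

# library() attaches the package, so its functions can be called by name alone
library(tools)
file_ext("reads.fastq")

# search() lists what is currently attached; loaded packages show up as "package:<name>"
attached <- "package:tools" %in% search()
print(attached)
```

The same check works for any package you load during the course.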

Installing and loading relevant packages

One of the most popular packages used to explore data is tidyverse, as it encompasses many different packages useful for data manipulation, query, and visual display.

What is “tidyverse”?

The tidyverse package is generally presented as an “umbrella-package”, because it has the advantage of installing many useful tools for data analysis and visualization at once, such as “tidyr”, “dplyr”, and “ggplot2”, that we are going to use here.

What is “tidyr”?

Tidy data follows a standard tabular layout in which each column represents a variable, each row is an observation, and consequently each cell holds one single entry. The idea behind this package is to help you create tidy data or reshape existing data into tidy form. This will help you spend less time struggling with functions in the later stages of data analysis.
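
As an illustrative sketch, assuming tidyr is installed, the snippet below reshapes a small made-up table of read counts (the gene and sample names are hypothetical) from a wide layout into tidy form using pivot_longer().

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene = c("geneA", "geneB"),
  sample1 = c(10, 3),
  sample2 = c(12, 5)
)

# Reshape so that each row is one observation (one gene in one sample)
long <- pivot_longer(
  wide,
  cols = c(sample1, sample2),
  names_to = "sample",   # former column names go here
  values_to = "count"    # former cell values go here
)

print(long)
```

After reshaping, "sample" is a variable in its own column instead of being spread across column names.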

What is “dplyr”?

This is a powerful R package for manipulating tabular data with rows and columns. It includes functions for selecting specific columns, filtering and ordering rows, adding columns, and summarizing data. Together, these functions support the “split-apply-combine” approach, in which you break up your data into manageable pieces, apply functions to each piece, and then combine the results. Compared with similar base functions in R, “dplyr” functions tend to be easier to use, with a clear syntax designed for analyzing whole data frames, not only vectors.
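
A minimal sketch of split-apply-combine, assuming dplyr is installed. The data frame and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical tidy table of read counts
expr <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneB"),
  sample = c("s1", "s2", "s1", "s2"),
  count = c(10, 12, 3, 5)
)

result <- expr %>%
  filter(count > 4) %>%                  # keep only rows with count above 4
  group_by(gene) %>%                     # split: one group per gene
  summarise(
    mean_count = mean(count),            # apply: summarise each group
    n_samples = n()
  ) %>%                                  # combine: one row per gene
  arrange(desc(mean_count))              # order rows by the summary

print(result)
```

Each step takes a data frame and returns a data frame, which is what allows the steps to be chained with the pipe.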

What is “tibble”?

A tibble is a simplified version of a data frame, and it is the tabular structure that the “tidyverse” packages return by default. Tibbles and data frames each have their strengths and weaknesses. To simplify, tibbles contain the same information as data frames, but they ease the processes of manipulation and display, for example by printing only the first rows and the columns that fit on screen.
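
The sketch below, assuming the tibble package is installed, converts a small made-up data frame into a tibble and shows one behavioural difference: selecting a single column with [ ] returns a vector from a data frame but stays a tibble for a tibble.

```r
library(tibble)

# Hypothetical data frame
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Convert it to a tibble; the contents are unchanged
tb <- as_tibble(df)

# A tibble is still a data frame, with extra classes on top
class(tb)

# Single-column subsetting differs between the two
df_col <- df[, "x"]   # drops to a plain integer vector
tb_col <- tb[, "x"]   # remains a one-column tibble

print(tb)
```

This predictability is one reason tidyverse functions work with tibbles.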

What is “ggplot2”?

Based on the concept of the “Grammar of Graphics”, this very popular and versatile package allows users to visually represent data by adding layers of information. The tabular data is provided first as a basis, and preferences for the visualization are then added in layers. These include aesthetics, geometries, facets, statistics, coordinates, and themes.
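
The layering idea can be sketched as follows, assuming ggplot2 is installed. The data frame is made up for illustration, and the choice of a bar chart is only one of many possible geometries.

```r
library(ggplot2)

# Hypothetical tidy table of read counts
expr <- data.frame(
  sample = rep(c("s1", "s2", "s3"), times = 2),
  gene = rep(c("geneA", "geneB"), each = 3),
  count = c(10, 12, 9, 3, 5, 4)
)

p <- ggplot(expr, aes(x = sample, y = count)) +  # data and aesthetics
  geom_col() +                                   # geometry: bars
  facet_wrap(~ gene) +                           # facets: one panel per gene
  labs(x = "Sample", y = "Read count") +         # axis labels
  theme_minimal()                                # theme

# Draw the plot (in RStudio it appears in the Plots tab)
print(p)
```

Each line joined with + adds one layer, so a plot can be built up and adjusted one piece at a time.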

Now that you know what to expect from these packages and how they can help you inspect, manipulate and/or visualize your data, let’s install the package “tidyverse” to have them all at once. To install and load “tidyverse”, please follow the instructions below.

1. Install “tidyverse”

install.packages("tidyverse")

2. Load “tidyverse”

library(tidyverse)


© Wellcome Connecting Science
This article is from the free online

Bioinformatics for Biologists: Analysing and Interpreting Genomics Datasets

Created by
FutureLearn - Learning For Life
