
Reading, creating and storing the data in R

We describe how to read data in standard txt or csv format and how to generate random data that forms groups with little overlap.
Since computing power increases every year, being able to harness it can be a great advantage for you.
© PRACE and University of Ljubljana

Introduction

In this article we will briefly describe the following basic data-management operations in R:

  • how to load the data from a file,
  • how to create the data,
  • how to store the data in a file.

R has a very active open-source community that provides help for almost every need on the different platforms. We strongly suggest using this support when R-related data-management questions arise that are not covered in this course.

First things first

Note that all these examples assume that you have started RStudio by executing the following command in the terminal:

$ rstudio &

and that you have opened a new R script file.
If you have not, press Ctrl+Shift+N to start a new script file and save it to a local folder first. Once you have typed (or copied) the R code into the script file, you can run it by, e.g., selecting the part of the code you want to run and pressing Ctrl+Enter.

Creating and storing the data

We will first demonstrate how to create a data matrix with 50 rows and 4 columns and later how to create a data frame using this matrix. We want our matrix to have the first 3 columns corresponding to continuous random variables and the 4th must be a discrete random variable having values in {1,2,3,4,5}.

# TOY EXAMPLE
X1 <- matrix(rnorm(150, 0, 1), ncol = 3)           # RANDOM 50x3 MATRIX X1
X2 <- matrix(round(runif(50, 1, 5), 0), ncol = 1)  # RANDOM 50x1 MATRIX (COLUMN) X2
X <- cbind(X1, X2)                                 # MERGING X1 AND X2 INTO MATRIX X
colnames(X) <- c("X1", "X2", "X3", "GROUP")        # SETTING THE COLUMN LABELS

A side note: rnorm(150,0,1) generates 150 random values from a normal distribution with a mean of 0 and a standard deviation of 1, while runif(50,1,5) generates 50 random values from a uniform distribution on the interval from 1 to 5; round(x,0) then rounds each value of x to the nearest integer, so the fourth column takes values in {1,2,3,4,5}.
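The effect of the rounding step can be checked with a quick tabulation. The following is only a minimal sketch: the seed, the sample size and the names g_round and g_sample are arbitrary choices made for illustration, and sample() is shown as one possible alternative for cases where all five values should be equally likely.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
n <- 100000                                  # large sample so the frequencies are stable

g_round  <- round(runif(n, 1, 5), 0)         # approach used in the toy example
g_sample <- sample(1:5, n, replace = TRUE)   # each value drawn with probability 1/5

# After rounding, 1 and 5 each collect an interval of width 0.5,
# while 2, 3 and 4 each collect an interval of width 1
print(round(prop.table(table(g_round)), 3))
print(round(prop.table(table(g_sample)), 3))
```

Both variants produce values in {1,2,3,4,5}; they differ only in how often the two end values occur, which matters if the groups are expected to be of similar size.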

We can store X in the file mymatrix.txt with

write.table(X, file="mymatrix.txt", row.names=FALSE, col.names=TRUE)

Sometimes you need to store the data as a data frame. This is required, e.g., if the data will be passed to an R function that accepts only data frames as input, or if you will add new columns with different data types. You can do it with

Xd <- as.data.frame(X)
save(Xd, file = "mymatrixDF.RData")
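To illustrate why a data frame may be needed, the sketch below adds a character column, which a numeric matrix could not hold, and then checks that save() and load() restore the object under its original name. The column name LABEL, the helper variables and the use of a temporary folder are assumptions made for this illustration only.

```r
# Rebuild the toy data so the example is self-contained
X <- cbind(matrix(rnorm(150, 0, 1), ncol = 3),
           matrix(round(runif(50, 1, 5), 0), ncol = 1))
colnames(X) <- c("X1", "X2", "X3", "GROUP")
Xd <- as.data.frame(X)

# A data frame can mix column types; LABEL is a hypothetical character column
Xd$LABEL <- paste0("g", Xd$GROUP)
str(Xd)

# save() stores the object together with its name
rdata_file <- file.path(tempdir(), "mymatrixDF.RData")  # temporary location (assumption)
save(Xd, file = rdata_file)

Xd_before <- Xd              # keep a copy for comparison
rm(Xd)                       # remove the original object
restored <- load(rdata_file) # load() returns the names of the restored objects
print(restored)
print(identical(Xd, Xd_before))
```

Note that load() does not need an assignment: the object reappears in the workspace as Xd, and would silently overwrite an existing object with the same name.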

If you prefer to save the data as csv in non-scientific (i.e., fixed decimal) notation, you first need to load the library MASS

library(MASS)

and then use

write.matrix(format(X, scientific=FALSE), 
file = "mymatrix.csv", sep=",")

For more details about how to formulate or change the parameters of write.matrix or write.table or save, see the built-in help by calling, e.g., help(write.table).
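If loading an extra package is not desirable, base R's write.csv() can probably serve the same purpose. This is a sketch of an alternative, not the approach used in this article; the file name mymatrix_base.csv and the temporary folder are assumptions.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
X <- cbind(matrix(rnorm(150, 0, 1), ncol = 3),
           matrix(round(runif(50, 1, 5), 0), ncol = 1))
colnames(X) <- c("X1", "X2", "X3", "GROUP")

csv_file <- file.path(tempdir(), "mymatrix_base.csv")  # hypothetical file name

# format() converts the numbers to fixed-notation strings;
# quote = FALSE keeps those strings from being wrapped in quotation marks
write.csv(format(X, scientific = FALSE), file = csv_file,
          quote = FALSE, row.names = FALSE)

# Show the header and the first two data lines of the file
cat(readLines(csv_file, n = 3), sep = "\n")

# Read the file back to confirm that the values survive the round trip
Xback <- read.csv(csv_file)
print(all.equal(as.matrix(Xback), X, tolerance = 1e-5))
```

Because format() keeps only a limited number of significant digits, the values read back agree with the originals up to a small rounding error rather than exactly.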

Reading the data

You can read the data using several commands. For example, to read the data stored in the previous section use the following:

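  • reading the txt format:
    XX <- read.table(file="mymatrix.txt", header = TRUE)
  • reading the RData file (this restores the object Xd under its original name):
    load("mymatrixDF.RData")
  • reading the csv data file (XX2 is just an example name for the result):
    XX2 <- read.csv(file = "mymatrix.csv")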
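After reading, it is worth checking that the data came back as expected. The following sketch writes the toy matrix to a txt file and reads it again; the temporary folder and the variable names txt_file and same are assumptions made for this illustration.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
X <- cbind(matrix(rnorm(150, 0, 1), ncol = 3),
           matrix(round(runif(50, 1, 5), 0), ncol = 1))
colnames(X) <- c("X1", "X2", "X3", "GROUP")

txt_file <- file.path(tempdir(), "mymatrix.txt")  # temporary location (assumption)
write.table(X, file = txt_file, row.names = FALSE, col.names = TRUE)

XX <- read.table(file = txt_file, header = TRUE)

# read.table() returns a data frame, not a matrix
print(class(XX))
print(dim(XX))
str(XX)

# Convert back to a matrix and compare with the original (up to numerical tolerance)
same <- all.equal(as.matrix(XX), X)
print(same)
```

If a matrix is needed for further computation, as.matrix(XX) converts the data frame back, as in the comparison above.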

If the data format is not standard (e.g., it has specifics related to the delimiters, encoding, header line, missing values, classes of the variables, etc.), you can handle these specifics by setting appropriate values for the corresponding parameters of read.table(). See, e.g., the proposed manuals at the end of this article.
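As an illustration, the sketch below creates a small, made-up file that uses semicolons as delimiters, decimal commas and a question mark for missing values, and then reads it with matching read.table() arguments. The file name, its contents and the column names are hypothetical.

```r
# Create a small non-standard file (hypothetical content, for illustration only)
nonstd_file <- file.path(tempdir(), "nonstandard.txt")
writeLines(c("id;height;group",
             "1;1,75;a",
             "2;?;b",
             "3;1,62;a"), nonstd_file)

Y <- read.table(nonstd_file,
                header = TRUE,          # first line holds the column names
                sep = ";",              # fields are separated by semicolons
                dec = ",",              # decimal comma instead of decimal point
                na.strings = "?",       # "?" marks a missing value
                colClasses = c("integer", "numeric", "character"))
str(Y)
```

Other specifics are handled in the same way, e.g., the fileEncoding argument for files that are not in the native encoding; help(read.table) lists all available arguments.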

Copy/Paste issues

If you run into an error like this:

Error: unexpected symbol in:
"XX=read.table(file="mymatrix.txt,header=TRUE)

then R probably cannot interpret every symbol in the code correctly. This issue can arise when you copy and paste the examples from the articles, typically because straight quotation marks have been replaced by typographic (curly) ones. You can resolve it by typing the symbols on the keyboard instead of copying and pasting them.
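Such characters can also be detected and replaced programmatically. The sketch below assumes that the problem is caused by typographic double quotes; the variable names are hypothetical and the pasted line is a made-up example.

```r
# A line as it might arrive from a web page, with typographic quotes
# (\u201c and \u201d are the left and right curly double quotes)
pasted <- "XX=read.table(file=\u201cmymatrix.txt\u201d, header=TRUE)"

# Any code point above 127 is a non-ASCII character
has_non_ascii <- any(utf8ToInt(pasted) > 127)
print(has_non_ascii)

# Replace both curly quotes with the straight quote that R expects
fixed <- gsub("\u201c", "\"", pasted, fixed = TRUE)
fixed <- gsub("\u201d", "\"", fixed, fixed = TRUE)
print(fixed)

# parse() checks the syntax without running the code
parsed <- parse(text = fixed)
print(is.expression(parsed))
```

Other characters, such as typographic single quotes or non-breaking spaces, can cause the same kind of error and would need their own replacement.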

This article is from the free online course Managing Big Data with R and Hadoop.

Created by
FutureLearn - Learning For Life
