Computing power increases every year, and being able to harness it can give you a great advantage.

Reading, creating and storing the data in R


In this article we briefly recall how to load data from a file, how to create data, and how to store data in a file. R has a very active open-source community, which provides help for almost any need across different platforms. We strongly suggest using this support when R-related data management questions arise that are not covered in this course.

First things first

Note that all these examples assume that you have started RStudio and are typing (or copying) the R code into an R script file (Ctrl+Shift+N). Once you have typed (or copied) the code, you can run it by, e.g., selecting the part you want to run and pressing Ctrl+Enter.

Creating and storing the data

We will first demonstrate how to create a data matrix with 50 rows and 4 columns, and later how to create a data frame from this matrix. We want the first 3 columns of our matrix to correspond to continuous random variables, and the 4th to be a discrete variable taking values in {1,2,3,4,5}.

X1 <- matrix(rnorm(150, 0, 1), ncol = 3)           # RANDOM 50x3 MATRIX X1
X2 <- matrix(round(runif(50, 1, 5), 0), ncol = 1)  # RANDOM 50x1 MATRIX (COLUMN) WITH VALUES 1..5
X <- cbind(X1, X2)                                 # MERGING X1 AND X2 INTO MATRIX X
colnames(X) <- c("X1", "X2", "X3", "GROUP")        # COLUMN LABELS
write.table(X, file = "mymatrix.txt", row.names = FALSE, col.names = TRUE)  # STORING X AS TEXT
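As a quick check of the matrix construction, the following sketch rebuilds the matrix with a fixed seed (the seed value is an arbitrary choice, not part of the original example) and inspects its shape. It also uses sample() for the GROUP column as an alternative: with round(runif(50,1,5),0) the end values 1 and 5 are drawn about half as often as 2, 3 and 4, whereas sample() draws all five values with equal probability.

```r
set.seed(1)                                               # arbitrary seed, only for reproducibility
X1 <- matrix(rnorm(150, 0, 1), ncol = 3)                  # 50x3 matrix of standard normal draws
G  <- matrix(sample(1:5, 50, replace = TRUE), ncol = 1)   # each group value equally likely
X  <- cbind(X1, G)
colnames(X) <- c("X1", "X2", "X3", "GROUP")

dim(X)                 # number of rows and columns
head(X, 3)             # first three rows
table(X[, "GROUP"])    # how often each group value occurs
```

Which of the two ways of generating the group column is appropriate depends on whether you need the groups to be equally likely.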

Sometimes you need to store the data as a data frame - this is required, e.g., if your data will be passed to an R function that accepts only data frames as input. You can do this with

Xd <- as.data.frame(X)                 # CONVERTING MATRIX X INTO A DATA FRAME
save(Xd, file = "mymatrixDF.RData")
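To see what save() actually stores, here is a minimal round-trip sketch. It assumes the data frame is built with as.data.frame(), uses a small toy matrix in place of X, and writes to a temporary file (a hypothetical path, chosen so that nothing in your working directory is overwritten).

```r
X  <- matrix(1:8, ncol = 2, dimnames = list(NULL, c("X1", "GROUP")))  # toy stand-in for X
Xd <- as.data.frame(X)                       # matrix columns become data frame columns
rdata_file <- tempfile(fileext = ".RData")   # hypothetical path; use your own file name
save(Xd, file = rdata_file)

rm(Xd)                                       # remove the object from the workspace
restored <- load(rdata_file)                 # load() returns the names of the restored objects
print(restored)
str(Xd)                                      # Xd is back under its original name
```

Note that load() restores the object under the name it had when it was saved, so there is no need to assign the result.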

If you prefer to save the data as csv in non-scientific (i.e., classical decimal) notation, use

library(MASS)                          # write.matrix() IS PROVIDED BY THE MASS PACKAGE
write.matrix(format(X, scientific = FALSE),
             file = "mymatrix.csv", sep = ",")
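write.matrix() comes from the MASS package. If you would rather stay within base R, a similar result can be sketched with write.csv(). The toy matrix and the temporary file below are assumptions made for illustration only.

```r
X <- matrix(c(0.00001234, 123456.789, 1, 2), ncol = 2,
            dimnames = list(NULL, c("X1", "GROUP")))    # toy stand-in for X
csv_file <- tempfile(fileext = ".csv")                   # hypothetical path
write.csv(format(X, scientific = FALSE), file = csv_file,
          row.names = FALSE, quote = FALSE)              # no row names, no quotes around values
readLines(csv_file)                                      # inspect the raw text that was written
```

Because format() returns text, the values are written exactly as formatted, padded to a common width.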

For more details on how to set or change the parameters of write.matrix, write.table or save, see the many online manuals or the answers provided on Stack Overflow.

Reading the data

You can read the data using several commands. For example, to read the data stored in the previous section, use the following:

  • reading the txt format: XX = read.table(file = "mymatrix.txt", header = TRUE)
  • reading the RData file: load("mymatrixDF.RData")
  • reading the csv data file: read.csv(file = "mymatrix.csv")
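The write and read steps can be checked together with a small round trip. This is only a sketch: it uses a toy matrix and a temporary file (both hypothetical) rather than the files created earlier.

```r
X <- matrix(c(1.5, 2.5, 3.5, 1, 2, 3), ncol = 2,
            dimnames = list(NULL, c("X1", "GROUP")))   # toy stand-in for X
txt_file <- tempfile(fileext = ".txt")                  # hypothetical path
write.table(X, file = txt_file, row.names = FALSE, col.names = TRUE)

XX <- read.table(file = txt_file, header = TRUE)        # returns a data frame, not a matrix
class(XX)
all.equal(as.matrix(XX), X)                             # compare the values with the original
```

Keep in mind that read.table() and read.csv() always return a data frame, so use as.matrix() if a later step needs a matrix.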

If the data format is not standard (e.g., it has specifics related to delimiters, encodings, the header line, missing values, classes of the variables, etc.), you can handle these specifics by setting appropriate values for the corresponding parameters of read.table(). See, e.g., the manuals proposed at the end of this article.
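As an illustration of such parameters, the sketch below reads a small semicolon-separated file in which the text n/a marks missing values. The file content and its layout are invented for this example. The arguments header, sep, na.strings and colClasses are standard arguments of read.table().

```r
# Hypothetical file: semicolon-separated, "n/a" marks missing values
raw_file <- tempfile(fileext = ".txt")
writeLines(c("X1;GROUP", "0.5;1", "n/a;2", "1.25;n/a"), raw_file)

dat <- read.table(file = raw_file,
                  header = TRUE,                          # first line holds the column names
                  sep = ";",                              # field delimiter
                  na.strings = "n/a",                     # text that marks a missing value
                  colClasses = c("numeric", "factor"))    # class of each column
str(dat)
```

Setting colClasses explicitly avoids surprises from automatic type detection, for instance a grouping variable being read as a plain integer.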

This article is from the free online course:

Managing Big Data with R and Hadoop

Partnership for Advanced Computing in Europe (PRACE)