Computing power increases every year, and being able to harness it can be a great advantage for you.

Reading, creating and storing the data in R

Introduction

In this article we will briefly describe the following basic data-management operations in R:

  • how to load the data from a file,
  • how to create the data,
  • how to store the data in a file.

R has a very active open-source community that provides help for almost every need on the different platforms. We strongly suggest using this support when R-related data-management questions arise that are not covered in this course.

First things first

Note that all these examples assume that you have started RStudio and opened a new R script file. If you have not, press Ctrl+Shift+N to start a new script file, and save it to a local folder first. Once you have typed (or copied) the R code into the script file, you can run it by, e.g., selecting the part of the code you want to run and pressing Ctrl+Enter.

Creating and storing the data

We will first demonstrate how to create a data matrix with 50 rows and 4 columns and later how to create a data frame using this matrix. We want our matrix to have the first 3 columns corresponding to continuous random variables and the 4th must be a discrete random variable having values in {1,2,3,4,5}.

# TOY EXAMPLE
X1 <- matrix(rnorm(150, 0, 1), ncol = 3)           # RANDOM 50x3 MATRIX X1
X2 <- matrix(round(runif(50, 1, 5), 0), ncol = 1)  # RANDOM 50x1 MATRIX (COLUMN) X2
X <- cbind(X1, X2)                                 # MERGING X1 AND X2 INTO MATRIX X
colnames(X) <- c("X1", "X2", "X3", "GROUP")        # SETTING THE COLUMN LABELS
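A quick check that the matrix has the intended shape can be sketched as follows. The sketch also uses sample() as one possible alternative for the GROUP column: with round(runif(50,1,5),0) the end values 1 and 5 are each drawn about half as often as 2, 3 and 4, whereas sample() draws all five values with equal probability. The seed value and the names X2_alt and X_alt are illustrative assumptions, not part of the original example.

```r
set.seed(1)                                  # assumed seed, only for reproducibility
X1 <- matrix(rnorm(150, 0, 1), ncol = 3)     # 50x3 matrix of standard normal draws
X2_alt <- matrix(sample(1:5, 50, replace = TRUE), ncol = 1)  # 50x1, values equally likely
X_alt <- cbind(X1, X2_alt)                   # merge into one 50x4 matrix
colnames(X_alt) <- c("X1", "X2", "X3", "GROUP")

dim(X_alt)                # number of rows and columns
head(X_alt, 3)            # first three rows
table(X_alt[, "GROUP"])   # how often each group value occurs
```

If the unequal frequencies of the end values do not matter for your purpose, the original round(runif()) construction works just as well.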

We can store X in the file mymatrix.txt with

write.table(X, file="mymatrix.txt", row.names=FALSE, col.names=TRUE)

Sometimes you need to store the data as a data frame. This is required, e.g., if your data will be passed to an R function that accepts only data frames as input, or if you will add new columns with different data types. You can do it with

Xd <- as.data.frame(X)
save(Xd, file = "mymatrixDF.RData")
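To illustrate why a data frame can be useful here, the sketch below adds a column of a different type and then restores the saved object. A stand-in for the matrix X is created so that the snippet runs on its own. The column name LABEL and the variable names rdata_file and loaded are hypothetical, and the file is written to a temporary folder (an assumption made here so that nothing in your working folder is overwritten).

```r
# Stand-in for the matrix X built above, so the snippet is self-contained
X <- cbind(matrix(rnorm(150, 0, 1), ncol = 3),
           matrix(round(runif(50, 1, 5), 0), ncol = 1))
colnames(X) <- c("X1", "X2", "X3", "GROUP")

Xd <- as.data.frame(X)
Xd$LABEL <- paste0("g", Xd$GROUP)   # hypothetical character column; a matrix holds one type only
str(Xd)                             # shows the type of every column

rdata_file <- file.path(tempdir(), "mymatrixDF.RData")  # temporary location (assumption)
save(Xd, file = rdata_file)
rm(Xd)                              # remove the object from the workspace
loaded <- load(rdata_file)          # load() returns the names of the restored objects
loaded
exists("Xd")                        # the data frame is back under its original name
```

Note that load() restores objects under the names they had when saved, so there is no need to assign its result to get the data back.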

If you prefer to save the data in csv format with non-scientific (i.e., fixed decimal) notation, you first need to load the library MASS

library(MASS)

and then use

write.matrix(format(X, scientific=FALSE), 
             file = "mymatrix.csv", sep=",")

For more details about how to formulate or change the parameters of write.matrix or write.table or save, see the built-in help by calling, e.g., help(write.table).

Reading the data

You can read the data using several commands. For example, to read the data stored in the previous section use the following:

  • reading the txt format: XX <- read.table(file = "mymatrix.txt", header = TRUE)
  • reading the RData file: load("mymatrixDF.RData") (this restores the data frame Xd)
  • reading the csv data file: read.csv(file = "mymatrix.csv").
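The round trip through a text file can be checked with a short sketch like the one below. It recreates a matrix of the same shape as X so that it runs on its own, and it writes to a temporary folder; the variable name txt_file and the use of tempdir() are assumptions made for this illustration.

```r
# Stand-in for the matrix X from the previous section
X <- cbind(matrix(rnorm(150, 0, 1), ncol = 3),
           matrix(round(runif(50, 1, 5), 0), ncol = 1))
colnames(X) <- c("X1", "X2", "X3", "GROUP")

txt_file <- file.path(tempdir(), "mymatrix.txt")   # temporary location (assumption)
write.table(X, file = txt_file, row.names = FALSE, col.names = TRUE)

XX <- read.table(file = txt_file, header = TRUE)
class(XX)                      # read.table() returns a data frame, not a matrix
dim(XX)                        # same number of rows and columns as X
all.equal(as.matrix(XX), X)    # compares the values up to a small numerical tolerance
```

The comparison uses all.equal() rather than identical() because numbers written to a text file are stored with a limited number of digits.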

If the data format is not standard (e.g., it has specifics related to the delimiters, encoding, header line, missing values, classes of the variables, etc.), you can handle these specifics by setting appropriate values for the corresponding parameters of read.table(). See, e.g., the manuals suggested at the end of this article.
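As an illustration, the sketch below writes a small non-standard file and reads it back by setting several read.table() arguments. The file content, the file name nonstandard.txt, the variable names f and D, and the missing-value code -999 are all made up for this example.

```r
# A made-up file: semicolon-separated, with -999 marking missing values
f <- file.path(tempdir(), "nonstandard.txt")
writeLines(c("ID;VALUE;GROUP",
             "a;1.5;1",
             "b;-999;2",
             "c;3.25;1"), f)

D <- read.table(file = f,
                header = TRUE,          # first line holds the column names
                sep = ";",              # field delimiter
                na.strings = "-999",    # code to be read as NA
                colClasses = c("character", "numeric", "factor"))  # class of each column
str(D)
```

Other arguments, such as fileEncoding for the file encoding or skip for leading lines to ignore, are described in help(read.table).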

This article is from the free online course:

Managing Big Data with R and Hadoop

Partnership for Advanced Computing in Europe (PRACE)