
Reading, creating and storing the data in R
Introduction
In this article we briefly describe the following basic data-management operations in R:
- how to load the data from a file,
- how to create the data,
- how to store the data in a file.
R has a very active open-source community that provides help for almost every need on the different platforms. We strongly suggest using this support when R-related data-management questions arise that are not covered in this course.
First things first
Note that all these examples assume that you have started RStudio and opened a new R script file.
If you have not, pressing ctrl+shift+n starts a new script file, which you should first save to a local folder. Once you type (or copy) the R code into the script file, you run it by, e.g., selecting the part of the code you want to run and pressing ctrl+enter.
Creating and storing the data
We will first demonstrate how to create a data matrix with 50 rows and 4 columns, and later how to create a data frame from this matrix. We want the first 3 columns of the matrix to correspond to continuous random variables and the 4th to be a discrete random variable with values in {1,2,3,4,5}.
# TOY EXAMPLE
X1 <- matrix(rnorm(150, 0, 1), ncol = 3)           # RANDOM 50x3 MATRIX X1 (STANDARD NORMAL)
X2 <- matrix(round(runif(50, 1, 5), 0), ncol = 1)  # RANDOM 50x1 MATRIX X2 (COLUMN, VALUES 1..5)
X <- cbind(X1, X2)                                 # MERGING X1 AND X2 INTO MATRIX X
colnames(X) <- c("X1", "X2", "X3", "GROUP")        # SETTING THE COLUMN LABELS
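A quick check that the matrix has the intended shape can be sketched as follows. The block is self-contained, so it rebuilds the matrix first; the set.seed() call, the seed value and the name G are additions for illustration and are not part of the example above. Note that rounding uniform numbers on [1,5] makes the values 1 and 5 about half as likely as 2, 3 and 4; if equal probabilities are wanted, sample() is one possible alternative.

```r
set.seed(1)  # assumption: fixed seed, only to make the random numbers reproducible

X1 <- matrix(rnorm(150, 0, 1), ncol = 3)
X2 <- matrix(round(runif(50, 1, 5), 0), ncol = 1)
X <- cbind(X1, X2)
colnames(X) <- c("X1", "X2", "X3", "GROUP")

dim(X)               # number of rows and columns
head(X, 3)           # first three rows
table(X[, "GROUP"])  # how often each group value occurs

# Possible alternative for the 4th column: each value 1..5 with equal probability
G <- sample(1:5, 50, replace = TRUE)  # G is a hypothetical name
table(G)
```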
We can store X in the file mymatrix.txt with
write.table(X, file="mymatrix.txt", row.names=FALSE, col.names=TRUE)
Sometimes you need to store the data as a data frame. This is required, e.g., if the data will be passed to an R function that accepts only data frames as input, or if you will add new columns with different data types. You can do it with
Xd <- as.data.frame(X)
save(Xd, file = "mymatrixDF.RData")
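The point about columns with different data types can be illustrated with a small sketch. It rebuilds a matrix like the one above so that it runs on its own; the seed and the column name LABEL are assumptions made only for this illustration.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

X <- cbind(matrix(rnorm(150, 0, 1), ncol = 3),
           round(runif(50, 1, 5), 0))
colnames(X) <- c("X1", "X2", "X3", "GROUP")

# In a matrix all entries share one type, so adding a text column
# converts every entry to text
M <- cbind(X, LABEL = paste0("group_", X[, "GROUP"]))
typeof(M)

# A data frame keeps a separate type for each column
Xd <- as.data.frame(X)
Xd$LABEL <- paste0("group_", Xd$GROUP)  # LABEL is a hypothetical column name
str(Xd)
```

In the data frame the first four columns stay numeric while the new column holds text, which is why a data frame is the safer container once mixed types are involved.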
If you prefer to save the data as csv in non-scientific (i.e., fixed) notation, you first need to load the library MASS
library(MASS)
and then use
write.matrix(format(X, scientific=FALSE),
file = "mymatrix.csv", sep=",")
For more details about how to set or change the arguments of write.matrix, write.table or save, see the built-in help by calling, e.g., help(write.table).
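As a possible alternative that needs no extra library, base R also provides write.csv(). The sketch below uses a small matrix and writes to a temporary folder; the file name mymatrix_base.csv and the seed are hypothetical choices for this illustration, and the number formatting is left at the defaults rather than forced to fixed notation.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

X <- matrix(rnorm(12), ncol = 4)
colnames(X) <- c("X1", "X2", "X3", "GROUP")

# Hypothetical file name, written to the session's temporary folder
csv_file <- file.path(tempdir(), "mymatrix_base.csv")

# row.names = FALSE omits the column of row labels
write.csv(X, file = csv_file, row.names = FALSE)

readLines(csv_file, n = 2)  # header line plus the first data row
```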
Reading the data
You can read the data using several commands. For example, to read the data stored in the previous section use the following:
- reading the txt format: XX = read.table(file="mymatrix.txt", header = TRUE)
- reading the RData file: load("mymatrixDF.RData")
- reading the csv data file: read.csv(file = "mymatrix.csv")
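A write-and-read round trip can be sketched as follows. To stay self-contained, the block recreates the data and writes into a temporary folder instead of the working directory; the seed and the helper names txt_file, rdata_file, original and restored are assumptions for this illustration.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

X <- cbind(matrix(rnorm(150, 0, 1), ncol = 3),
           round(runif(50, 1, 5), 0))
colnames(X) <- c("X1", "X2", "X3", "GROUP")
Xd <- as.data.frame(X)

txt_file   <- file.path(tempdir(), "mymatrix.txt")
rdata_file <- file.path(tempdir(), "mymatrixDF.RData")

write.table(X, file = txt_file, row.names = FALSE, col.names = TRUE)
save(Xd, file = rdata_file)

# read.table() returns a data frame, even though a matrix was written
XX <- read.table(file = txt_file, header = TRUE)
class(XX)
all.equal(as.matrix(XX), X)  # compares values up to a small numerical tolerance

# load() restores objects under their saved names and returns those names
original <- Xd
rm(Xd)
restored <- load(rdata_file)
restored
identical(Xd, original)
```

Note the difference between the two approaches: read.table() and read.csv() return the data, so the result must be assigned to a name, whereas load() recreates the saved objects directly in the workspace.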
If the data format is not standard (e.g., has some specifics related to the delimiters, encodings, header line, missing values, classes for the variables, etc.) you can cope with these specifics by setting appropriate values for the different arguments of read.table(). See, e.g., the proposed manuals at the end of this article.
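As a minimal sketch of such a case, suppose a file uses a semicolon as the delimiter, a comma as the decimal mark and the word "missing" for missing values. The file content, its name and the column names below are invented for this illustration.

```r
# Hypothetical non-standard file, created in the temporary folder
raw_lines <- c(
  "id;height;site",
  "1;1,75;A",
  "2;missing;B",
  "3;1,62;A"
)
odd_file <- file.path(tempdir(), "nonstandard.txt")
writeLines(raw_lines, odd_file)

dat <- read.table(
  file       = odd_file,
  header     = TRUE,       # first line holds the column names
  sep        = ";",        # field delimiter
  dec        = ",",        # decimal mark
  na.strings = "missing",  # text that marks a missing value
  colClasses = c("integer", "numeric", "character")  # class of each column
)

str(dat)
```

A non-default file encoding can be handled in a similar way through the fileEncoding argument of read.table().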
© PRACE and University of Ljubljana