Reading, creating and storing the data in R
In this article we will briefly describe the following basic data-management operations in R:
- how to load the data from a file,
- how to create the data,
- how to store the data in a file.
R has a very active open-source community that provides help for almost every need on the different platforms. We strongly suggest using this support when R-related data-management questions arise that are not covered in this course.
First things first
Note that all these examples assume that you have started
RStudio by executing the following command in the terminal:
$ rstudio &
and you have opened a new
R script file.
If you have not, pressing ctrl+shift+n starts a new script file, which you first have to save to a local folder. Once you type (or copy) the R code into the script file, you run it by, e.g., selecting the part of the code you want to run and pressing ctrl+enter.
Creating and storing the data
We will first demonstrate how to create a data matrix with 50 rows and 4 columns and later how to create a data frame from this matrix. We want the first 3 columns of our matrix to correspond to continuous random variables and the 4th to be a discrete random variable with values in {1, 2, 3, 4, 5}.
# PLAYING EXAMPLE
X1 <- matrix(rnorm(150,0,1), ncol=3)               # RANDOM 50x3 MATRIX X1
X2 <- matrix(round(runif(50,1,5),0), ncol=1)       # RANDOM 50x1 MATRIX (COLUMN) X2
X <- cbind(X1,X2)                                  # MERGING X1 AND X2 INTO MATRIX X
colnames(X) <- c("X1","X2","X3","GROUP")           # SETTING THE COLUMN LABELS
A side note:
rnorm(150,0,1) generates 150 random values with a normal distribution with a mean of 0 and a standard deviation of 1, while
round(runif(50,1,5),0) generates 50 random values with a uniform distribution from 1 to 5;
round(x,0) rounds the value of
x to the nearest integer.
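As a quick sanity check, the construction above can be followed by a look at the dimensions of the matrix and at the values in the GROUP column. The sketch below is not part of the original example: the call to set.seed() and the seed value 1 are assumptions added only to make the random numbers reproducible.

```r
set.seed(1)                                          # assumed seed, only for reproducibility
X1 <- matrix(rnorm(150, 0, 1), ncol = 3)             # 50x3 matrix of normal values
X2 <- matrix(round(runif(50, 1, 5), 0), ncol = 1)    # 50x1 column with values 1..5
X <- cbind(X1, X2)
colnames(X) <- c("X1", "X2", "X3", "GROUP")

dim(X)                 # number of rows and columns
head(X, 3)             # first three rows
table(X[, "GROUP"])    # how often each group value occurs
```

Because rounding maps only the intervals [1, 1.5) and [4.5, 5] to the values 1 and 5, these two values tend to occur about half as often as 2, 3 and 4. If equal probabilities are wanted, sample(1:5, 50, replace = TRUE) would be one way to draw them.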
We can store X in the file mymatrix.txt with
write.table(X, file="mymatrix.txt", row.names=FALSE, col.names=TRUE)
Sometimes you need to store the data as a data frame. This is required, e.g., if your data will be passed to an R function that accepts only data frames as input, or if you will add new columns with different data types. You can do it with
Xd <- as.data.frame(X)
save(Xd, file = "mymatrixDF.RData")
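The difference between the two containers can be seen in a small sketch. The object names M, M2 and D and the column LABEL are hypothetical and serve only as an illustration: adding a character column to a matrix converts every entry to character, while a data frame keeps the type of each column.

```r
M <- matrix(c(0.5, 1.2, -0.3, 2.0), ncol = 2)   # small numeric matrix
colnames(M) <- c("A", "B")

M2 <- cbind(M, LABEL = c("a", "b"))   # matrix: all entries become character
is.character(M2)

D <- as.data.frame(M)                 # data frame: each column keeps its own type
D$LABEL <- c("a", "b")
sapply(D, class)
```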
If you prefer to save the data as csv in non-scientific (i.e., fixed) notation, you first need to load the library
library(MASS)
and then use
write.matrix(format(X, scientific=FALSE), file = "mymatrix.csv", sep=",")
For more details about how to specify or change the parameters of save, see the built-in help by calling, e.g.,
?save
Reading the data
You can read the data using several commands. For example, to read the data stored in the previous section use the following:
- reading the txt file:
XX=read.table(file="mymatrix.txt", header = TRUE)
- reading the RData file:
load(file="mymatrixDF.RData")
- reading the csv file:
read.csv(file = "mymatrix.csv")
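A round trip is a simple way to check that writing and reading agree. The following sketch is self-contained and makes two assumptions not found in the text above: it uses a small hypothetical matrix, and it writes to a temporary directory obtained with tempdir() so that no files in the working folder are overwritten.

```r
X <- matrix(c(1.5, -0.25, 3, 2, 0.75, 1), ncol = 2)   # small hypothetical matrix
colnames(X) <- c("X1", "GROUP")
Xd <- as.data.frame(X)

txt_file   <- file.path(tempdir(), "mymatrix.txt")
rdata_file <- file.path(tempdir(), "mymatrixDF.RData")

write.table(X, file = txt_file, row.names = FALSE, col.names = TRUE)
save(Xd, file = rdata_file)

XX <- read.table(file = txt_file, header = TRUE)   # returns a data frame
rm(Xd)                                             # remove the object ...
loaded <- load(rdata_file)                         # ... and restore it under its saved name

all.equal(as.matrix(XX), X)   # compare values and column names
loaded                        # names of the restored objects
```

Note that load() restores objects under the names they had when saved and returns those names, so its result is not the data itself.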
If the data format is not standard (e.g., has some specifics related to the delimiters, encoding, header line, missing values, classes of the variables, etc.), you can cope with these specifics by setting appropriate values for the different parameters of
read.table(). See, e.g., the proposed manuals at the end of this article.
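As an illustration of such parameters, the sketch below writes a small file that uses a semicolon as the delimiter, a decimal comma, and a question mark for missing values, and then reads it back. The file name, its content and the column names are invented for the example; sep, dec, na.strings and colClasses are standard arguments of read.table().

```r
f <- file.path(tempdir(), "nonstandard.txt")   # hypothetical file in a temporary folder
writeLines(c("ID;VALUE;GROUP",
             "a;1,5;1",
             "b;?;2",
             "c;-0,25;3"), f)

Y <- read.table(file = f, header = TRUE,
                sep = ";",            # fields are separated by semicolons
                dec = ",",            # decimal comma instead of decimal point
                na.strings = "?",     # "?" marks a missing value
                colClasses = c("character", "numeric", "integer"))
str(Y)
```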
If you run into an error like this:
Error: unexpected symbol in: "XX=read.table(file="mymatrix.txt”,header=TRUE)
then RStudio probably doesn’t interpret every symbol in the code correctly. This issue can arise when you copy and paste the examples from the articles, e.g., when a straight quotation mark (") has been replaced by a typographic one (”). You can resolve it by typing the symbols on the keyboard instead of copying and pasting them.
© PRACE and University of Ljubljana