This content is taken from the Partnership for Advanced Computing in Europe (PRACE)'s online course, Managing Big Data with R and Hadoop.

Reading and storing the data are always inevitable steps in data analysis.

# Basic data operations in R

## Introduction

This article shows how to get a quick overview of a normal-sized data set. All examples assume that you have

• started RStudio (by executing `rstudio &` in the terminal) and
• opened a new R script file.

If you have not, press Ctrl+Shift+N to start a new script file, and save it to a local folder first. Once you have typed (or copied) the R code into the script file, run it by, for example, selecting the part of the code you want to run and pressing Ctrl+Enter.

Suppose we have a data frame containing three numerical (ratio-scale) variables and one categorical variable:

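```r
library(plyr)   # provides mapvalues()
set.seed(1000)  # make the random numbers reproducible

# Three numerical variables: 50 standard normal draws each
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# One categorical variable: integer codes 1..4 mapped to group labels
M2 <- matrix(round(runif(50, 1, 4), 0), ncol = 1)
group <- c('Group_A', 'Group_B', 'Group_D', 'Group_E')
M3 <- mapvalues(M2, from = 1:4, to = group)
colnames(M3) <- c("group")

# stringsAsFactors = TRUE stores group as a factor on R >= 4.0 as well,
# so that summary(M$group) reports counts
M <- data.frame(M1, M3, stringsAsFactors = TRUE)
```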
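
Before computing any statistics, it can help to check the shape and column types of the data frame. The following is only a minimal sketch in base R: `D` is a hypothetical stand-in for `M`, built without plyr by sampling the group labels directly, so its values differ from those shown in this article.

```r
# Hypothetical stand-in for M (base R only, values differ from the article)
set.seed(1000)
D <- data.frame(
  X1 = rnorm(50),
  X2 = rnorm(50),
  X3 = rnorm(50),
  group = factor(sample(c("Group_A", "Group_B", "Group_D", "Group_E"),
                        50, replace = TRUE))
)

dim(D)       # number of rows and columns
str(D)       # column names and types at a glance
head(D, 3)   # first three rows
summary(D)   # quartiles for numeric columns, counts for the factor
```

The same four calls can be applied to `M` itself once it has been created.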


## Descriptive statistics

If you want to see the distribution (frequencies) of the category values of the variable group (addressed as `M$group`), use `table` or `summary`.

```r
table(M$group)
summary(M$group)
```

When group is stored as a factor, both calls return the same counts (for a character column, `summary` reports only length, class and mode):

```
Group_A Group_B Group_D Group_E
     10      14      20       6
```

We might also be interested in the mean values of the first three columns of M (the centroid) and in the group centroids for these columns, where the groups are defined by group. Here is the code with its results.

```r
centr <- colMeans(M[, 1:3])   # CENTROID
centr
```

```
         X1          X2          X3
-0.15863016  0.19138859  0.06853306
```

```r
# GROUP CENTROIDS; naming the list element labels the grouping column "group"
aggregate(M[, 1:3], by = list(group = M$group), FUN = mean)
```

```
    group         X1           X2          X3
1 Group_A -0.1159164  0.187124494  0.51985248
2 Group_B -0.1770462  0.532357620 -0.16423459
3 Group_D -0.1651467  0.013811050  0.06511391
4 Group_E -0.1651271 -0.005173836 -0.12914429
```
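
The frequencies and the group centroids are linked: the overall centroid is the average of the group centroids weighted by group size. The sketch below illustrates this relation on a small hypothetical data frame `D` (not the `M` used above; the group labels and sizes are made up for illustration), and also shows relative frequencies with `prop.table`.

```r
# Hypothetical stand-in data frame with known group sizes 10, 12 and 18
set.seed(1)
D <- data.frame(
  X1 = rnorm(40),
  X2 = rnorm(40),
  group = factor(rep(c("a", "b", "c"), times = c(10, 12, 18)))
)

n <- table(D$group)   # absolute frequencies
prop.table(n)         # relative frequencies (they sum to 1)

# Group centroids: one row per group, one column per variable
gc <- aggregate(D[, c("X1", "X2")], by = list(group = D$group), FUN = mean)

# Weight of each group = its share of the rows, in the row order of gc
w <- as.numeric(n[as.character(gc$group)]) / nrow(D)

# Size-weighted sum of the group centroids, column by column
weighted <- colSums(as.matrix(gc[, c("X1", "X2")]) * w)

# Should agree with the overall centroid up to floating-point error
all.equal(weighted, colMeans(D[, c("X1", "X2")]))
```

A check of this kind is a cheap way to confirm that the grouping variable covers every row exactly once.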