# Basic data operations in R

In this article we present basic statistical operations on data matrices, such as computing frequencies, means and variances.

## Introduction

This article describes how to get a quick overview of a data set of normal size. All examples assume that you have

• started RStudio (by executing rstudio & in the terminal) and
• opened a new R script file.

If you have not, press ctrl+shift+n to start a new script file, which you must first save to a local folder. Once you have typed (or copied) the R code into the script file, you can run it, e.g., by selecting the part of the code you want to run and pressing ctrl+enter.

Suppose we have a data frame containing 3 numerical ratio variables and 1 categorical variable:

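    library(plyr)   # provides mapvalues()

    set.seed(1000)  # make the random numbers reproducible

    # three standard-normal numerical variables, 50 observations each
    M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
    colnames(M1) <- c("X1", "X2", "X3")

    # one categorical variable: integer codes 1..4 mapped to group labels
    M2 <- matrix(round(runif(50, 1, 4), 0), ncol = 1)
    group <- c("Group_A", "Group_B", "Group_D", "Group_E")
    M3 <- mapvalues(M2, from = 1:4, to = group)
    colnames(M3) <- c("group")

    # stringsAsFactors = TRUE stores group as a factor (the default before R 4.0)
    M <- data.frame(M1, M3, stringsAsFactors = TRUE)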
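Before computing any statistics, it can help to look at the structure of the data frame. The following is a minimal sketch, not part of the original example: to stay self-contained it rebuilds the data in base R, using factor() in place of plyr's mapvalues (an assumption made here only to avoid the package dependency), and then calls a few standard inspection functions.

```r
# Rebuild the example data in base R; factor() stands in for plyr::mapvalues()
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
group <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(M1, group = factor(codes, levels = 1:4, labels = group))

dim(M)      # number of rows and columns
str(M)      # type and first values of each column
head(M)     # first six rows
summary(M)  # quartiles and mean for numeric columns, counts for the factor
```

Calling summary on the whole data frame already gives a rough picture of every column at once, which is often enough for a first check.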

## Descriptive statistics

If you want to see the distribution (frequencies) of the category values of the variable group (addressed as M$group), use table or summary. Note that summary returns the counts only when group is stored as a factor.

    table(M$group)
    summary(M$group)

In both cases we obtain

    Group_A Group_B Group_D Group_E
         10      14      20       6

We might also be interested in the mean values of the first three columns of M (the centroid) and in the group centroids of these columns, where the groups are defined by group. Here is the code and the results.

    centr <- colMeans(M[, 1:3])  # CENTROID
    centr

             X1          X2          X3
    -0.15863016  0.19138859  0.06853306

    aggregate(M[, 1:3], by = list(group = M$group), FUN = mean)  # GROUP CENTROIDS

        group         X1           X2          X3
    1 Group_A -0.1159164  0.187124494  0.51985248
    2 Group_B -0.1770462  0.532357620 -0.16423459
    3 Group_D -0.1651467  0.013811050  0.06511391
    4 Group_E -0.1651271 -0.005173836 -0.12914429
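The same pattern extends to relative frequencies and variances. The sketch below is an illustration rather than part of the original example: it rebuilds the data in base R (factor() replaces plyr's mapvalues, an assumption made to keep the block self-contained), and the names rel_freq, col_var and group_var are introduced here only for illustration.

```r
# Rebuild the example data in base R; factor() stands in for plyr::mapvalues()
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
group <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(M1, group = factor(codes, levels = 1:4, labels = group))

# Relative frequencies of the categories (they sum to 1)
rel_freq <- prop.table(table(M$group))
rel_freq

# Sample variance of each numerical column
col_var <- sapply(M[, 1:3], var)
col_var

# Sample variances within each group, analogous to the group centroids
group_var <- aggregate(M[, 1:3], by = list(group = M$group), FUN = var)
group_var
```

Replacing var with sd in either call gives standard deviations instead. A group with a single observation would yield NA, since the sample variance is undefined in that case.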

© PRACE and University of Ljubljana