
Basic data operations in R

This article presents basic statistical operations on data matrices, such as computing frequencies, means and variances.
Reading and storing data are unavoidable steps in any data analysis.
© PRACE and University of Ljubljana

Introduction

This article shows how to get a quick overview of data of ordinary (not big) size. All examples assume that you have

  • started RStudio (by executing rstudio & in the terminal) and
  • opened a new R script file.

If you have not, press Ctrl+Shift+N to start a new script file, and save it to a local folder first. Once you have typed (or copied) the R code into the script file, you can run it by, for example, selecting the part of the code you want to run and pressing Ctrl+Enter.
Suppose we have a data frame containing three numerical variables and one categorical variable:

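library(plyr)   # provides mapvalues()
set.seed(1000)  # make the random numbers reproducible

# Three numerical variables: 50 draws each from a standard normal distribution
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# One categorical variable: integer codes 1 to 4 mapped to group labels
M2 <- matrix(round(runif(50, 1, 4), 0), ncol = 1)
group <- c('Group_A', 'Group_B', 'Group_D', 'Group_E')
M3 <- mapvalues(M2, from = 1:4, to = group)
colnames(M3) <- c("group")

# stringsAsFactors = TRUE keeps group a factor (the default changed in R 4.0.0)
M <- data.frame(M1, M3, stringsAsFactors = TRUE)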
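
Before computing any statistics, it can help to check that the data frame has the expected shape and column types. The following is a minimal sketch using base R only. It rebuilds a comparable data frame with factor() in place of mapvalues(); that substitution, and the names X, codes and labels, are assumptions of this sketch and not part of the original example.

```r
# Rebuild the example data with base R only.
# Assumption: factor() with labels stands in for plyr::mapvalues().
set.seed(1000)
X <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(X) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
labels <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(X, group = factor(codes, levels = 1:4, labels = labels))

dim(M)      # number of rows and columns
str(M)      # column types: three numeric columns and one factor
head(M, 5)  # first five rows
```

str is usually the quickest of the three, because it shows the dimensions, the type of every column and the first few values in one call.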

Descriptive statistics

To see the distribution (frequencies) of the category values of the variable group (addressed as M$group), use table or summary. Note that summary reports these counts only when group is stored as a factor.

table(M$group)
summary(M$group)

In both cases we obtain

Group_A Group_B Group_D Group_E 
     10      14      20       6 
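
Absolute counts are often easier to compare as shares. As a small sketch, the counts shown above can be turned into relative frequencies with prop.table. The vector counts below is typed in by hand from that output only to keep the example self-contained; in the workflow of this article you would presumably pass table(M$group) directly.

```r
# Counts copied from the frequency table above (entered by hand for this sketch)
counts <- c(Group_A = 10, Group_B = 14, Group_D = 20, Group_E = 6)

shares <- prop.table(counts)  # relative frequencies, which sum to 1
shares
round(100 * shares, 1)        # the same as percentages: 20, 28, 40 and 12
```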

We might also be interested in the mean values of the first three columns of M (the centroid) and in the group centroids for these columns, where the groups are defined by group. Here are the code and the results.

centr <- colMeans(M[, 1:3])  # centroid
centr

         X1          X2          X3 
-0.15863016  0.19138859  0.06853306 

# Group centroids; naming the list element labels the grouping column "group"
aggregate(M[, 1:3], by = list(group = M$group), FUN = mean)

    group         X1           X2          X3
1 Group_A -0.1159164  0.187124494  0.51985248
2 Group_B -0.1770462  0.532357620 -0.16423459
3 Group_D -0.1651467  0.013811050  0.06511391
4 Group_E -0.1651271 -0.005173836 -0.12914429
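
Variances can be obtained in the same way as means. The sketch below is one possible approach, not taken from the original example: it applies var to each numerical column and then reuses aggregate with FUN = var for the groups. The data frame here is a simplified stand-in with two hypothetical groups, so its values differ from those shown above.

```r
# Simplified stand-in data (assumption: two groups instead of four)
set.seed(1000)
M <- data.frame(
  X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50),
  group = factor(sample(c("Group_A", "Group_B"), 50, replace = TRUE))
)

vars <- sapply(M[, 1:3], var)  # sample variance of each numerical column
vars
sqrt(vars)                     # corresponding standard deviations

# Variances within each group, analogous to the group centroids
aggregate(M[, 1:3], by = list(group = M$group), FUN = var)
```

Note that var uses the sample variance (denominator n - 1), so a group with a single observation would give NA.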
This article is from the free online

Managing Big Data with R and Hadoop

Created by
FutureLearn - Learning For Life
