Basic data operations in R

In this article we present basic statistical operations on data matrices, such as computing frequencies, means and variances.
Reading and storing data are unavoidable steps in any data analysis.
© PRACE and University of Ljubljana

Introduction

In this article we show how to get a quick overview of a data set of ordinary (not big-data) size. All examples assume that you have
  • started RStudio (by executing rstudio & in the terminal) and
  • opened a new R script file.
If you have not opened a script file yet, press ctrl+shift+n to create one and save it to a local folder first. Once you have typed (or copied) the R code into the script file, you can run it by, e.g., selecting the part of the code you want to run and pressing ctrl+enter.
Suppose we have a data frame containing three numerical ratio variables and one categorical variable. The following code generates such a data frame:
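library(plyr)   # provides mapvalues()
set.seed(1000)  # make the random draws reproducible

# three numerical variables: 50 standard normal draws each
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# one categorical variable: integer codes 1..4 mapped to group labels
M2 <- matrix(round(runif(50, 1, 4), 0), ncol = 1)
group <- c('Group_A', 'Group_B', 'Group_D', 'Group_E')
M3 <- mapvalues(M2, from = 1:4, to = group)
colnames(M3) <- c("group")

# store group as a factor so that summary() reports frequencies (needed in R >= 4.0)
M <- data.frame(M1, M3, stringsAsFactors = TRUE)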
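Before computing any statistics, it can help to check what the data frame actually contains. The following is a minimal sketch, not part of the original course code: it rebuilds the example data with base R only (factor() is used here as an assumed stand-in for plyr's mapvalues) and then prints the dimensions, column types and first rows.

```r
# Rebuild the example data with base R only
# (assumption: factor() replaces plyr::mapvalues for the labelling step)
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
labels <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(M1, group = factor(codes, levels = 1:4, labels = labels))

dim(M)      # number of rows and columns
str(M)      # type of every column
head(M, 5)  # first five rows
summary(M)  # quartiles for numeric columns, counts for the factor
```

Calling summary on the whole data frame gives a per-column overview in one step, which is often enough for a first look.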

Descriptive statistics

To see the distribution (frequencies) of the category values of the variable group (addressed as M$group), use table or summary. Note that summary returns the frequencies only if group is stored as a factor; for a character column it reports just the length and type.
table(M$group)
summary(M$group)
When group is a factor, both calls return
Group_A Group_B Group_D Group_E 
     10      14      20       6 
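Absolute frequencies are often easier to compare as shares. As a small sketch (the data are rebuilt here with base R, which is an assumption and not the original plyr-based code), prop.table converts the counts returned by table into relative frequencies:

```r
# Rebuild the example data with base R only
# (assumption: factor() replaces plyr::mapvalues for the labelling step)
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
labels <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(M1, group = factor(codes, levels = 1:4, labels = labels))

freq <- table(M$group)   # absolute frequencies
rel <- prop.table(freq)  # relative frequencies, summing to 1
round(100 * rel, 1)      # the same as percentages
```

For the frequencies listed above (10, 14, 20 and 6 out of 50), the shares are 20%, 28%, 40% and 12%.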
We might also be interested in the mean values of the first three columns in M (the centroid) and the group centroids for these columns, where the groups are defined by group. Here is the code and the results.
centr <- colMeans(M[, 1:3]) # CENTROID
centr

         X1          X2          X3 
-0.15863016  0.19138859  0.06853306 
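Variances, which the summary of this article also mentions, can be computed column by column in much the same way as the means. The sketch below is one possible approach using base R functions (var, sd and cov); the data are rebuilt with factor() instead of plyr's mapvalues, which is an assumption made to keep the example self-contained.

```r
# Rebuild the example data with base R only
# (assumption: factor() replaces plyr::mapvalues for the labelling step)
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
labels <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(M1, group = factor(codes, levels = 1:4, labels = labels))

X <- M[, 1:3]           # numerical columns only
vars <- sapply(X, var)  # sample variance of each column
sds <- sapply(X, sd)    # sample standard deviation of each column
vars
sds
cov(X)                  # covariance matrix; its diagonal holds the variances
```

Since the columns were drawn from a standard normal distribution, the variances should be in the vicinity of 1.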

aggregate(M[, 1:3], by = list(group = M$group), FUN = mean) # GROUP CENTROIDS

    group         X1           X2          X3
1 Group_A -0.1159164  0.187124494  0.51985248
2 Group_B -0.1770462  0.532357620 -0.16423459
3 Group_D -0.1651467  0.013811050  0.06511391
4 Group_E -0.1651271 -0.005173836 -0.12914429
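The centroid and the group centroids are linked: the overall centroid is the average of the group centroids, weighted by group size. The following sketch checks this relation numerically. The variable names cent, n and weighted are illustrative, and the data are rebuilt with base R (an assumption, not the original plyr-based code).

```r
# Rebuild the example data with base R only
# (assumption: factor() replaces plyr::mapvalues for the labelling step)
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
labels <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(M1, group = factor(codes, levels = 1:4, labels = labels))

# group centroids, one row per group
cent <- aggregate(M[, 1:3], by = list(group = M$group), FUN = mean)

# group sizes, in the same order as the rows of cent
n <- as.vector(table(M$group)[as.character(cent$group)])

# size-weighted average of the group centroids
weighted <- colSums(as.matrix(cent[, c("X1", "X2", "X3")]) * n) / sum(n)

all.equal(weighted, colMeans(M[, 1:3]))  # compares with the overall centroid
```

A plain, unweighted average of the group centroids would generally differ from the centroid, because the groups are not equally large.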
This article is from the free online course Managing Big Data with R and Hadoop.