Reading and storing data are unavoidable steps in any data analysis.

Basic data operations in R

Introduction

In this article we show how to obtain a quick overview of (normal-sized) data in R. All examples assume that you have

  • started RStudio and
  • opened a new R script file.

If you have not, press Ctrl+Shift+N to open a new script file and save it to a local folder first. Once you have typed (or copied) the R code into the script file, you can run it, e.g., by selecting the lines you want to run and pressing Ctrl+Enter.

Suppose we have a data frame with 50 rows containing three numerical variables and one categorical variable:

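library(plyr)        # provides mapvalues()
set.seed(1000)       # makes the random numbers reproducible

# 50 x 3 matrix of standard normal values: the numerical variables X1, X2, X3
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# 50 random integer codes between 1 and 4, mapped to group labels
M2 <- matrix(round(runif(50, 1, 4), 0), ncol = 1)
group <- c("Group_A", "Group_B", "Group_D", "Group_E")
M3 <- mapvalues(M2, from = 1:4, to = group)
colnames(M3) <- c("group")

# combine into one data frame; stringsAsFactors = TRUE stores group as a factor
# (since R 4.0.0 the default is FALSE, which would keep it as character)
M <- data.frame(M1, M3, stringsAsFactors = TRUE)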
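The plyr package is needed here only for mapvalues(). As a base R alternative (our own sketch, not part of the original course code), the integer codes can index the label vector directly; because the random numbers are drawn in the same order, this should reproduce the same M. The name group_labels is our own choice. A few base functions then give a first structural overview before any statistics are computed.

```r
# Base R sketch without plyr: index the label vector with the integer codes
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

codes <- round(runif(50, 1, 4), 0)                  # integer codes 1..4
group_labels <- c("Group_A", "Group_B", "Group_D", "Group_E")
M <- data.frame(M1, group = factor(group_labels[codes], levels = group_labels))

# quick structural overview
dim(M)    # number of rows and columns
str(M)    # type of each column and its first few values
head(M)   # first six rows
```

Passing levels = group_labels keeps all four categories as factor levels even if one of them happened not to occur in the sample.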

Descriptive statistics

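To see the frequency distribution of the categories of the variable group (accessed as M$group), use table() or summary(). Note that summary() returns these counts only when group is stored as a factor; for a character vector it reports just its length, class and mode.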

table(M$group)
summary(M$group)

In both cases we obtain:

Group_A Group_B Group_D Group_E 
     10      14      20       6 
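table() also gives relative frequencies when combined with prop.table(). The sketch below uses a small hypothetical factor g in place of M$group, so the counts are easy to check by hand; it also contrasts summary() on a factor with summary() on a character vector.

```r
# hypothetical stand-in for M$group
g <- factor(c("Group_A", "Group_B", "Group_B", "Group_D", "Group_D", "Group_D"))

counts <- table(g)              # absolute frequencies per level: 1, 2, 3
props  <- prop.table(counts)    # relative frequencies, summing to 1
round(100 * props, 1)           # the same as percentages

summary(g)                      # factor: the same counts as table(g)
summary(as.character(g))        # character: only Length, Class and Mode
```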

We may also be interested in the mean of each of the first three columns of M (the centroid) and in the corresponding group centroids, where the groups are defined by the variable group. Here are the code and the results.

centr <- colMeans(M[, 1:3])   # centroid: mean of X1, X2 and X3
centr

         X1          X2          X3 
-0.15863016  0.19138859  0.06853306 
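colMeans() divides each column sum by the number of rows. A small hypothetical data frame df, standing in for M[, 1:3], makes this easy to verify by hand and shows two equivalent ways to compute the same centroid.

```r
# hypothetical data frame with three numerical columns
df <- data.frame(X1 = c(1, 2, 3, 6),
                 X2 = c(0, 4, 4, 0),
                 X3 = c(-1, 1, -2, 2))

centr <- colMeans(df)                     # X1 = 3, X2 = 2, X3 = 0
centr_by_hand <- colSums(df) / nrow(df)   # column sums divided by row count
centr_sapply  <- sapply(df, mean)         # column-by-column alternative
```

colMeans() accepts only numeric columns, which is why the factor column group is left out with M[, 1:3]; if the data contain missing values, colMeans(df, na.rm = TRUE) skips them.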

aggregate(M[, 1:3], by = list(group = M$group), FUN = mean)   # group centroids
    group         X1           X2          X3
1 Group_A -0.1159164  0.187124494  0.51985248
2 Group_B -0.1770462  0.532357620 -0.16423459
3 Group_D -0.1651467  0.013811050  0.06511391
4 Group_E -0.1651271 -0.005173836 -0.12914429
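The overall centroid is the average of the group centroids weighted by the group sizes; with the sizes 10, 14, 20 and 6 from the frequency table, this reproduces the centroid values above. The sketch below checks the relationship on a small hypothetical data frame and uses the formula interface of aggregate(). For the data above, aggregate(. ~ group, data = M, FUN = mean) should give the same table as the by = version.

```r
# hypothetical data: two numerical columns and a grouping factor
df <- data.frame(X1 = c(1, 3, 2, 4, 10),
                 X2 = c(0, 2, 5, 7, 1),
                 group = factor(c("A", "A", "B", "B", "B")))

# group centroids via the formula interface of aggregate()
gc <- aggregate(cbind(X1, X2) ~ group, data = df, FUN = mean)
gc   # A: X1 = 2, X2 = 1;  B: X1 = 16/3, X2 = 13/3

# group sizes, in the same (level) order as the rows of gc
n_g <- as.vector(table(df$group))

# size-weighted average of the group centroids ...
weighted <- colSums(gc[, c("X1", "X2")] * n_g) / sum(n_g)
# ... equals the overall centroid (X1 = 4, X2 = 3)
overall <- colMeans(df[, c("X1", "X2")])
all.equal(weighted, overall)   # TRUE
```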


This article is from the free online course:

Managing Big Data with R and Hadoop

Partnership for Advanced Computing in Europe (PRACE)
