Basic data operations in R

In this article we present basic statistical operations over data matrices, like computing frequencies, means and variances.

Reading and storing data are unavoidable steps in any data analysis.

Introduction

This article describes how to obtain a quick overview of a (normal-sized) data set. All the examples assume that you have

started RStudio (by executing rstudio & in the terminal) and
opened a new R script file.

If you have not, press ctrl+shift+n to start a new script file, which you first have to save to a local folder. Once you have typed (or copied) the R code into the script file, you can run it, for example, by selecting the part of the code you want to run and pressing ctrl+enter.

Suppose we have a data frame containing three numerical ratio variables and one categorical variable.

This content is taken from the Partnership for Advanced Computing in Europe (PRACE) online course Managing Big Data with R and Hadoop.

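library(plyr)   # provides mapvalues()
set.seed(1000)  # make the random numbers reproducible

# 50 x 3 matrix of standard normal values
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# 50 random integer codes from 1 to 4, recoded as group labels
M2 <- matrix(round(runif(50, 1, 4), 0), ncol = 1)
group <- c("Group_A", "Group_B", "Group_D", "Group_E")
M3 <- mapvalues(M2, from = 1:4, to = group)
colnames(M3) <- c("group")

# stringsAsFactors = TRUE stores group as a factor (not the default since R 4.0.0)
M <- data.frame(M1, M3, stringsAsFactors = TRUE)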
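
Before computing any statistics, it can help to look at the structure of the data frame itself. The following is a minimal sketch, assuming a data frame like M; it is rebuilt here with base R only (factor() instead of mapvalues()) so that the block runs on its own, which is an illustrative substitution rather than the course's code.

```r
# Rebuild a data frame like M without the plyr dependency (assumption:
# same seed and the same order of random draws as in the code above).
set.seed(1000)
X <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(X) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
M <- data.frame(X,
                group = factor(codes, levels = 1:4,
                               labels = c("Group_A", "Group_B", "Group_D", "Group_E")))

dim(M)      # number of rows and columns
str(M)      # type of each column and its first few values
head(M)     # first six rows
summary(M)  # quartiles and mean for numeric columns, counts for the factor
```

str and summary together show at a glance which columns are numeric, which are categorical, and whether any values look implausible.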

Descriptive statistics

To see the frequency distribution of the category values of the variable group (addressed as M$group), use table, or summary when group is stored as a factor. For a character vector, summary reports only its length, class and mode, so table is the safer choice.

table(M$group)
summary(M$group)

In both cases we obtain

Group_A Group_B Group_D Group_E 
     10      14      20       6 
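
Absolute counts are often easier to compare as relative frequencies. As a small sketch, the vector g below is a hypothetical stand-in for M$group, constructed directly from the counts shown above; prop.table then converts the table of counts into proportions.

```r
# Hypothetical stand-in for M$group, built from the counts reported above.
g <- factor(rep(c("Group_A", "Group_B", "Group_D", "Group_E"),
                times = c(10, 14, 20, 6)))

counts <- table(g)          # absolute frequencies
rel <- prop.table(counts)   # relative frequencies (they sum to 1)
round(100 * rel, 1)         # percentages: 20, 28, 40 and 12
```

With the real data frame, the same two calls would be applied to M$group.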

We might also be interested in the mean values of the first three columns in M (the centroid) and the group centroids for these columns, where the groups are defined by group. Here is the code and the results.

centr=colMeans(M[,1:3]) # CENTROID
centr

         X1          X2          X3 
-0.15863016  0.19138859  0.06853306 
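
The centroid is simply the vector of column means, so colMeans can be checked against a column-by-column call to mean. The sketch below uses a tiny hypothetical data frame d (not the article's data) whose means are easy to verify by hand.

```r
# Tiny hypothetical data frame with easily verified column means.
d <- data.frame(X1 = c(1, 2, 3),
                X2 = c(2, 4, 6),
                X3 = c(0, 0, 3))

centroid_a <- colMeans(d)      # X1 = 2, X2 = 4, X3 = 1
centroid_b <- sapply(d, mean)  # the same values, computed column by column
all.equal(centroid_a, centroid_b)
```

colMeans is usually the more direct choice for purely numeric data, while sapply generalises to other summary functions.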

aggregate(M[,1:3],by=list(group=M$group),FUN=mean) # GROUP CENTROIDS

    group         X1           X2          X3
1 Group_A -0.1159164  0.187124494  0.51985248
2 Group_B -0.1770462  0.532357620 -0.16423459
3 Group_D -0.1651467  0.013811050  0.06511391
4 Group_E -0.1651271 -0.005173836 -0.12914429
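
Variances can be obtained in the same way as the means, by swapping the summary function. This is a minimal sketch, assuming a data frame like M; it is rebuilt with base R only so that the block is self-contained, and the names col_var, col_sd and group_var are illustrative.

```r
# Rebuild a data frame like M without the plyr dependency (assumption:
# same seed and the same order of random draws as in the article's code).
set.seed(1000)
X <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(X) <- c("X1", "X2", "X3")
codes <- round(runif(50, 1, 4), 0)
M <- data.frame(X,
                group = factor(codes, levels = 1:4,
                               labels = c("Group_A", "Group_B", "Group_D", "Group_E")))

col_var <- sapply(M[, 1:3], var)   # sample variance of each numeric column
col_sd  <- sqrt(col_var)           # corresponding standard deviations
col_var

# Variance of each column within each group, analogous to the group centroids
group_var <- aggregate(M[, 1:3], by = list(group = M$group), FUN = var)
group_var
```

Note that var computes the sample variance (denominator n - 1), so a group with a single observation would yield NA.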
