
Basic matrix operations in R

In this article we describe how to perform basic matrix operations in R, with a focus on operations that are candidates for parallelisation with map-reduce.
© PRACE and University of Ljubljana

Introduction

In this article we present the basic matrix operations using R with a particular focus on those operations that have the potential for parallelisation using map-reduce.

Remark

Note that we assume that you have:

  • started RStudio (by executing rstudio & in the terminal) and
  • opened a new R script file.

If you have not, press ctrl+shift+n to start a new script file and save it to a local folder first. Once you type (or copy) the R code into the script file, you run it by, e.g., selecting the part of the code you want to run and pressing ctrl+enter.

Data

Let us consider again the data from Article 3.3. We will work with the following matrix:

library(plyr)                             # load the plyr package
set.seed(1000)                            # make the random draws reproducible
M1 <- matrix(rnorm(150,0,1), ncol=3)      # 50 x 3 matrix of standard normal values
colnames(M1)<-c("X1","X2","X3")           # name the three columns

Matrix operations

Identity matrix

An identity matrix is a square matrix with diagonal elements equal to 1 and all other elements equal to 0. For example, we can create an identity matrix of dimensions 3×3 by:

diag(3)

or by the number of columns in M1 (which is 3):

diag(ncol(M1))
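
As a quick check that this matrix really acts as the identity, multiplying M1 by it should return M1 unchanged (the product drops the column names, so we ignore attributes in the comparison; I3 below is just a name we choose for this example):

I3 <- diag(ncol(M1))                       # 3 x 3 identity matrix
all.equal(M1 %*% I3, M1, check.attributes=FALSE)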

Matrix multiplication

Suppose we want to compute the product of the transpose of M1 by M1. This can be done by:

t(M1)%*%M1
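
Base R also offers crossprod(), which computes the same product t(M1)%*%M1 in a single call; an optional check confirms that the two results agree:

all.equal(crossprod(M1), t(M1)%*%M1)       # should return TRUE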

Covariance matrix

We compute the covariance matrix of M1 directly as:

cov(M1)

SS matrix

Likewise, we compute the sum-of-squares and cross-products matrix (SS matrix) of M1 by:

n=nrow(M1) #number of rows in M1
SS=(n-1)*cov(M1)

SS
          X1         X2         X3
X1 42.485852 -6.7437071 -7.3797835
X2 -6.743707 54.7612372 -0.8058014
X3 -7.379783 -0.8058014 40.5334042

If we centre the data (subtract the centroid, i.e. the vector of column means, from each row):

M1s=scale(M1,scale=FALSE)

then the SS matrix can also be computed as:

SS1=t(M1s)%*%M1s
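
We can verify that the two computations of the SS matrix agree (up to floating-point rounding):

all.equal(SS, SS1)                         # should return TRUE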

Later we will use the fact that the SS matrix can also be computed as:

SS2 = t(M1)%*%M1-n*outer(centr,centr)

where (recall):

centr=colMeans(M1) 

This is particularly useful for big-data computations, since the cross-product t(M1)%*%M1 can be computed for each data chunk separately via a map function and the partial results then summed up via a reduce step.
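
To illustrate the idea without Hadoop, here is a minimal sketch in plain R, assuming we simulate five chunks of 10 rows each by splitting M1 (the names chunks, mapped, xtx, colsum, ntot, centr_chunks and SS3 are our own choices for this example):

# simulate 5 data chunks of 10 rows each (in a real setting each chunk would sit on a different node)
chunks <- split.data.frame(M1, rep(1:5, each=10))

# map step: per chunk compute the cross-product, the column sums and the number of rows
mapped <- lapply(chunks, function(chunk)
  list(xtx=crossprod(chunk), colsum=colSums(chunk), n=nrow(chunk)))

# reduce step: sum the partial results over all chunks
xtx    <- Reduce(`+`, lapply(mapped, `[[`, "xtx"))
colsum <- Reduce(`+`, lapply(mapped, `[[`, "colsum"))
ntot   <- Reduce(`+`, lapply(mapped, `[[`, "n"))

# put the pieces together: the same formula as for SS2, now built from the chunk results
centr_chunks <- colsum/ntot
SS3 <- xtx - ntot*outer(centr_chunks, centr_chunks)
all.equal(SS3, SS2)                        # should return TRUE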

Correlation matrix

Once we have an SS matrix (i.e., SS2) we can easily obtain the corresponding correlation matrix by:

D=diag(1/sqrt(diag(SS2)))
R=D%*%SS2%*%D
R1 = cor(M1,method=c("pearson"))

We can see that R1 is equal to R.
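
(A quick check: because D carries no row or column names, R comes back without dimnames while R1 keeps them, so we compare the values only.)

all.equal(R, R1, check.attributes=FALSE)   # should return TRUE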

Eigenvalue decomposition

Note that SS is symmetric and hence has 3 real eigenvalues and 3 corresponding eigenvectors. We can compute them by:

ev = eigen(SS)
ev$values
[1] 58.08193 46.86827 32.83029
ev$vectors
           [,1]       [,2]      [,3]
[1,]  0.4511108 -0.5672369 0.6890147
[2,] -0.8798904 -0.4118343 0.2370348
[3,] -0.1493050  0.7131864 0.6848892
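
As an optional sanity check, the eigenvectors and eigenvalues reconstruct SS (the reconstruction drops the dimnames, so we ignore attributes in the comparison):

all.equal(SS, ev$vectors %*% diag(ev$values) %*% t(ev$vectors), check.attributes=FALSE)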