This content is taken from the Partnership for Advanced Computing in Europe (PRACE)'s online course, Managing Big Data with R and Hadoop. Join the course to learn more.
3.4

# Basic matrix operations in R

## Introduction

In this article we present the basic matrix operations using R with a particular focus on those operations that have the potential for parallelisation using map-reduce.

## Remark

We assume that you have:

• started RStudio (by executing rstudio & in the terminal) and
• opened a new R script file.

If you have not, press ctrl+shift+n to start a new script file and save it to a local folder first. Once you have typed (or copied) the R code into the script file, run it by, e.g., selecting the part of the code you want to run and pressing ctrl+enter.

## Data

Let us consider again the data from Article 3.3. We will work with the following matrix:

library(plyr)
set.seed(1000)
M1 <- matrix(rnorm(150,0,1), ncol=3)
colnames(M1)<-c("X1","X2","X3")


## Matrix operations

### Identity matrix

An identity matrix is a square matrix with diagonal elements equal to 1 and other elements equal to 0, e.g., we can create an identity matrix with dimensions 3x3 by:

diag(3)


or by the number of columns in M1 (which is 3):

diag(ncol(M1))
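As a quick check of what the identity matrix does, multiplying M1 by it should return M1 unchanged. The following is a minimal sketch that re-creates M1 as in the Data section so it runs on its own; the name I3 is introduced here only for illustration.

```r
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

I3 <- diag(ncol(M1))   # 3x3 identity matrix (illustrative name)

# Multiplying on the right by the identity returns the same values.
# check.attributes = FALSE ignores the column names dropped by the product.
all.equal(M1 %*% I3, M1, check.attributes = FALSE)
```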


### Matrix multiplication

Suppose we want to compute the product of the transpose of M1 with M1 itself. Since M1 has 50 rows and 3 columns, the result is a 3x3 matrix. This can be done by:

t(M1)%*%M1
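Base R also provides crossprod(), which computes the same product without forming the transpose explicitly. The sketch below compares the two; the names P1 and P2 are introduced here for illustration, and M1 is re-created so the snippet runs on its own.

```r
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

P1 <- t(M1) %*% M1    # (3 x 50) times (50 x 3) gives a 3 x 3 matrix
P2 <- crossprod(M1)   # same product, without an explicit transpose

dim(P1)
all.equal(P1, P2)
```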


### Covariance matrix

We compute the covariance matrix of M1 directly as:

cov(M1)


### SS matrix

Likewise, we compute the sum-of-squares and cross-products matrix (SS matrix) of M1 by:

n=nrow(M1)         #number of rows in M1
SS=(n-1)*cov(M1)

SS
          X1         X2         X3
X1 42.485852 -6.7437071 -7.3797835
X2 -6.743707 54.7612372 -0.8058014
X3 -7.379783 -0.8058014 40.5334042


If we centre the data (subtract the centroid, i.e., the vector of column means, from each row):

M1s=scale(M1,scale=FALSE)


then the SS matrix can also be computed as:

SS1=t(M1s)%*%M1s
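To confirm that the two routes give the same SS matrix, they can be compared with all.equal(). This is a small self-contained sketch that repeats the definitions from above.

```r
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

n   <- nrow(M1)
SS  <- (n - 1) * cov(M1)          # SS matrix from the covariance matrix
M1s <- scale(M1, scale = FALSE)   # subtract the column means from each row
SS1 <- t(M1s) %*% M1s             # SS matrix from the centred data

colMeans(M1s)                     # numerically zero after centring
all.equal(SS, SS1)
```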


Later we will use the fact that the SS matrix can also be computed from the uncentred data. Recall that the centroid is the vector of column means:

centr=colMeans(M1)


The SS matrix is then:

SS2 = t(M1)%*%M1-n*outer(centr,centr)


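This is particularly useful for big-data computations, since t(M1)%*%M1 can be computed for each data chunk (a block of rows) separately via a map function and then summed up via a reduce step. The row count n and the column sums needed for centr can be accumulated in the same way.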
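The chunk-wise idea can be sketched in plain base R, with lapply() standing in for the map step and Reduce() for the reduce step. This is only an illustration under those assumptions, not Hadoop code: the chunk size and the helper names map_chunk and reduce_pair are hypothetical choices made here.

```r
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# Split the rows into 5 chunks of 10 rows each (the chunk count is arbitrary).
chunk_id <- rep(1:5, each = 10)
chunks   <- lapply(split(seq_len(nrow(M1)), chunk_id),
                   function(idx) M1[idx, , drop = FALSE])

# Map step: each chunk emits its row count, column sums and cross-product.
map_chunk <- function(X) {
  list(n = nrow(X), sums = colSums(X), xtx = t(X) %*% X)
}
mapped <- lapply(chunks, map_chunk)

# Reduce step: add the partial results component by component.
reduce_pair <- function(a, b) {
  list(n = a$n + b$n, sums = a$sums + b$sums, xtx = a$xtx + b$xtx)
}
total <- Reduce(reduce_pair, mapped)

centr <- total$sums / total$n                        # centroid from the totals
SS2   <- total$xtx - total$n * outer(centr, centr)   # SS matrix

# Agrees with the SS matrix computed from the full data in one go.
all.equal(SS2, (nrow(M1) - 1) * cov(M1))
```

Because each chunk only contributes a count, a vector of length 3 and a 3x3 matrix, the amount of data passed to the reduce step does not grow with the number of rows.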

### Correlation matrix

Once we have the SS matrix (e.g., SS2), we can easily obtain the corresponding correlation matrix by dividing each element by the square roots of the two corresponding diagonal elements:

D=diag(1/sqrt(diag(SS2)))
R=D%*%SS2%*%D
R1 = cor(M1,method=c("pearson"))


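Comparing R with R1, the result of the built-in cor() function, shows that the two agree up to floating-point rounding. Note that R carries no row or column names, because D has none.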
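The comparison can be made explicit with all.equal(). The sketch below is self-contained and also uses cov2cor() from base R, which applies the same rescaling to a covariance-like matrix and keeps the names.

```r
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

n     <- nrow(M1)
centr <- colMeans(M1)
SS2   <- t(M1) %*% M1 - n * outer(centr, centr)

D  <- diag(1 / sqrt(diag(SS2)))   # D has no row or column names
R  <- D %*% SS2 %*% D
R1 <- cor(M1, method = "pearson")

# R has lost its dimnames, so compare the values only.
all.equal(R, R1, check.attributes = FALSE)

# cov2cor() performs the same rescaling and keeps the names.
all.equal(cov2cor(SS2), R1)
```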

### Eigenvalue decomposition

Note that SS is symmetric and hence has 3 real eigenvalues and 3 corresponding eigenvectors. We can compute them by:

ev = eigen(SS)

ev$values
[1] 58.08193 46.86827 32.83029

ev$vectors
           [,1]       [,2]      [,3]
[1,]  0.4511108 -0.5672369 0.6890147
[2,] -0.8798904 -0.4118343 0.2370348
[3,] -0.1493050  0.7131864 0.6848892
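A few properties of the decomposition can be checked numerically. The following sketch is self-contained; the names V and L are introduced here for the matrix of eigenvectors and the diagonal matrix of eigenvalues.

```r
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
SS <- (nrow(M1) - 1) * cov(M1)

ev <- eigen(SS)
V  <- ev$vectors       # eigenvectors as columns
L  <- diag(ev$values)  # eigenvalues on the diagonal

# Defining property: SS %*% V equals V %*% L
# (column k of V is scaled by eigenvalue k).
all.equal(SS %*% V, V %*% L, check.attributes = FALSE)

# Eigenvectors of a symmetric matrix are orthonormal,
# so t(V) %*% V is the identity matrix.
all.equal(crossprod(V), diag(3))

# The eigenvalues sum to the trace of SS.
all.equal(sum(ev$values), sum(diag(SS)))
```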