Basic matrix operations in R

Introduction

In this article we present basic matrix operations in R, with a particular focus on operations that lend themselves to parallelisation using map-reduce.

Remark

Note that we assume that you have:

  • started RStudio and
  • opened a new R script file.

If you have not, press ctrl+shift+n to start a new script file and save it to a local folder. Once you have typed (or copied) the R code into the script file, run it by selecting the part of the code you want to run and pressing ctrl+enter.

Data

Let us consider again the data from Article 3.3. We will work with the following matrix:

library(plyr)                                # loaded as in Article 3.3 (not strictly needed below)
set.seed(1000)                               # fix the random seed for reproducibility
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)     # 50 x 3 matrix of standard normal draws
colnames(M1) <- c("X1", "X2", "X3")

Matrix operations

Matrix multiplication

Suppose we want to compute the product of the transpose of M1 by M1. This can be done by

t(M1)%*%M1
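
Note that base R also provides crossprod(), which computes the same product as t(M1)%*%M1 and is usually slightly more efficient; this alternative is not part of the original code.

crossprod(M1)        # identical to t(M1) %*% M1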

Covariance matrix

We compute the covariance matrix of M1 directly as

cov(M1)
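
As a quick check (a small illustration, not part of the original code; the object name centred is introduced only here), cov(M1) agrees with the usual formula applied to the centred data:

centred = sweep(M1, 2, colMeans(M1))        # subtract the column means from each column
t(centred) %*% centred / (nrow(M1) - 1)     # reproduces cov(M1)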

SS matrix

Likewise, we compute the sum-of-squares and cross-products matrix (SS matrix) of M1 by

n=nrow(M1)         #number of rows in M1
SS=(n-1)*cov(M1)   # SS matrix: (n-1) times the covariance matrix

SS
          X1         X2         X3
X1 42.485852 -6.7437071 -7.3797835
X2 -6.743707 54.7612372 -0.8058014
X3 -7.379783 -0.8058014 40.5334042

If we centre the data (subtract the centroid, i.e. the vector of column means, from each row):

M1s=scale(M1,scale=FALSE)

then the SS matrix can also be computed as

SS1=t(M1s)%*%M1s
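
As a quick check, not in the original article, SS1 agrees with the SS matrix computed earlier:

all.equal(SS, SS1)   # TRUE (up to floating-point error)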

Later we will also use the fact that the SS matrix can be computed as

SS2 = t(M1)%*%M1-n*outer(centr,centr)

where (recall):

centr=colMeans(M1)   # centroid: the vector of column means

This is particularly useful for big-data computations, since the cross-product t(X)%*%X can be computed for each data chunk X separately via a map function, and the chunk results can then be summed up via a reduce step; the column sums needed for the centroid can be accumulated in the same way, as the sketch below shows.
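
To make this concrete, here is a minimal sketch in plain base R that mimics the map and reduce steps on in-memory chunks; the split into five chunks and all object names below (chunks, partial, xtx, csum, N, cent, SS3) are purely illustrative, and on a real cluster the chunks would live on different nodes.

chunks = split(as.data.frame(M1), rep(1:5, each = 10))   # five chunks of 10 rows each
partial = lapply(chunks, function(ch) {                  # "map": one partial result per chunk
  X = as.matrix(ch)
  list(xtx = t(X) %*% X, colsum = colSums(X), n = nrow(X))
})
xtx  = Reduce(`+`, lapply(partial, `[[`, "xtx"))         # "reduce": sum the cross-products ...
csum = Reduce(`+`, lapply(partial, `[[`, "colsum"))      # ... and the column sums
N    = sum(sapply(partial, `[[`, "n"))                   # total number of rows
cent = csum / N                                          # overall centroid
SS3  = xtx - N * outer(cent, cent)                       # same formula as for SS2
all.equal(SS3, SS2)                                      # TRUE (up to floating-point error)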

Correlation matrix

Once we have an SS matrix (e.g., SS2), we can easily obtain the corresponding correlation matrix by

D = diag(1/sqrt(diag(SS2)))        # diagonal matrix holding 1/sqrt of the diagonal of SS2
R = D %*% SS2 %*% D                # rescale SS2 so that the diagonal becomes 1
R1 = cor(M1, method = "pearson")   # built-in Pearson correlation, for comparison

We can see that R1 is equal to R.
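
One way to confirm this numerically (a quick check, not part of the original code; note that R as computed above carries no row or column names, so attributes are ignored):

all.equal(R, R1, check.attributes = FALSE)   # TRUE: the values agree up to floating-point error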

Eigenvalue decomposition

Note that SS is symmetric and hence has 3 real eigenvalues and 3 corresponding eigenvectors. We can compute them by

ev = eigen(SS)
ev$values
[1] 58.08193 46.86827 32.83029
ev$vectors
            [,1]       [,2]      [,3]
[1,]  0.4511108 -0.5672369 0.6890147
[2,] -0.8798904 -0.4118343 0.2370348
[3,] -0.1493050  0.7131864 0.6848892
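
As a final sanity check (not part of the original article), the decomposition reconstructs SS, and the eigenvectors returned by eigen() are orthonormal:

V = ev$vectors                                                           # V introduced here only for readability
all.equal(V %*% diag(ev$values) %*% t(V), SS, check.attributes = FALSE)  # TRUE: V Lambda V' reconstructs SS
round(t(V) %*% V, 10)                                                    # 3 x 3 identity: columns are orthonormal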

This article is from the free online course:

Managing Big Data with R and Hadoop

Partnership for Advanced Computing in Europe (PRACE)