Skip main navigation

Basic matrix operations in R

In this article we describe how to perform basic matrix operations using `R` with the focus to the candidates for parallelization with map-reduce.

Introduction

In this article we present the basic matrix operations using R with a particular focus on those operations that have the potential for parallelisation using map-reduce.

Remark

Note that we assume that you have:

  • started RStudio (by executing rstudio & in the terminal) and
  • you have opened a new R script file.

If you have not, then using ctrl+shift+n you start a new script file that you have to save first to a local folder. Once you type (copy) the R code into the script file, you run it by, e.g., selecting the part of the code you want to run and typing ctrl+enter.

Data

Lets us consider again the following data from Article 3.3.
We will work with the following matrix:

library(plyr) 
set.seed(1000)
M1 <- matrix(rnorm(150,0,1), ncol=3)
colnames(M1)<-c("X1","X2","X3")

Matrix operations

Identity matrix

An identity matrix is a square matrix with diagonal elements equal to 1 and other elements equal to 0, e.g., we can create an identity matrix with dimensions 3×3 by:

diag(3)

or by the number of columns in M1 (which is 3):

diag(ncol(M1))

Matrix multiplication

Suppose we want to compute the product of the transpose of M1 by M1. This can be done by:

t(M1)%*%M1

Covariance matrix

The covariance matrix of M1 we compute directly as:

cov(M1)

SS matrix

Likewise, we compute the sum-of-squares and coproducts matrix (SS matrix) of M1 by:

n=nrow(M1) #number of rows in M1
SS=(n-1)*cov(M1)

SS
X1 X2 X3
X1 42.485852 -6.7437071 -7.3797835
X2 -6.743707 54.7612372 -0.8058014
X3 -7.379783 -0.8058014 40.5334042

If we centralise the data (subtract the centroid from each row):

M1s=scale(M1,scale=FALSE)

then the SS matrix can also be computed as:

SS1=t(M1s)%*%M1s

But later we will use the fact that this covariance matrix can also be computed as:

SS2 = t(M1)%*%M1-n*outer(centr,centr)

where (recall):

centr=colMeans(M1) 

This is particularly useful for big-data computations since t(X1)%*%X1 can be computed for each data chunk separately via a map function and then summed up via a reduce step.

Correlation matrix

Once we have an SS matrix (i.e., SS2) we can easily obtain the corresponding correlation matrix by:

D=diag(1/sqrt(diag(SS2)))
R=D%*%SS2%*%D
R1 = cor(M1,method=c("pearson"))

We can see that R1 is equal to R.

Eigenvalue decomposition

Note that SS is symmetric and hence has 3 real eigenvalues and 3 corresponding eigenvectors. We can compute them by:

ev = eigen(SS)
ev$values
[1] 58.08193 46.86827 32.83029
ev$vectors
[,1] [,2] [,3]
[1,] 0.4511108 -0.5672369 0.6890147
[2,] -0.8798904 -0.4118343 0.2370348
[3,] -0.1493050 0.7131864 0.6848892
© PRACE and University of Ljubljana
This article is from the free online

Managing Big Data with R and Hadoop

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now