
Basic matrix operations in R

In this article we describe how to perform basic matrix operations in R, with a focus on operations that are candidates for parallelisation with map-reduce.
© PRACE and University of Ljubljana

Introduction

In this article we present the basic matrix operations using R with a particular focus on those operations that have the potential for parallelisation using map-reduce.

Remark

Note that we assume that you have:

  • started RStudio (by executing rstudio & in the terminal) and
  • opened a new R script file.

If you have not, press ctrl+shift+n to start a new script file and save it to a local folder first. Once you type (or copy) the R code into the script file, you run it by, e.g., selecting the part of the code you want to run and pressing ctrl+enter.

Data

Let us consider again the data from Article 3.3. We will work with the following matrix:

library(plyr)                             # load the plyr package
set.seed(1000)                            # make the random draws reproducible
M1 <- matrix(rnorm(150,0,1), ncol=3)      # 50 x 3 matrix of standard normal values
colnames(M1)<-c("X1","X2","X3")           # name the three columns

Matrix operations

Identity matrix

An identity matrix is a square matrix with diagonal elements equal to 1 and all other elements equal to 0. For example, we can create an identity matrix of dimensions 3×3 by:

diag(3)

or by the number of columns in M1 (which is 3):

diag(ncol(M1))
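
As a quick check that this matrix really acts as the identity, multiplying M1 by it should return M1 unchanged (the product drops the column names, so we ignore attributes in the comparison; I3 below is just a name we choose for this example):

I3 <- diag(ncol(M1))                       # 3 x 3 identity matrix
all.equal(M1 %*% I3, M1, check.attributes=FALSE)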

Matrix multiplication

Suppose we want to compute the product of the transpose of M1 by M1. This can be done by:

t(M1)%*%M1
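
Base R also offers crossprod(), which computes the same product t(M1)%*%M1 in a single call; an optional check confirms that the two results agree:

all.equal(crossprod(M1), t(M1)%*%M1)       # should return TRUE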

Covariance matrix

We compute the covariance matrix of M1 directly as:

cov(M1)

SS matrix

Likewise, we compute the sum-of-squares and cross-products matrix (SS matrix) of M1 by:

n=nrow(M1) #number of rows in M1
SS=(n-1)*cov(M1)

SS
          X1         X2         X3
X1 42.485852 -6.7437071 -7.3797835
X2 -6.743707 54.7612372 -0.8058014
X3 -7.379783 -0.8058014 40.5334042

If we centre the data (subtract the centroid, i.e. the vector of column means, from each row):

M1s=scale(M1,scale=FALSE)

then the SS matrix can also be computed as:

SS1=t(M1s)%*%M1s
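
We can verify that the two computations of the SS matrix agree (up to floating-point rounding):

all.equal(SS, SS1)                         # should return TRUE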

Later we will use the fact that the SS matrix can also be computed as:

SS2 = t(M1)%*%M1-n*outer(centr,centr)

where (recall):

centr=colMeans(M1) 

This is particularly useful for big-data computations, since the cross-product t(M1)%*%M1 can be computed for each data chunk separately via a map function and the partial results then summed up via a reduce step.
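
To illustrate the idea without Hadoop, here is a minimal sketch in plain R, assuming we simulate five chunks of 10 rows each by splitting M1 (the names chunks, mapped, xtx, colsum, ntot, centr_chunks and SS3 are our own choices for this example):

# simulate 5 data chunks of 10 rows each (in a real setting each chunk would sit on a different node)
chunks <- split.data.frame(M1, rep(1:5, each=10))

# map step: per chunk compute the cross-product, the column sums and the number of rows
mapped <- lapply(chunks, function(chunk)
  list(xtx=crossprod(chunk), colsum=colSums(chunk), n=nrow(chunk)))

# reduce step: sum the partial results over all chunks
xtx    <- Reduce(`+`, lapply(mapped, `[[`, "xtx"))
colsum <- Reduce(`+`, lapply(mapped, `[[`, "colsum"))
ntot   <- Reduce(`+`, lapply(mapped, `[[`, "n"))

# put the pieces together: the same formula as for SS2, now built from the chunk results
centr_chunks <- colsum/ntot
SS3 <- xtx - ntot*outer(centr_chunks, centr_chunks)
all.equal(SS3, SS2)                        # should return TRUE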

Correlation matrix

Once we have an SS matrix (i.e., SS2) we can easily obtain the corresponding correlation matrix by:

D=diag(1/sqrt(diag(SS2)))
R=D%*%SS2%*%D
R1 = cor(M1,method=c("pearson"))

We can see that R1 is equal to R.
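
(A quick check: because D carries no row or column names, R comes back without dimnames while R1 keeps them, so we compare the values only.)

all.equal(R, R1, check.attributes=FALSE)   # should return TRUE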

Eigenvalue decomposition

Note that SS is symmetric and hence has 3 real eigenvalues and 3 corresponding eigenvectors. We can compute them by:

ev = eigen(SS)
ev$values
[1] 58.08193 46.86827 32.83029
ev$vectors
           [,1]       [,2]      [,3]
[1,]  0.4511108 -0.5672369 0.6890147
[2,] -0.8798904 -0.4118343 0.2370348
[3,] -0.1493050  0.7131864 0.6848892
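
As an optional sanity check, the eigenvectors and eigenvalues reconstruct SS (the reconstruction drops the dimnames, so we ignore attributes in the comparison):

all.equal(SS, ev$vectors %*% diag(ev$values) %*% t(ev$vectors), check.attributes=FALSE)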