# Basic matrix operations in R

In this article we describe how to perform basic matrix operations in R, focusing on the operations that are candidates for parallelisation with map-reduce.

## Introduction

In this article we present the basic matrix operations using R with a particular focus on those operations that have the potential for parallelisation using map-reduce.

## Remark

Note that we assume that you have:

• started RStudio (by executing rstudio & in the terminal) and
• opened a new R script file.

If you have not, press ctrl+shift+n to start a new script file and save it to a local folder first. Once you have typed (or copied) the R code into the script file, you can run it by, e.g., selecting the part of the code you want to run and pressing ctrl+enter.

## Data

Let us consider again the data from Article 3.3.
We will work with the following matrix:

library(plyr)
set.seed(1000)
M1 <- matrix(rnorm(150,0,1), ncol=3)
colnames(M1)<-c("X1","X2","X3")

## Matrix operations

### Identity matrix

An identity matrix is a square matrix with diagonal elements equal to 1 and all other elements equal to 0. For example, we can create a 3×3 identity matrix by:

diag(3)

or by the number of columns in M1 (which is 3):

diag(ncol(M1))
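The defining property of the identity matrix can be checked with a minimal sketch like the one below. It rebuilds M1 as in the Data section so that it runs on its own; the names I3, M1_times_I and identity_holds are illustrative and not part of the original article.

```r
# Rebuild the matrix from the Data section so the example is self-contained
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# diag() called with a single number n returns the n x n identity matrix
I3 <- diag(ncol(M1))

# Multiplying M1 from the right by the identity leaves its values unchanged.
# The product has no column names (I3 has none), so attributes are ignored
# in the comparison.
M1_times_I <- M1 %*% I3
identity_holds <- isTRUE(all.equal(M1_times_I, M1, check.attributes = FALSE))
identity_holds
```

Note that diag() behaves differently when given a vector of length greater than one: it then builds a diagonal matrix with that vector on the diagonal.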

### Matrix multiplication

Suppose we want to compute the product of the transpose of M1 by M1. This can be done by:

t(M1)%*%M1
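As a hedged aside, base R also provides crossprod(), which computes the same product without explicitly forming the transpose. The sketch below compares the two; the names P1, P2 and same_product are illustrative only.

```r
# Rebuild the matrix from the Data section so the example is self-contained
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

P1 <- t(M1) %*% M1    # explicit transpose followed by matrix multiplication
P2 <- crossprod(M1)   # the same product, computed by a dedicated base R function

dim(P1)               # a 50 x 3 matrix gives a 3 x 3 product

# Both approaches agree up to floating-point rounding
same_product <- isTRUE(all.equal(P1, P2))
same_product

# The product of a matrix transpose with the matrix itself is symmetric
isSymmetric(P1)
```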

### Covariance matrix

We compute the covariance matrix of M1 directly with:

cov(M1)

### SS matrix

Likewise, we compute the sum-of-squares and cross-products matrix (SS matrix) of M1 by:

n=nrow(M1) #number of rows in M1
SS=(n-1)*cov(M1)
SS
          X1         X2         X3
X1 42.485852 -6.7437071 -7.3797835
X2 -6.743707 54.7612372 -0.8058014
X3 -7.379783 -0.8058014 40.5334042

If we centre the data (subtract the centroid, i.e., the vector of column means, from each row):

M1s=scale(M1,scale=FALSE)

then the SS matrix can also be computed as:

SS1=t(M1s)%*%M1s

Later we will use the fact that the SS matrix can also be computed as:

SS2 = t(M1)%*%M1-n*outer(centr,centr)

where (recall):

centr=colMeans(M1) 

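This is particularly useful for big-data computations, since t(M1)%*%M1 can be computed for each data chunk separately via a map function and then summed up in a reduce step. The row count n and the column sums needed for centr can be accumulated over the chunks in the same way.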
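The chunk-wise computation can be illustrated with a minimal in-memory sketch that uses base R's lapply() as the map step and Reduce() as the reduce step. This is not a real map-reduce framework: the split into 5 chunks of 10 rows and the helper names map_chunk and add_stats are assumptions made purely for illustration.

```r
# Rebuild the matrix from the Data section so the example is self-contained
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

# Hypothetical chunking: 5 blocks of 10 consecutive rows each
chunk_id <- rep(1:5, each = 10)
chunks <- lapply(split(seq_len(nrow(M1)), chunk_id),
                 function(idx) M1[idx, , drop = FALSE])

# Map step: each chunk reports its row count, column sums and cross-product
map_chunk <- function(X) {
  list(n = nrow(X), sums = colSums(X), xtx = crossprod(X))
}
mapped <- lapply(chunks, map_chunk)

# Reduce step: the three statistics are simply added across chunks
add_stats <- function(a, b) {
  list(n = a$n + b$n, sums = a$sums + b$sums, xtx = a$xtx + b$xtx)
}
total <- Reduce(add_stats, mapped)

# Assemble the SS matrix from the reduced statistics
centr_mr <- total$sums / total$n
SS_mr <- total$xtx - total$n * outer(centr_mr, centr_mr)

# Compare with the direct computation on the full matrix
SS_direct <- (nrow(M1) - 1) * cov(M1)
ss_matches <- isTRUE(all.equal(SS_mr, SS_direct))
ss_matches
```

Because only sums are exchanged, the chunks can be processed in any order. One caveat of this formula is that subtracting two large, nearly equal quantities can lose precision when the column means are large relative to the spread of the data.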

### Correlation matrix

Once we have an SS matrix (i.e., SS2) we can easily obtain the corresponding correlation matrix by:

D=diag(1/sqrt(diag(SS2)))
R=D%*%SS2%*%D
R1 = cor(M1,method=c("pearson"))

We can see that R1 is equal to R up to floating-point rounding (R has no row and column names because D has none).
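The agreement between the two correlation matrices can also be checked programmatically. The following sketch recomputes SS2 so that it runs on its own, and additionally uses the base R helper cov2cor(), which performs the same rescaling while keeping the row and column names. The names R3, cor_matches and cov2cor_matches are illustrative only.

```r
# Rebuild the matrix from the Data section so the example is self-contained
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")

n <- nrow(M1)
centr <- colMeans(M1)
SS2 <- t(M1) %*% M1 - n * outer(centr, centr)

# Rescale SS2 by the inverse square roots of its diagonal
D <- diag(1 / sqrt(diag(SS2)))
R <- D %*% SS2 %*% D                 # has no dimnames because D has none
R1 <- cor(M1, method = "pearson")

# Compare numeric values only, since R carries no row or column names
cor_matches <- isTRUE(all.equal(R, R1, check.attributes = FALSE))
cor_matches

# cov2cor() applies the same scaling and preserves the dimnames of SS2
R3 <- cov2cor(SS2)
cov2cor_matches <- isTRUE(all.equal(R3, R1))
cov2cor_matches
```

The factor n-1 that distinguishes the SS matrix from the covariance matrix cancels in the rescaling, which is why the correlation matrix can be obtained from either.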

### Eigenvalue decomposition

Note that SS is symmetric and hence has 3 real eigenvalues and 3 corresponding eigenvectors. We can compute them by:

ev = eigen(SS)
ev$values
[1] 58.08193 46.86827 32.83029
ev$vectors
           [,1]       [,2]      [,3]
[1,]  0.4511108 -0.5672369 0.6890147
[2,] -0.8798904 -0.4118343 0.2370348
[3,] -0.1493050  0.7131864 0.6848892
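The defining properties of the decomposition can be verified with a short sketch like the one below. It recomputes SS so that it runs on its own; V, lambda and the logical flags are illustrative names, and passing symmetric = TRUE to eigen() is an optional choice made here rather than something the text above requires.

```r
# Rebuild the matrix from the Data section so the example is self-contained
set.seed(1000)
M1 <- matrix(rnorm(150, 0, 1), ncol = 3)
colnames(M1) <- c("X1", "X2", "X3")
SS <- (nrow(M1) - 1) * cov(M1)

ev <- eigen(SS, symmetric = TRUE)
V <- ev$vectors        # eigenvectors are stored as columns
lambda <- ev$values    # eigenvalues, returned in decreasing order

# Each column v of V satisfies SS %*% v = lambda * v
eig_eq_holds <- isTRUE(all.equal(SS %*% V, V %*% diag(lambda),
                                 check.attributes = FALSE))

# For a symmetric matrix the eigenvectors are orthonormal
orthonormal <- isTRUE(all.equal(crossprod(V), diag(3)))

# SS can be rebuilt from its eigenvalues and eigenvectors
SS_rebuilt <- V %*% diag(lambda) %*% t(V)
rebuilt_ok <- isTRUE(all.equal(SS_rebuilt, SS, check.attributes = FALSE))

# The eigenvalues sum to the trace of SS
trace_ok <- isTRUE(all.equal(sum(lambda), sum(diag(SS))))

c(eig_eq_holds, orthonormal, rebuilt_ok, trace_ok)
```

Eigenvectors are only determined up to sign, so a different platform or linear algebra library may return some columns of V multiplied by -1 without affecting any of these checks.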
© PRACE and University of Ljubljana