Data reduction and visualisation

Watch Dr Jasper Engel introduce the data reduction and visualisation approaches applied in metabolomics.
5
In a typical metabolomics experiment, a wealth of information is measured for each sample, resulting in a high-dimensional, wide data table. On the one hand, this is a blessing, because a large amount of biological information can potentially be observed in a single experiment. On the other hand, it can also be a curse. For example, as mentioned in the previous lecture, many traditional approaches for data visualization and statistical analysis cannot cope with such data. A common approach to resolve these issues is to reduce the dimension of the data before data visualization and statistical analysis take place. This essentially means that we want to end up with a small data table, one with only a few columns.
50.4
In metabolomics, a dimension reduction strategy is often employed for this purpose. The aim is to capture the main properties of the data in a new table that is low-dimensional. Principal component analysis, or PCA, is one of the most popular dimension reduction methods. The goal is to reduce the data table to only a few columns or dimensions, while retaining as much information as we can about the main differences between the samples. For illustration, imagine that my left hand is a complex data set. The data is high-dimensional, three dimensions in this case, and very difficult to analyse.
90.3
However, by illuminating my hand with a flashlight in my right hand, I can inspect the properties of my hand by studying the shadows that are projected on the wall behind my hand. This way, I don't have to inspect a complicated 3-dimensional shape, but I can study something flat that has only two dimensions, namely the shadows on the wall. PCA uses mathematical projection to reduce high-dimensional data tables with, let's say, a thousand columns to low-dimensional data tables with only a few columns. Depending on how I position my hand in front of the light, different properties will become visible in the shadows.
133.5
For example, if I hold my hand like this, you will be able to see in the shadows that I have a thumb and different fingers, and you will see quite a bit about my hand. On the other hand, if I rotate my hand in this direction, the shadows will be quite meaningless and it will be very difficult to even observe that we are looking at a hand. This shows that we need to think carefully about how we rotate our data before projecting it to a lower-dimensional table. PCA aims to show the largest differences between the samples. This corresponds to rotating the data in such a way that the shadows on the wall become the largest.
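As a rough illustration of the flashlight analogy, the sketch below projects a small two-column data set onto a single line at different angles and measures how spread out each "shadow" is. The data values and the names `samples` and `projected_variance` are made up for this example and are not taken from the lecture. PCA in effect picks the direction with the widest shadow.

```python
import math
import statistics

# Hypothetical toy data: six samples measured on two variables,
# stretched roughly along the diagonal.
samples = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.2), (6.0, 5.9)]

def projected_variance(points, angle_deg):
    """Variance of the 1-D 'shadow' cast onto the direction at angle_deg."""
    theta = math.radians(angle_deg)
    direction = (math.cos(theta), math.sin(theta))
    shadow = [x * direction[0] + y * direction[1] for x, y in points]
    return statistics.variance(shadow)

# Try every whole-degree direction and compare the spread of the shadows.
variances = {angle: projected_variance(samples, angle) for angle in range(180)}
best_angle = max(variances, key=variances.get)
worst_angle = min(variances, key=variances.get)

print(f"widest shadow at {best_angle} degrees, variance {variances[best_angle]:.2f}")
print(f"narrowest shadow at {worst_angle} degrees, variance {variances[worst_angle]:.2f}")
```

Because the toy samples lie close to the diagonal, the widest shadow is found near 45 degrees, and the narrowest is roughly at right angles to it. A real PCA finds this direction directly rather than by trying every angle.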
173.8
So how does this work in practice? As an example, I am going to consider a study where metabolomics was used to determine patterns in metabolite profiles associated with particulate organic matter at two locations in the western English Channel, and also at two depths for each location. For this purpose, 64 water samples were analysed by liquid chromatography-mass spectrometry. Subsequently, PCA was applied to reduce the initial data table with 173 columns to a table with only 6 columns. The PCA model captured 83% of the differences between the samples. To visualize these differences, 2-dimensional scatter plots of pairwise combinations of the six columns in the data table were constructed.
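To make the reduction step concrete, here is a minimal sketch that assumes NumPy is available and uses randomly generated numbers in place of the real measurements. Only the table shape (64 samples, 173 columns, 6 retained components) follows the study described above. The data, the variable names, and the use of a singular value decomposition on the mean-centred table are assumptions for illustration, so the captured fraction it prints will not match the 83% reported for the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a peak table: 64 samples x 173 columns.
# Built from 6 hidden patterns plus noise; these are NOT the study's data.
n_samples, n_peaks, n_hidden = 64, 173, 6
hidden = rng.normal(size=(n_samples, n_hidden)) * np.array([6, 5, 4, 3, 2, 1.5])
mixing = rng.normal(size=(n_hidden, n_peaks))
X = hidden @ mixing + rng.normal(size=(n_samples, n_peaks))

# PCA via singular value decomposition of the mean-centred table.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

n_components = 6
scores = U[:, :n_components] * s[:n_components]  # the small table: 64 x 6
explained = s**2 / np.sum(s**2)                  # share of variance per component

print("shape of reduced table:", scores.shape)
print("fraction of variance captured:", round(float(explained[:n_components].sum()), 3))
```

Each row of `scores` is one sample described by six numbers instead of 173, and any two of its columns can be plotted against each other as a scatter plot.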
221.9
Here, a plot of the first column (labelled as principal component, or PC one) against the fourth column (labelled as PC four) is shown. A clear separation between the two locations (L and E) could be observed along PC one. Additionally, a clear difference in depth at location E could be observed along PC four. Traditional statistical methods were subsequently applied to the small data table that is produced by PCA. Here, we used a method called multivariate analysis of variance. This way it was confirmed that the observed differences in location and depth along PC one and PC four were indeed significant. Finally, I would like to remark that the PCA model also stores information regarding how the data was rotated before dimension reduction.
273.2
Because of this, the differences in location and depth could be related back to specific peaks in the original data table. This way, a wide range of relevant metabolites, including amino-acid derivatives, oxidised fatty acids, and glycosylated compounds, were associated with the observed differences. The original publication of this study discusses these findings in relation to the overall composition of organisms at the investigated locations and depths. To summarize, the high-dimensional and wide data tables that are encountered in metabolomics can be difficult to analyse, but the metabolomics community is now routinely applying many techniques to interrogate these large data sets and increase our understanding of the changes in metabolism.
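The stored rotation is usually referred to as the loadings. The sketch below, again assuming NumPy and entirely synthetic data, shows how the loadings of the first component can point back to the columns that drive a group difference. The group sizes, the column indices in `changed_peaks`, and the size of the shift are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic example: 20 samples x 50 peaks, two groups of 10 samples.
# Only the hypothetical peaks in columns 3, 17 and 42 differ between groups.
n_per_group, n_peaks = 10, 50
changed_peaks = [3, 17, 42]
X = rng.normal(size=(2 * n_per_group, n_peaks))
X[n_per_group:, changed_peaks] += 8.0

# PCA via singular value decomposition of the mean-centred table.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Row 0 of Vt holds the loadings of PC one: one weight per original column.
loadings_pc1 = Vt[0]

# The columns with the largest absolute loadings contribute most to PC one.
top_peaks = sorted(np.argsort(np.abs(loadings_pc1))[-3:].tolist())
print("peaks with the largest loadings on PC one:", top_peaks)
```

With a group difference this strong, the columns with the largest loadings should be the ones that were shifted. In real data the picture is rarely this clean, so loadings are best treated as pointers to candidate peaks rather than as proof.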
319
Principal component analysis is one example: a useful dimension reduction technique that assists in the analysis of high-dimensional and wide data tables. The output of PCA is a small data table that can be visualized in scatter plots to display the biggest differences between the samples. Because the PCA model retains information about how the dimension reduction was carried out, these differences can also be related back to peaks in the original data table. In the next few steps, I will look at some of the approaches that we apply to the data after PCA.

Dr Jasper Engel discusses the data reduction and visualisation techniques that are applied in the analysis of metabolomics data.

This article is from the free online

Metabolomics: Understanding Metabolism in the 21st Century

Created by
FutureLearn - Learning For Life
