Liquid chromatography-mass spectrometry data processing

An overview of the approaches used to process the raw data files produced in liquid chromatography-mass spectrometry
WARWICK DUNN: Data acquired on liquid chromatography-mass spectrometry instruments using an untargeted strategy are highly complex and require sophisticated computational approaches to convert the raw data into biological knowledge. We will discuss these computational approaches during this week, and here we will discuss how raw data from LC-MS instruments are processed before univariate and multivariate data analysis is performed. Let us first think about the complexity of these data. Biological samples contain thousands of metabolites, and many of these are present at a high enough concentration to be detected when applying liquid chromatography-mass spectrometry. Therefore liquid chromatography-mass spectrometry data sets typically contain data related to hundreds or thousands of metabolites.
Electrospray ionisation is the most frequently applied ionisation source in liquid chromatography-mass spectrometry. The ionisation source operates at atmospheric pressure and creates ions from the liquid phase flowing from the end of the liquid chromatography column. This ionisation source can be viewed as a chemical reactor where the types of ions formed are dependent on the composition of the liquid and sample, as well as the operating conditions of the electrospray ionisation source. Many different metabolite ions are formed by the non-covalent addition of small ions to the metabolite to form a charged metabolite. Many different small ions can be present in biological samples and the liquid phase. Therefore many different types of ions can be formed simultaneously for a single metabolite.
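To make the idea of several ion types per metabolite concrete, the following minimal Python sketch computes the mass-to-charge ratios of four commonly observed singly charged positive-mode adducts of glucose. The choice of adducts, the example metabolite and the names `ADDUCT_MASSES` and `adduct_mz` are illustrative assumptions and are not taken from the lecture.

```python
# Monoisotopic masses (Da) of small ions that commonly attach to a metabolite
# in positive ion mode. Which adducts to include is an assumption made here.
ADDUCT_MASSES = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def adduct_mz(neutral_mass):
    """Return the m/z of each singly charged adduct of a neutral metabolite."""
    return {
        name: round(neutral_mass + ion_mass, 4)
        for name, ion_mass in ADDUCT_MASSES.items()
    }


# Glucose (C6H12O6) has a monoisotopic mass of about 180.0634 Da.
glucose_features = adduct_mz(180.06339)
for name, mz in glucose_features.items():
    print(f"{name}: {mz:.4f}")
```

One metabolite therefore gives rise to four distinct mass-to-charge ratios in this sketch, each of which would be reported separately in the raw data.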
Each different type of ion formed is called a metabolite feature, and many thousands of metabolite features are detected, where several metabolite features are all related to a single metabolite. Each of these metabolite features related to a single metabolite has a different mass-to-charge ratio but the same retention time. Therefore the complexity of the data, in terms of the number of mass-to-charge ratio and retention time pairs reported, is increased by the detection of multiple metabolite features for each metabolite. When analysing hundreds or thousands of samples in a single study, small sources of variation can be observed in the measured mass-to-charge ratio and chromatographic retention time.
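As a rough illustration of how features that share a retention time might be gathered together, the sketch below groups (mass-to-charge ratio, retention time) pairs whose retention times fall within a small tolerance. The feature values, the 0.05 minute tolerance and the function name `group_by_retention_time` are hypothetical. Note also that a shared retention time on its own does not prove that features come from the same metabolite, because different metabolites can co-elute.

```python
def group_by_retention_time(features, rt_tolerance=0.05):
    """Group (mz, rt) features whose retention time lies within rt_tolerance
    of the first feature in the current group."""
    groups = []
    for mz, rt in sorted(features, key=lambda f: f[1]):
        if groups and rt - groups[-1][0][1] <= rt_tolerance:
            groups[-1].append((mz, rt))
        else:
            groups.append([(mz, rt)])
    return groups


# Hypothetical features: (m/z, retention time in minutes)
features = [
    (181.0707, 2.31),
    (203.0526, 2.31),
    (219.0265, 2.32),
    (132.1019, 5.80),
]
groups = group_by_retention_time(features)
for group in groups:
    print(group)
```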
The retention time may differ between samples because of small changes to the stationary phase or changes in the mechanics of pumping liquids. The mass-to-charge ratio is measured to four decimal places, and this can vary slightly because of electrical or other changes in the mass spectrometer. For example, in one sample the mass-to-charge ratio may be measured as 250.0118, and in another sample the mass-to-charge ratio for the same metabolite is measured as 250.0120. This is the same metabolite, measured at a slightly different mass.
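Differences of this size are commonly expressed in parts per million (ppm) of the measured mass-to-charge ratio. The short sketch below works through the example values above and shows one way a tolerance could be applied. The 5 ppm tolerance and the function names are assumptions chosen for illustration; a suitable tolerance depends on the instrument.

```python
def ppm_difference(mz_a, mz_b):
    """Relative difference between two m/z values in parts per million."""
    return abs(mz_a - mz_b) / mz_a * 1e6


def same_mz(mz_a, mz_b, tolerance_ppm=5.0):
    """Treat two m/z values as matching if they differ by at most tolerance_ppm."""
    return ppm_difference(mz_a, mz_b) <= tolerance_ppm


diff = ppm_difference(250.0118, 250.0120)
print(f"{diff:.2f} ppm")             # about 0.80 ppm
print(same_mz(250.0118, 250.0120))   # within the assumed 5 ppm tolerance
```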
So in data processing, we need to ensure that observed variation in mass-to-charge ratio and retention time is accounted for by applying ranges rather than a single mass-to-charge ratio or retention time, so as to align metabolites across different samples. This is essential for data analysis procedures where all metabolites need to be aligned. The normal workflow for processing liquid chromatography-mass spectrometry data involves three different processes. The first process is to perform peak picking or deconvolution on each sample separately. Viewed simplistically, the signal for each possible mass-to-charge ratio, or a very small range of values, is plotted and searched to see if one or more chromatographic peaks are observed for that mass-to-charge ratio or range of ratios.
If a chromatographic peak is observed, then the peak area is calculated and the mass-to-charge ratio, retention time, and peak area are reported. This provides a list of metabolite features detected in one sample. However, in any metabolomics study, many samples are studied. So the next step is to align the data for each sample and combine these data into a single data set constructed with data from all the samples studied. Here, peaks are matched across all samples and the peak areas are reported across all samples. In a simplistic approach, the process involves setting bins that relate to small ranges of retention time and mass-to-charge ratio.
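The peak picking step described above can be sketched in a few lines of Python, assuming each scan is stored as a retention time plus a list of (mass-to-charge ratio, intensity) pairs. The data layout, the threshold rule and the function names are hypothetical simplifications; real software uses considerably more sophisticated peak detection.

```python
def extract_ion_chromatogram(scans, mz_low, mz_high):
    """Sum the intensity inside [mz_low, mz_high] for every scan."""
    return [
        (rt, sum(i for mz, i in peaks if mz_low <= mz <= mz_high))
        for rt, peaks in scans
    ]


def pick_peaks(chromatogram, threshold):
    """Report each contiguous run of points above threshold as one peak."""
    peaks, run = [], []
    for point in chromatogram + [(None, 0.0)]:  # sentinel closes a trailing run
        if point[1] > threshold:
            run.append(point)
        elif run:
            apex_rt = max(run, key=lambda p: p[1])[0]
            # Trapezoidal integration over the points above the threshold
            area = sum(
                (t2 - t1) * (i1 + i2) / 2
                for (t1, i1), (t2, i2) in zip(run, run[1:])
            )
            peaks.append({"rt": apex_rt, "area": area})
            run = []
    return peaks


# Hypothetical scans: (retention time in minutes, [(m/z, intensity), ...])
scans = [
    (1.00, [(250.0119, 0.0)]),
    (1.05, [(250.0118, 100.0)]),
    (1.10, [(250.0120, 400.0), (300.1000, 50.0)]),
    (1.15, [(250.0119, 100.0)]),
    (1.20, [(250.0118, 0.0)]),
]
mz_low, mz_high = 250.0100, 250.0140
xic = extract_ion_chromatogram(scans, mz_low, mz_high)
picked = [
    {"mz": (mz_low + mz_high) / 2, **peak}
    for peak in pick_peaks(xic, threshold=10.0)
]
print(picked)
```

Each entry in `picked` carries the three values the lecture mentions: a mass-to-charge ratio (here the centre of the search window), a retention time at the peak apex, and a peak area.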
The same metabolite in different samples is added to the same bin to allow these metabolites to be aligned across different samples. The final step is the chemical identification of metabolites. This will normally involve using both the accurately measured mass-to-charge ratio and, where possible, the retention time and fragmentation mass spectra acquired applying tandem mass spectrometry. This is not a simple process, and we will discuss this later.
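Returning to the alignment step, the binning idea described above can be sketched as follows, assuming each sample has already been reduced to a list of (mass-to-charge ratio, retention time, peak area) features. The bin widths, sample names and the function name `align_features` are hypothetical. Fixed bins are also a deliberate simplification: two measurements of the same metabolite that fall either side of a bin edge would be split, which is one reason real tools use more flexible matching.

```python
from collections import defaultdict


def align_features(samples, mz_bin=0.005, rt_bin=0.1):
    """Build a feature-by-sample matrix by binning m/z and retention time."""
    matrix = defaultdict(dict)
    for sample_name, features in samples.items():
        for mz, rt, area in features:
            # Features with the same key are treated as the same metabolite feature
            key = (round(mz / mz_bin), round(rt / rt_bin))
            matrix[key][sample_name] = matrix[key].get(sample_name, 0.0) + area
    return dict(matrix)


# Hypothetical features per sample: (m/z, retention time in minutes, peak area)
samples = {
    "sample_A": [(250.0118, 1.10, 25.0), (181.0707, 2.31, 8.0)],
    "sample_B": [(250.0120, 1.12, 31.0)],
}
matrix = align_features(samples)
for key, areas in matrix.items():
    print(key, areas)
```

The result is a small data matrix in which each row is a bin and each column a sample, with a missing entry where a feature was not detected in that sample.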
Professor Warwick Dunn introduces the approaches that are applied to process the raw data files produced in liquid chromatography-mass spectrometry to produce a data matrix.
This article is from the free online course Metabolomics: Understanding Metabolism in the 21st Century.