
Direct infusion mass spectrometry data processing

An overview of the approaches applied to process the raw data files produced in direct infusion mass spectrometry
RALF WEBER: Previously, we discussed how direct infusion mass spectrometry and the spectral stitching approach we use here in Birmingham are applied in the study of metabolism. Let’s summarise again how we acquire the data. We infuse a liquid sample directly into the mass spectrometer, and the data is collected across a series of mass spectral windows, where each window covers a small mass-to-charge range. So for the mass-to-charge ratio range of 100 to 500, we may collect data for the ranges 100 to 150, 140 to 190, and so on.
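The window scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `mz_windows`, the width of 50, the step of 40 (so neighbouring windows overlap by 10) and the clipping of the last window to the upper limit are assumptions chosen to reproduce the 100 to 150, 140 to 190 example, not the settings of any particular instrument method.

```python
def mz_windows(start, stop, width=50, step=40):
    """Return overlapping (low, high) m/z windows covering start..stop."""
    windows = []
    low = start
    # Add full-width windows while they end below the upper limit.
    while low + width < stop:
        windows.append((low, low + width))
        low += step
    # Assumption: the final window is clipped to the upper limit.
    windows.append((low, min(low + width, stop)))
    return windows


for low, high in mz_windows(100, 500):
    print(f"{low} to {high}")
```

With these assumed settings the range 100 to 500 is covered by ten windows, each sharing a 10-unit overlap with its neighbour.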
The use of small mass-to-charge ratio windows ensures that the data can be collected across a wider dynamic range, meaning we can measure metabolites at low and high concentrations without compromising the mass accuracy. Overall, this means we can detect a higher number of metabolites with high mass accuracy than would be obtained using conventional mass spectrometry approaches, which in turn improves our ability to chemically annotate the metabolites detected. After collection of the direct infusion mass spectrometry data using our optimised approach, the data is processed using computational workflows. The final product of this workflow is a data matrix. The data matrix consists of mass-to-charge ratios and ion intensity values for each peak detected across all the samples.
This data matrix can be used for statistical analysis and metabolite annotation. So what are the important steps we apply in direct infusion mass spectrometry data processing? The first step includes mass calibration of each spectral window. Calibration is used to obtain a mass spectrum for which the mass-to-charge ratios are reported with high accuracy. There are different equations available to calibrate spectra, and they can be dependent on the mass spectrometer used. These equations require pre-defined calibration parameters or a list of calibrants with known theoretical masses. After mass calibration the multiple narrow mass-to-charge windows are stitched together into a single mass spectrum.
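As a rough illustration of calibration against a list of calibrants followed by stitching, here is a minimal Python sketch. It makes two simplifying assumptions that are not taken from the talk: the mass error is modelled as a single constant offset in parts per million (real calibration equations are instrument-dependent), and each overlap between neighbouring windows is split at its midpoint so that every peak comes from exactly one window. All function names are hypothetical.

```python
def ppm_offset(observed, theoretical):
    """Mean mass error of the calibrants, in parts per million."""
    errors = [(o - t) / t * 1e6 for o, t in zip(observed, theoretical)]
    return sum(errors) / len(errors)


def calibrate(mzs, offset_ppm):
    """Remove a constant ppm offset from a list of m/z values (assumed model)."""
    return [mz / (1 + offset_ppm * 1e-6) for mz in mzs]


def stitch(windows):
    """Merge per-window peak lists into one spectrum.

    windows: list of ((low, high), [(mz, intensity), ...]), sorted by low.
    Assumption: each overlap is split at its midpoint.
    """
    spectrum = []
    for i, ((low, high), peaks) in enumerate(windows):
        last = i == len(windows) - 1
        lower = low if i == 0 else (windows[i - 1][0][1] + low) / 2
        upper = high if last else (high + windows[i + 1][0][0]) / 2
        for mz, intensity in peaks:
            # The upper edge is included only for the final window.
            if lower <= mz < upper or (last and mz == upper):
                spectrum.append((mz, intensity))
    return sorted(spectrum)


# Made-up calibrants, observed 2 ppm too high.
theoretical = [100.0, 200.0, 400.0]
observed = [t * (1 + 2e-6) for t in theoretical]
offset = ppm_offset(observed, theoretical)
print(round(offset, 3), calibrate(observed, offset))

# Two made-up overlapping windows; the overlap 140 to 150 is split at 145.
windows = [
    ((100, 150), [(120.0, 10.0), (147.0, 5.0)]),
    ((140, 190), [(143.0, 4.0), (147.0, 6.0), (180.0, 8.0)]),
]
print(stitch(windows))
```

In this toy example the peak at 147 is reported by both windows, and the midpoint rule keeps only the copy from the second window.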
Accurate discrimination of real signals related to metabolites from what we call chemical and electrical noise, which does not originate from metabolites, is an important step during data processing. A proportion of each spectral window will include noise. By removing noise we can ensure that the signals we identify as important are biologically related and therefore worthy of validating in future studies. To do this we can apply a signal-to-noise ratio threshold. Typically, we apply a threshold of 3. This will throw out all the peaks that have a signal-to-noise ratio lower than 3. We typically analyse each biological sample and blank sample three times. In other words, we collect three mass spectra, which are highly similar, for each biological sample.
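The signal-to-noise threshold mentioned above amounts to a simple filter. The sketch below assumes that a single noise level for the window has already been estimated and is passed in by the caller; how that noise level is estimated is not covered here, and the function name `filter_by_snr` is hypothetical.

```python
def filter_by_snr(peaks, noise, threshold=3.0):
    """Keep (mz, intensity) peaks whose intensity / noise is at least threshold."""
    return [(mz, intensity) for mz, intensity in peaks
            if intensity / noise >= threshold]


# Made-up peaks in one window, with an assumed noise level of 100.
peaks = [(101.0, 50.0), (102.0, 290.0), (103.0, 300.0), (104.0, 1200.0)]
print(filter_by_snr(peaks, noise=100.0))
```

Here the peaks with intensities 50 and 290 fall below a ratio of 3 and are discarded, while those at 300 and 1200 are kept.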
This will allow us to identify peaks related to metabolites and remove noise features within each biological sample. Only peaks present in at least two out of three of the replicate analyses are retained. This is another step to filter noise from biological signals. This creates a filtered peak list for each sample. To create a data matrix we align all the peak lists based on the mass-to-charge ratio. This ensures that metabolite X is defined accurately as metabolite X in each sample. This data matrix is filtered using a two-step approach. First, any signals which are detected in the blank sample cannot originate from biological material and so should be excluded from the data matrix for statistical analysis.
And secondly, peaks that are not present in a defined percentage of all the samples studied are removed. Typically, we use 80%. At this point we have a data matrix where each row represents a sample, and each column a peak defined with a mass-to-charge ratio. Prior to statistical analysis we need to address the occurrence of missing values within the matrix. Missing values can be problematic for statistical analysis. They generally occur because the metabolite is not present in all the samples, or the instrument or software fails to report a metabolite that was present. Missing values are imputed using computational algorithms, such as the k-nearest neighbour method.
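The two-step matrix filter and the imputation step can be sketched as follows. This is a simplified illustration in plain Python, not the workflow used by the speaker: missing values are represented as `None`, the blank is assumed to be a single already-aligned row, and the k-nearest neighbour step uses the root mean squared difference over the peaks that two samples share as its distance, with `k=2` chosen only to suit the tiny example. The function names are hypothetical.

```python
import math


def filter_columns(samples, blank, min_fraction=0.8):
    """Return indices of peak columns that pass both matrix filters.

    samples: list of rows, one per biological sample; None marks a missing value.
    blank:   one row for the blank sample, in the same column order.
    """
    keep = []
    for j in range(len(blank)):
        if blank[j] is not None:  # step 1: peak was also detected in the blank
            continue
        present = sum(row[j] is not None for row in samples)
        if present / len(samples) >= min_fraction:  # step 2: presence threshold
            keep.append(j)
    return keep


def knn_impute(samples, k=2):
    """Fill each missing value with the column mean of the k nearest samples."""
    filled = [list(row) for row in samples]
    for a, row in enumerate(samples):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for b, other in enumerate(samples):
                if b == a or other[j] is None:
                    continue
                shared = [(x - y) ** 2 for x, y in zip(row, other)
                          if x is not None and y is not None]
                if shared:
                    # Distance over the columns that both samples report.
                    distance = math.sqrt(sum(shared) / len(shared))
                    candidates.append((distance, other[j]))
            nearest = sorted(candidates)[:k]
            if nearest:
                filled[a][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Made-up matrix: 5 samples (rows) by 5 peaks (columns).
samples = [
    [10.0, 5.0, None, 7.0, 2.0],
    [11.0, 5.5, 3.0, None, 2.1],
    [10.5, None, None, 7.2, 2.0],
    [9.5, 4.8, None, 6.9, 1.9],
    [10.2, 5.1, None, 7.1, 2.2],
]
blank = [None, None, None, None, 2.0]

keep = filter_columns(samples, blank)
reduced = [[row[j] for j in keep] for row in samples]
for row in knn_impute(reduced):
    print(row)
```

In this example the third peak is dropped because it appears in only one of five samples, and the last peak is dropped because it is also detected in the blank. In practice an established implementation, for instance the k-nearest neighbour imputer in scikit-learn, would usually be preferred over a hand-written one.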
Finally, the intensities of the data matrix are normalised and transformed to remove systematic bias while preserving biological information. This will be discussed in the data analysis discussion later this week.
Dr Ralf Weber provides an overview of the approaches applied to process the raw data files produced in direct infusion mass spectrometry.
This article is from the free online

Metabolomics: Understanding Metabolism in the 21st Century
