Skip to 0 minutes and 6 seconds RALF WEBER: Previously, we have discussed how direct infusion mass spectrometry and the spectral stitching approach we use here in Birmingham are applied in the study of metabolism. Let’s summarise how we acquire the data again. We infuse a liquid sample directly into the mass spectrometer, and the data is collected across a series of mass spectral windows, where each window is a small mass-to-charge range. So for the mass-to-charge ratio range of 100 to 500, we may collect data for the range 100 to 150, 140 to 190, and so on.
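As a rough illustration of the window scheme just described, the sketch below generates overlapping mass-to-charge windows. The width of 50 and overlap of 10 are read off the example ranges in the talk (100 to 150, 140 to 190); the function name `spectral_windows`, the assumption that every window has the same width, and the truncation of the last window at the upper limit are illustrative choices, not the instrument method itself.

```python
def spectral_windows(mz_start, mz_end, width=50.0, overlap=10.0):
    """Return a list of (low, high) m/z windows covering mz_start..mz_end.

    Assumption: fixed-width windows with a fixed overlap; the final
    window is truncated at mz_end.
    """
    if width <= overlap:
        raise ValueError("width must be larger than overlap")
    step = width - overlap
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        if high >= mz_end:
            break
        low += step
    return windows


windows = spectral_windows(100.0, 500.0)
for low, high in windows:
    print(f"{low:.0f} to {high:.0f}")
```

Each window overlaps its neighbour by 10 mass-to-charge units here, which is what later allows the windows to be joined without gaps.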
Skip to 0 minutes and 40 seconds The use of small mass-to-charge ratio windows ensures that the data can be collected across a wider dynamic range, meaning we can measure metabolites at low and high concentrations without compromising the mass accuracy. Overall, this means we can detect a higher number of metabolites, with a high mass accuracy, than would be obtained using conventional mass spectrometry approaches, and this improves our ability to chemically annotate the metabolites detected. After collection of the direct infusion mass spectrometry data using our optimised approach, the data is processed using computational workflows. The final product of this workflow is a data matrix. The data matrix consists of mass-to-charge ratios and ion intensity values for each peak detected across all the samples.
Skip to 1 minute and 30 seconds This data matrix can be used for statistical analysis and metabolite annotation. So what are the important steps we apply in direct infusion mass spectrometry data processing? The first step includes mass calibration of each spectral window. Calibration is used to obtain a mass spectrum for which the mass-to-charge ratios are reported with high accuracy. There are different equations available to calibrate spectra, and they can be dependent on the mass spectrometer used. These equations require pre-defined calibration parameters or a list of calibrants with known theoretical masses. After mass calibration the multiple narrow mass-to-charge windows are stitched together into a single mass spectrum.
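The calibration and stitching steps above can be sketched very roughly as follows. Because the calibration equation depends on the mass spectrometer, this example swaps in a deliberately simple stand-in: it estimates the mean parts-per-million error from a list of calibrants and removes it. The stitching rule (each window keeps only peaks up to the midpoint of its overlap with the neighbouring window) is likewise an assumption made for illustration. All function names and the example values are hypothetical.

```python
def ppm_error(measured, theoretical):
    """Mass error of a measured m/z relative to its theoretical value, in ppm."""
    return (measured - theoretical) / theoretical * 1e6


def calibrate_window(peaks, calibrants):
    """Shift all peaks by the mean ppm error observed for the calibrants.

    peaks: list of (mz, intensity)
    calibrants: list of (measured_mz, theoretical_mz)
    Assumption: a constant ppm offset, not an instrument-specific equation.
    """
    if not calibrants:
        return list(peaks)
    mean_ppm = sum(ppm_error(m, t) for m, t in calibrants) / len(calibrants)
    return [(mz / (1 + mean_ppm * 1e-6), inten) for mz, inten in peaks]


def stitch(windows):
    """Join windows into one peak list, cutting overlaps at their midpoint.

    windows: list of ((low, high), peaks), sorted by low.
    """
    stitched = []
    last_index = len(windows) - 1
    for i, ((low, high), peaks) in enumerate(windows):
        lower = low if i == 0 else (low + windows[i - 1][0][1]) / 2
        upper = high if i == last_index else (high + windows[i + 1][0][0]) / 2
        for mz, inten in peaks:
            in_range = lower <= mz < upper
            at_top_edge = i == last_index and mz == upper
            if in_range or at_top_edge:
                stitched.append((mz, inten))
    return sorted(stitched)


calibrated = calibrate_window([(300.0003, 10.0)], [(200.0002, 200.0)])
spectrum = stitch([
    ((100.0, 150.0), [(120.0, 1.0), (147.0, 2.0)]),
    ((140.0, 190.0), [(143.0, 3.0), (147.001, 4.0), (180.0, 5.0)]),
])
print(calibrated)
print(spectrum)
```

In this example the overlap between 140 and 150 is cut at 145, so the first window contributes the peak at 120 and the second window contributes the peaks above 145.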
Skip to 2 minutes and 14 seconds Accurate discrimination of real signals related to metabolites from what we call chemical and electrical noise, which does not originate from metabolites, is an important step during data processing. A proportion of each spectral window will include noise. By removing noise we can ensure that the signals we identify as important are biologically related and therefore worthy of validating in future studies. To do this we can apply a signal-to-noise ratio threshold. Typically, we will apply a ratio of 3 or greater. This will throw out all the peaks that have a signal-to-noise ratio lower than 3. We typically analyse each biological sample and blank sample three times. In other words, we collect three mass spectra, which are highly similar, for each biological sample.
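The signal-to-noise threshold described above can be sketched as a simple filter. The threshold of 3 comes from the talk; how the noise level is estimated is not stated there, so this example assumes a single noise value is already available for the window. The function name and the example intensities are hypothetical.

```python
def filter_by_snr(peaks, noise_level, threshold=3.0):
    """Keep peaks whose intensity divided by the noise level is >= threshold.

    peaks: list of (mz, intensity)
    noise_level: assumed to be estimated elsewhere for this spectral window
    """
    if noise_level <= 0:
        raise ValueError("noise_level must be positive")
    return [(mz, inten) for mz, inten in peaks if inten / noise_level >= threshold]


window_peaks = [(100.0, 50.0), (101.0, 300.0), (102.0, 299.0)]
retained = filter_by_snr(window_peaks, noise_level=100.0)
print(retained)
```

Only the peak with a ratio of exactly 3 survives here; the peak at a ratio of 2.99 is discarded along with the one at 0.5.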
Skip to 3 minutes and 1 second This will allow us to identify peaks related to metabolites and remove noise features within each biological sample. Only peaks present in at least two out of three of the replicate analyses are retained. This is another step to filter noise from biological signals. This creates a filtered peak list for each sample. To create a data matrix we align all the peak lists based on the mass-to-charge ratio. This ensures that metabolite X is defined accurately as metabolite X in each sample. This data matrix is filtered using a two-step approach. First, any signals which are detected in the blank sample cannot originate from biological material and so should be excluded from the data matrix for statistical analysis.
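The two-out-of-three replicate rule described above can be sketched as follows. The talk does not say how peaks from different replicates are matched, so this example assumes a simple grouping of peaks whose mass-to-charge ratios lie within a parts-per-million tolerance of the first peak in the group. The tolerance of 2 ppm, the averaging of the matched peaks, and all names are illustrative assumptions.

```python
def replicate_filter(replicates, ppm_tol=2.0, min_present=2):
    """Keep peaks found in at least min_present replicate peak lists.

    replicates: list of peak lists, each a list of (mz, intensity)
    Returns one (mean_mz, mean_intensity) per retained peak group.
    """
    # Pool all peaks, remembering which replicate each one came from.
    tagged = sorted(
        (mz, inten, rep)
        for rep, peaks in enumerate(replicates)
        for mz, inten in peaks
    )
    clusters, current = [], []
    for peak in tagged:
        if current:
            first_mz = current[0][0]
            if (peak[0] - first_mz) / first_mz * 1e6 > ppm_tol:
                clusters.append(current)
                current = []
        current.append(peak)
    if current:
        clusters.append(current)

    kept = []
    for cluster in clusters:
        n_replicates = len({rep for _, _, rep in cluster})
        if n_replicates >= min_present:
            mean_mz = sum(p[0] for p in cluster) / len(cluster)
            mean_int = sum(p[1] for p in cluster) / len(cluster)
            kept.append((mean_mz, mean_int))
    return kept


three_replicates = [
    [(100.0000, 10.0), (200.0, 5.0)],
    [(100.0001, 12.0)],
    [(100.0001, 14.0), (300.0, 7.0)],
]
sample_peaks = replicate_filter(three_replicates)
print(sample_peaks)
```

The peaks near 100 appear in all three replicates and are kept as a single averaged peak, while the peaks at 200 and 300 each appear in only one replicate and are removed.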
Skip to 3 minutes and 50 seconds And secondly, peaks that are not present in a defined percentage of all the samples studied are removed. Typically, we use 80%. At this point we have a data matrix where each row represents a sample, and each column a peak defined with a mass-to-charge ratio. Prior to statistical analysis we need to address the occurrence of missing values within the matrix. Missing values can be problematic for statistical analysis. They generally occur because the metabolite is not present in all the samples, or the instrument or software fails to report a metabolite that was present. Missing values are imputed using computational algorithms, such as the K-nearest neighbour method.
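The two-step matrix filter and the imputation step can be sketched together as below. The 80% threshold and the use of a K-nearest neighbour method come from the talk; everything else is an assumption for illustration: missing values are stored as `None`, the blank is a single row, neighbours are taken to be samples (rows) rather than peaks, distance is a root-mean-square difference over the peaks both samples share, and `k=2` is arbitrary. The function names and the example matrix are hypothetical.

```python
import math


def filter_columns(matrix, blank, min_fraction=0.8):
    """Return indices of peak columns that pass both filters.

    Step 1: drop peaks detected in the blank.
    Step 2: drop peaks present in fewer than min_fraction of the samples.
    """
    n_samples = len(matrix)
    keep = []
    for j in range(len(blank)):
        if blank[j] is not None:
            continue
        present = sum(1 for row in matrix if row[j] is not None)
        if present / n_samples >= min_fraction:
            keep.append(j)
    return keep


def knn_impute(matrix, k=2):
    """Replace each None with the mean of that peak in the k nearest samples."""
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for other_i, other in enumerate(matrix):
                if other_i == i or other[j] is None:
                    continue
                shared = [
                    (a - b) ** 2
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    neighbours.append((dist, other[j]))
            neighbours.sort()
            nearest = [v for _, v in neighbours[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Rows are samples, columns are peaks; None marks a missing value.
matrix = [
    [1.0, 10.0, 5.0, None],
    [1.1, None, 5.2, 7.0],
    [0.5, 9.0, 4.8, None],
    [1.15, 11.0, None, None],
    [2.0, 12.0, 5.1, None],
]
blank = [None, None, 3.0, None]

keep = filter_columns(matrix, blank)
filtered = [[row[j] for j in keep] for row in matrix]
complete = knn_impute(filtered)
print(keep)
print(complete)
```

In this example the third peak is removed because it appears in the blank, and the fourth because it is present in only one of five samples. The single missing value that remains is filled from the two samples whose other peak intensities are closest.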
Skip to 4 minutes and 32 seconds Finally, the intensities of the data matrix are normalised and transformed to remove systematic bias while preserving biological information. This will be discussed in the data analysis discussion later this week.
Direct infusion mass spectrometry data processing
Dr Ralf Weber provides an overview of the approaches applied to process the raw data files produced in direct infusion mass spectrometry.
© University of Birmingham and the Birmingham Metabolomics Training Centre