WARWICK DUNN: Throughout the course we have discussed that metabolomes are highly complex and contain hundreds or thousands of metabolites. To convert data to biological knowledge we have to annotate or chemically identify metabolites. We can apply data we collected during the study to assist in this process. So what do I mean by annotate and chemically identify? Well, there are different levels of confidence you can apply to this process. One process is to collect mass-to-charge ratio, retention time, and fragmentation mass spectra, and compare these data to the equivalent data collected for a pure chemical standard that you can purchase from a chemical supplier.
If the data for the metabolite and chemical standard match, then you have a relatively high confidence that you have provided a unique identification. However, if you cannot match your data to an authentic chemical standard– for example, if the chemical standard cannot be purchased– then you have less confidence in the process, and we define this as annotation and not identification. In the metabolomics community we apply four levels of confidence to identify metabolites. These range from high confidence, level one, to not identified, level four. The concern of not matching the metabolite to an authentic chemical standard is important, as many metabolites have similar chemical structures, the same molecular formula, and the same mass-to-charge ratio.
For example, the two amino acids leucine and isoleucine have the same mass-to-charge ratio but will produce two separate chromatographic peaks in a well-designed liquid chromatography-mass spectrometry method. So how do you know that peak one is leucine and peak two is isoleucine without analysing the chemical standards? Simply, you cannot. This is an example of the difficulty in applying mass spectrometry. Mass spectrometers measure mass-to-charge ratio, and there are many cases where different metabolites have the same molecular formula and therefore the same mass. We call these isomers. Isomers can sometimes have very similar chemical structures, and therefore they can have very similar, or identical, retention times.
When coupled with having the same mass-to-charge ratio, this can mean that you cannot uniquely name a chromatographic peak as a single metabolite without further experiments, of which analysing the authentic chemical standard is only one. There are many examples of isomers in biology. The amino acids leucine and isoleucine, as discussed before, and the sugars glucose and fructose are two such pairs: each pair has the same molecular formula and therefore the same mass, with similar, but not identical, chemical structures. Another example is unsaturated fatty acids. These fatty acids can have one or more carbon-carbon double bonds, and the double bonds can be located at different positions in the fatty acid molecule to create different fatty acids.
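The point about isomers can be made concrete with a small calculation. The sketch below is illustrative only and not part of the lecture: it computes the monoisotopic mass of each metabolite named above from its molecular formula and groups the names that share a formula. The helper names (`parse_formula`, `monoisotopic_mass`, `group_isomers`) are hypothetical, and the parser assumes simple formulas without brackets or charges.

```python
import re
from collections import Counter

# Monoisotopic masses (Da) of the most abundant isotope of each element used here.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}

# Molecular formulas of the isomer pairs mentioned in the lecture.
METABOLITES = {
    "leucine": "C6H13NO2",
    "isoleucine": "C6H13NO2",
    "glucose": "C6H12O6",
    "fructose": "C6H12O6",
}


def parse_formula(formula):
    """Turn a simple formula such as 'C6H13NO2' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def group_isomers(metabolites):
    """Group metabolite names by identical formula string."""
    groups = {}
    for name, formula in metabolites.items():
        groups.setdefault(formula, []).append(name)
    return groups


for formula, names in group_isomers(METABOLITES).items():
    print(f"{formula}: {monoisotopic_mass(formula):.4f} Da -> {', '.join(names)}")
```

Both names in each group come out with exactly the same mass (about 131.0946 Da for C6H13NO2 and 180.0634 Da for C6H12O6), so mass alone cannot tell them apart.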
Being able to assign a single metabolite name in untargeted metabolomics is not always a simple process and can include errors or reporting of multiple metabolite names for a peak. However, we must remember that each metabolite has a different metabolic and biological function, and that identifying it uniquely is important. Another challenge is the number of metabolites that can be detected or are expected to be present in different organisms. Microbes like yeast and E. coli contain up to 2,000 metabolites as defined by systems biology approaches. Plants are estimated to contain many more: up to 200,000 different metabolites across the plant kingdom.
In humans, the largest list of metabolites currently available is the Human Metabolome Database, developed by David Wishart’s group in Canada. This defines 40,000 metabolites. In humans, the diversity of metabolites is complex. Metabolites can be synthesised in the body. They can also come from the food, drink, and air that we consume, or result from the metabolism of these chemicals. Finally, we have metabolites produced by microbes present on our skin and in our intestines. So in humans, these different sources of metabolites provide a significant complexity of source and chemical diversity. This leads on to another challenge– knowing which metabolites we expect to be present in the metabolome.
It is important to realise that we do not have the complete list of metabolites that could potentially be detected in each sample, what we call the parts list. This is an important concept in plants and humans. If we had the name of every single metabolite that may be present in a sample, then this would simplify the process and improve our confidence in the identification reported. However, we do not have these complete parts lists. For example, in humans, we take different drugs regularly, either over-the-counter drugs like paracetamol or drugs prescribed by our doctors. These drugs are metabolised in our bodies to other metabolites.
Currently, there is no publicly available list or database of these drugs and their metabolites. The same is true of food components– though FooDB is one example of how this is being improved– and is also true for chemicals we absorb from the environment, including pollutants. Without knowing what we expect to detect, we have to consider two possibilities. There may be many metabolites present that we expect to be present and that have been detected and identified before. But there could also be metabolites present that have not been detected or listed in a database before.
Therefore, if we assume that anything could be present, then this increases our search space, that is, the number of metabolites or chemicals we believe may be present in the metabolome that we detect.
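The effect of a larger search space can be sketched with a toy mass search. This is an illustration under stated assumptions, not a method from the lecture: the two candidate lists are deliberately tiny and hypothetical, the function name `candidates_within_tolerance` is made up for this example, the 5 ppm tolerance is an arbitrary choice, and the masses are neutral monoisotopic masses in daltons.

```python
def candidates_within_tolerance(query_mass, database, ppm=5.0):
    """Return the names whose mass lies within +/- ppm of query_mass."""
    tolerance = query_mass * ppm / 1e6
    return sorted(
        name for name, mass in database.items()
        if abs(mass - query_mass) <= tolerance
    )


# Hypothetical, deliberately tiny "parts list" of endogenous metabolites.
ENDOGENOUS = {
    "leucine": 131.09463,
    "isoleucine": 131.09463,
    "valine": 117.07898,
    "glucose": 180.06339,
}

# The same list extended with other chemicals that could be present:
# two drugs and a non-proteinogenic amino acid.
EXTENDED = {
    **ENDOGENOUS,
    "6-aminohexanoic acid": 131.09463,  # a drug, isomeric with leucine
    "norleucine": 131.09463,
    "paracetamol": 151.06333,
}

observed_mass = 131.0946  # assumed neutral mass inferred from one detected peak

for label, database in (("endogenous only", ENDOGENOUS), ("extended", EXTENDED)):
    hits = candidates_within_tolerance(observed_mass, database)
    print(f"{label}: {len(hits)} candidates -> {hits}")
```

With the small list the peak has two candidate names; with the extended list the same peak has four. The measurement has not changed, only the number of chemicals we are willing to consider.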