Developing metabolite identification pipelines
In the metabolite identification or annotation process, parameters derived from our experimental data are matched to entries in databases.
These parameters may be the mass or the fragmentation mass spectrum of a metabolite. A range of different databases is applied, and these can be categorized into different classes of database.
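As a minimal sketch of what matching a mass to a database can look like, the Python example below compares an observed m/z value against a tiny in-memory table and keeps entries that fall within a mass tolerance expressed in parts per million (ppm). The table, the function names, the choice of the [M+H]+ adduct and the 5 ppm tolerance are all assumptions made for illustration; they are not taken from any particular database or software package.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da, added to form the [M+H]+ ion

# Hypothetical mini-database: metabolite name -> monoisotopic neutral mass (Da)
DATABASE = {
    "glucose": 180.06339,
    "citric acid": 192.02700,
    "L-tryptophan": 204.08988,
}


def ppm_error(observed, theoretical):
    """Return the mass error of an observed value in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_mz(observed_mz, database, tolerance_ppm=5.0):
    """Return (name, ppm error) pairs whose [M+H]+ m/z lies within tolerance."""
    hits = []
    for name, neutral_mass in database.items():
        theoretical_mz = neutral_mass + PROTON_MASS
        error = ppm_error(observed_mz, theoretical_mz)
        if abs(error) <= tolerance_ppm:
            hits.append((name, round(error, 2)))
    # Smallest absolute mass error first
    return sorted(hits, key=lambda hit: abs(hit[1]))


hits = match_mz(181.0707, DATABASE)
no_hits = match_mz(500.0, DATABASE)
```

Here the observed value matches only the glucose entry, and a value with no corresponding entry returns an empty list. A mass match alone does not distinguish between chemicals that share the same molecular formula, which is one reason fragmentation spectra are also used.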
One of these classes is the metabolite or metabolome database. These databases contain the list of metabolites expected to be present in a defined biological sample, together with the chemical and physical properties of each metabolite and other relevant information. One example is the Human Metabolome Database, which contains information for more than 40,000 metabolites, both endogenous and exogenous, that are expected to be present in humans. These data include physical and chemical properties, links to diseases and the concentrations expected in different human samples. This database also provides links to other databases to allow viewers to explore a metabolite further. The Human Metabolome Database relates to only one organism and so is termed a single-organism database. A second example is the KEGG database developed by Professor Kanehisa in Japan. This database lists metabolites, a limited number of metabolite properties, and groups of metabolites linked together in metabolic pathways. The metabolic pathways also provide information about the proteins and genes related to these pathways. The KEGG database is an example of a multi-organism database, with information for many different organisms including microbes, plants and mammals.
The complete list of expected metabolites is not known for humans, and so other chemical databases are often searched when investigating a metabolite in a human sample. These may include chemical databases such as PubChem and ChemSpider, which contain information on a much larger number of chemicals, of which metabolites make up only a small fraction. Some of these chemicals will be present in human samples but some will not. These databases are highly useful, but they increase the search space of chemicals and complicate the process of metabolite identification.
For example, suppose your identification process has narrowed down your search to two chemicals. One is a common metabolite normally detected in the sample type analysed, and the other is a chemical that is not necessarily expected to be present in the biological sample. Common sense and logic dictate that the unknown is most probably the common metabolite detected in many other studies, but there is still a small chance that it is the unusual chemical. Further analysis of the acquired data, or further experiments, are required to confirm or refute that it is the common metabolite.
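The reasoning in this example can be sketched as a simple prioritisation step. In the Python sketch below, candidates returned from a large chemical database are flagged according to whether they also appear in a metabolome database for the sample type, and the flagged candidates are listed first. The `Candidate` class, its fields and the example values are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    in_metabolome_db: bool  # assumed flag: listed for this organism or sample type?
    ppm_error: float        # mass error of the match in ppm


def rank_candidates(candidates):
    """Order candidates: expected metabolites first, then by absolute mass error."""
    return sorted(
        candidates,
        key=lambda c: (not c.in_metabolome_db, abs(c.ppm_error)),
    )


# Hypothetical candidates; the names and errors are placeholders
candidates = [
    Candidate("unusual chemical", False, 0.4),
    Candidate("common metabolite", True, 1.1),
]

ranked = rank_candidates(candidates)
```

The ranking places the common metabolite first even though its mass error is slightly larger, but it keeps the unusual chemical in the list. It is a way of ordering hypotheses, not a confirmation: the top candidate still has to be checked against further data or experiments.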
A third type of database contains spectral information acquired from chemical standards and collated into a single usable source of information. These could be mass spectral libraries that contain accurate mass, retention time and fragmentation mass spectral data for metabolites. These data are applied to annotate or identify metabolites in biological samples. Examples of mass spectral libraries applied for metabolite annotation in LC-MS datasets include mzCloud, METLIN and MassBank. Similar databases or libraries are available for other analytical instruments, including gas chromatography-mass spectrometry and nuclear magnetic resonance spectroscopy. One important point to remember is that constructing these mass spectral libraries requires chemical standards. However, a chemical standard is not available for every metabolite, and therefore these libraries currently do not contain data for all possible metabolites. In fact, there are more potential metabolites missing than present. You cannot match to a metabolite that is not present in a database!
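To illustrate how a fragmentation spectrum might be compared against a spectral library, the Python sketch below scores a query spectrum against each library entry using cosine similarity on binned peaks. This is one common scoring approach, used here as an assumption; it is not a description of how mzCloud, METLIN or MassBank perform their searches. The library entries, peak values, bin width and score threshold are all invented for illustration and are not real spectra.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity between two spectra given as (m/z, intensity) lists."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, library, min_score=0.7):
    """Return (name, score) of the best library entry, or None below threshold."""
    scored = [(name, cosine_similarity(query, spec)) for name, spec in library.items()]
    name, score = max(scored, key=lambda item: item[1])
    return (name, score) if score >= min_score else None


# Hypothetical library built from two invented "standards"
LIBRARY = {
    "standard A": [(85.03, 100.0), (127.04, 45.0), (163.06, 20.0)],
    "standard B": [(70.07, 100.0), (116.07, 60.0)],
}

query = [(85.03, 50.0), (127.04, 22.5), (163.06, 10.0)]  # same pattern as standard A
unknown = [(55.05, 100.0), (99.08, 30.0)]                # shares no peaks with the library

result = best_match(query, LIBRARY)
missing = best_match(unknown, LIBRARY)
```

The query matches standard A because its relative peak intensities are the same, while the second spectrum returns no match at all. This mirrors the point above: a spectrum from a metabolite that has no entry in the library cannot be matched, however good the data are.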
For any single analytical platform the databases described above can be freely available for all scientists to use or may only be available through purchasing the database from an instrument or software company. Some research laboratories have their own databases or libraries which they have constructed and apply to their own research projects.
© University of Birmingham and Birmingham Metabolomics Training Centre.