
Standardisation, data repositories and open access

An overview of the development of standards, data repositories and open access science in metabolomics.
MARK VIANT: Omics experiments generate avalanches of data, and these data can take many months or even years to acquire at a significant cost. Maximising the use and reuse of these valuable datasets requires storing the data in specialised repositories. It’s also important to store all of the information describing how the experiment was conducted, and this is called metadata. The storage of the data and the metadata must be done in a standardised way. In metabolomics, the era of open access is upon us. Labs around the world are increasingly required to ensure that all scientific research data are made available to the scientific community once the findings of the study have been published.
In addition, funding organisations are requesting that all data is made open access, and a number of scientific journals are also requesting the same before scientific papers are published. We describe these data as being open access. There are currently two large data repositories used in metabolomics. MetaboLights is the first, which is housed at the European Bioinformatics Institute near Cambridge in the UK and supported by BBSRC funding. The second is Metabolomics Workbench, which is sponsored by the Common Fund of the National Institutes of Health, NIH, in the US, and is housed in San Diego. These repositories provide a cross-species, cross-technique database for metabolomics experiments and their metadata.
These data repositories provide easy access for the reuse of data. For example, data from the MetaboLights data repository, which was initially acquired to answer biological questions, has since been used to validate computational tools for data analysis. And this is only one type of reuse. Other examples include the integration of multiple datasets studying the same biological question. Now, this will become increasingly possible in the future when sufficient numbers of datasets are available. And we will call this type of analysis meta-analysis.
The driving force in the development of data standards arose from the needs of scientific journals to provide formats and guidelines to publish and store these big datasets. This has recently gained momentum with the realisation that for cross comparison of datasets, the data must be stored in the same format, and the metadata should capture all the information relating to the key aspects of the experimental pipeline. As we increasingly move towards open access science, the application of so-called minimal reporting standards will ensure that these large datasets are preserved as a valuable resource for the future.
The aim of defining minimal reporting standards is not to prescribe how an experiment should be performed, but rather to provide sufficient information to interpret the experimental findings, to permit the reuse of the data beyond the scope of the original study, to enable comparisons with similar experiments, or even to allow the experiment to be repeated exactly. Relevant information should be captured at each step in the metabolomics pipeline, including that related to the biological samples, the study design, the technology used to measure the metabolites, and the data analysis methods.
It is necessary to achieve a balance when developing these guidelines to, on one hand, ensure the future use of the data is preserved, but on the other hand, to minimise the administrative burden on the scientists of actually annotating the data and the metadata in the standardised format. In the field of metabolomics, the documents describing minimal reporting standards for a metabolomics study were developed and published in 2007 by the Metabolomics Standards Initiative. This international initiative was coordinated by the Metabolomics Society, which is the scientific organisation dedicated to promoting the growth, use, and understanding of metabolomics in the life sciences.
Developing these documents required a major coordinated effort by the metabolomics community, due to the diverse range of techniques and data analysis approaches that are routinely applied. To encompass the breadth of the information in this complex approach, a number of working groups and, later, task groups were formed to define the information that needs to be captured at each step in the metabolomics pipeline. This tremendous effort by many dedicated scientists will provide a valuable tool for today and for the future. The ultimate success of any minimal reporting standards initiative is the global adoption of those standards by the scientific community, such that all data is published and stored using those defined guidelines.
Working with instrument manufacturers, software companies, and database experts to develop and streamline the process of data and metadata storage, as well as providing training in these guidelines, is essential for maximising the uptake of these procedures by the metabolomics community.
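
As an illustration of the idea that metadata should cover every step of the metabolomics pipeline, the following is a minimal Python sketch of a completeness check. It assumes a metadata record is stored as a dictionary with one entry per pipeline step; the section names, the `missing_sections` helper and the example values are hypothetical, and are not the field names defined by the Metabolomics Standards Initiative or used by any repository.

```python
# Hypothetical section names, based on the four pipeline steps named in the
# transcript. They are NOT the official MSI or repository field names.
REQUIRED_SECTIONS = (
    "biological_samples",
    "study_design",
    "measurement_technology",
    "data_analysis",
)


def missing_sections(metadata):
    """Return the required sections that are absent or empty in a record."""
    return [name for name in REQUIRED_SECTIONS if not metadata.get(name)]


# Illustrative record only: every value here is invented for the example.
example_record = {
    "biological_samples": {"species": "Daphnia magna", "sample_type": "whole organism"},
    "study_design": {"groups": ["control", "exposed"], "replicates_per_group": 6},
    "measurement_technology": {"platform": "LC-MS"},
    # "data_analysis" is deliberately left out to show the check reporting a gap.
}

print(missing_sections(example_record))
```

A check of this kind could run before a dataset is submitted, so that gaps are found by the scientist rather than by a later reuser of the data. A real repository would validate far more than the presence of four sections.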

Professor Mark Viant introduces the development of standards and data repositories within the metabolomics field and how they are important as we move towards open access science.

If you would like to investigate the subject area further, the following publications may be of interest.

The metabolomics standards initiative

The role of reporting standards for metabolite annotation and identification in metabolomics studies

ELIXIR position paper on FAIR data management in life sciences

This article is from the free online course Metabolomics: Understanding Metabolism in the 21st Century.
