Standardisation, data repositories and open access

An overview of the development of standards, data repositories and open access science in metabolomics.
MARK VIANT: Omics experiments generate avalanches of data, and these data can take many months or even years to acquire at a significant cost. To maximise the use and reuse of these valuable datasets requires storage of the data in specialised repositories. It’s also important to store all of the information describing how the experiment was conducted, and this is called metadata. The storage of the data and the metadata must be done in a standardised way. In metabolomics, the era of open access is upon us. Labs around the world are increasingly required to ensure that all scientific research data are made available to the scientific community once the findings of the study have been published.
In addition, funding organisations are requesting that all data is made open access, and a number of scientific journals are also requesting the same before scientific papers are published. We describe these data as being open access. There are currently two large data repositories used in metabolomics. MetaboLights is the first, which is housed at the European Bioinformatics Institute near Cambridge in the UK and supported by BBSRC funding. The second is Metabolomics Workbench, which is sponsored by the Common Fund of the National Institutes of Health, NIH, in the US, and is housed in San Diego. These repositories provide a cross-species, cross-technique database for metabolomics experiments and their metadata.
These data repositories provide easy access for the reuse of data. For example, data from the MetaboLights data repository, which was initially acquired to answer biological questions, has since been used to validate computational tools for data analysis. And this is only one type of reuse. Other examples include the integration of multiple datasets studying the same biological question. Now, this will become increasingly possible in the future when sufficient numbers of datasets are available. And we will call this type of analysis meta-analysis.
The driving force in the development of data standards arose from the needs of scientific journals to provide formats and guidelines to publish and store these big datasets. This has recently gained momentum with the realisation that for cross comparison of datasets, the data must be stored in the same format, and the metadata should capture all the information relating to the key aspects of the experimental pipeline. As we increasingly move towards open access science, the application of so-called minimal reporting standards will ensure that these large datasets are preserved as a valuable resource for the future.
The aim of defining minimal reporting standards is not to prescribe how an experiment should be performed, but rather to provide sufficient information to interpret the experimental findings, to permit the reuse of the data beyond the scope of the original study, to enable comparisons with similar experiments, or even to allow the experiment to be repeated exactly. Relevant information should be captured at each step in the metabolomics pipeline, including that related to the biological samples, the study design, the technology used to measure the metabolites, and the data analysis methods.
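As a rough illustration of capturing information at each step of the pipeline, the sketch below checks whether a metadata record covers the four areas just listed. The section names, the `missing_sections` helper and the example values are all hypothetical, chosen only for this example; they are not the field names defined by any reporting standard or repository.

```python
# Hypothetical section names based on the four pipeline steps named above;
# these are NOT official field names from any reporting standard.
REQUIRED_SECTIONS = (
    "biological_samples",
    "study_design",
    "measurement_technology",
    "data_analysis",
)


def missing_sections(metadata):
    """Return the required sections that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_SECTIONS if not metadata.get(name)]


# Illustrative record with made-up values; data_analysis is left out on purpose.
example_record = {
    "biological_samples": {"organism": "example organism", "tissue": "example tissue"},
    "study_design": {"groups": ["control", "treated"]},
    "measurement_technology": {"platform": "example platform"},
}

print(missing_sections(example_record))
```

A check of this kind only reports which areas are undocumented. It says nothing about how the experiment should have been run, which is in keeping with the aim described above.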
It is necessary to achieve a balance when developing these guidelines to, on one hand, ensure the future use of the data is preserved, but on the other hand, to minimise the administrative burden on the scientists of actually annotating the data and the metadata in the standardised format. In the field of metabolomics, the documents describing minimal reporting standards for a metabolomics study were developed and published in 2007 by the Metabolomics Standards Initiative. This international initiative was coordinated by the Metabolomics Society, which is the scientific organisation dedicated to promoting the growth, use, and understanding of metabolomics in the life sciences.
Developing these documents required a major coordinated effort by the metabolomics community, due to the diverse range of techniques and data analysis approaches that are routinely applied. To encompass the breadth of the information in this complex approach, a number of working groups and, later, task groups were formed to define the information that needs to be captured at each step in the metabolomics pipeline. This tremendous effort by many dedicated scientists will provide a valuable tool for today and for the future. The ultimate success of any minimal reporting standards initiative is the global adoption of those standards by the scientific community, such that all data is published and stored using those defined guidelines.
Working with instrument manufacturers, software companies, and database experts to develop and streamline the process of data and metadata storage, as well as providing training in these guidelines, is essential for maximising the uptake of these procedures by the metabolomics community.
This article is from the free online

Metabolomics: Understanding Metabolism in the 21st Century

Created by
FutureLearn - Learning For Life
