MARK VIANT: Omics experiments generate avalanches of data, and these data can take many months or even years to acquire at a significant cost. Maximising the use and reuse of these valuable datasets requires storing the data in specialised repositories. It’s also important to store all of the information describing how the experiment was conducted, and this is called metadata. The storage of the data and the metadata must be done in a standardised way. In metabolomics, the era of open access is upon us. Labs around the world are increasingly required to ensure that all scientific research data are made available to the scientific community once the findings of the study have been published.
In addition, funding organisations are requesting that all data are made open access, and a number of scientific journals are requesting the same before scientific papers are published. We describe these data as being open access. There are currently two large data repositories used in metabolomics. The first is MetaboLights, which is housed at the European Bioinformatics Institute near Cambridge in the UK and supported by BBSRC funding. The second is Metabolomics Workbench, which is sponsored by the Common Fund of the National Institutes of Health (NIH) in the US and is housed in San Diego. These repositories provide a cross-species, cross-technique database for metabolomics experiments and their metadata.
These data repositories provide easy access for the reuse of data. For example, data from the MetaboLights data repository, which were initially acquired to answer biological questions, have since been used to validate computational tools for data analysis. And this is only one type of reuse. Other examples include the integration of multiple datasets studying the same biological question. This will become increasingly possible in the future when sufficient numbers of datasets are available. And we will call this type of analysis meta-analysis.
The driving force in the development of data standards arose from the needs of scientific journals to provide formats and guidelines to publish and store these big datasets. This has recently gained momentum with the realisation that for cross-comparison of datasets, the data must be stored in the same format, and the metadata should capture all the information relating to the key aspects of the experimental pipeline. As we increasingly move towards open access science, the application of so-called minimal reporting standards will ensure that these large datasets are preserved as a valuable resource for the future.
The aim of defining minimal reporting standards is not to prescribe how an experiment should be performed, but rather to provide sufficient information to interpret the experimental findings and to permit the reuse of the data beyond the scope of the original study, to enable comparisons with similar experiments, or even to allow the experiment to be repeated exactly. Relevant information should be captured at each step in the metabolomics pipeline, including that related to the biological samples, the study design, the technology used to measure the metabolites, and the data analysis methods.
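The idea of capturing information at each step of the pipeline can be sketched in a few lines of Python. This is an illustration only and not part of the talk: the section names mirror the four pipeline steps mentioned above, while the class `StudyMetadata`, the function `missing_sections`, and the example values are hypothetical and do not reproduce any published checklist or repository format.

```python
from dataclasses import dataclass, field

# Hypothetical section names, one per pipeline step named in the transcript.
# They are NOT the actual fields of any published reporting standard.
REQUIRED_SECTIONS = (
    "biological_samples",
    "study_design",
    "analytical_technology",
    "data_analysis",
)


@dataclass
class StudyMetadata:
    """Free-form metadata grouped by pipeline step (illustrative only)."""
    biological_samples: dict = field(default_factory=dict)
    study_design: dict = field(default_factory=dict)
    analytical_technology: dict = field(default_factory=dict)
    data_analysis: dict = field(default_factory=dict)


def missing_sections(metadata: StudyMetadata) -> list:
    """Return the names of sections that have no information recorded."""
    return [name for name in REQUIRED_SECTIONS if not getattr(metadata, name)]


# Example with made-up values: two of the four sections are filled in.
record = StudyMetadata(
    biological_samples={"sample_type": "plasma"},
    study_design={"groups": ["control", "treated"]},
)
print(missing_sections(record))
```

A check like this only confirms that something was recorded for each step; deciding what counts as sufficient detail within a step is the job of the reporting standard itself.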
It is necessary to achieve a balance when developing these guidelines to, on one hand, ensure the future use of the data is preserved, but on the other hand, to minimise the administrative burden on the scientists of actually annotating the data and the metadata in the standardised format. In the field of metabolomics, the documents describing minimal reporting standards for a metabolomics study were developed and published in 2007 by the Metabolomics Standards Initiative. This international initiative was coordinated by the Metabolomics Society, which is the scientific organisation dedicated to promoting the growth, use, and understanding of metabolomics in the life sciences.
Developing these documents required a major coordinated effort by the metabolomics community, due to the diverse range of techniques and data analysis approaches that are routinely applied. To encompass the breadth of the information in this complex approach, a number of working groups and, later, task groups were formed to define the information that needs to be captured at each step in the metabolomics pipeline. This tremendous effort by many dedicated scientists will provide a valuable tool for today and for the future. The ultimate success of any minimal reporting standards initiative is the global adoption of those standards by the scientific community, such that all data are published and stored using those defined guidelines.
Working with instrument manufacturers, software companies, and database experts to develop and streamline the process of data and metadata storage, as well as to provide training in these guidelines, is essential for maximising the uptake of these procedures by the metabolomics community.
Standardisation, data repositories and open access
Professor Mark Viant introduces the development of standards and data repositories within the metabolomics field and how they are important as we move towards open access science.
If you would like to investigate the subject area further, the following publications may be of interest.
© University of Birmingham and the Birmingham Metabolomics Training Centre