
Best practices for data sharing

© COG-Train

Today, the SARS-CoV-2 genome can be sequenced within hours of a COVID-19-positive case being identified. During the pandemic, this has enabled the creation of molecular diagnostic assays, improved global preparedness and countermeasures, the development of vaccines, and the prediction of newly emerging variants. Virus genome sequencing and lineage prediction also contribute to understanding the dynamics of viral epidemics and evaluating the efficacy of control measures.

Rapid sharing of genome sequence data, together with accurate, anonymised epidemiological and clinical metadata, maximises the impact of genomic sequencing on the public health response. Several factors need to be considered when sharing, analysing and publishing sequence data. Below are some of the best practices put together by the SARS-CoV-2 community that every laboratory can follow:

  • Those involved in collecting clinical samples and generating viral genome sequences should be acknowledged.
  • When publicly available data is used, the data source, publications and pre-print articles should be cited where available.
  • Funders, journal editors and peer reviewers should encourage sustained data sharing.
  • Sharing anonymised sample metadata alongside the genomic sequence makes the SARS-CoV-2 sequence as useful as possible.
  • The date and place of sample collection should always be included in shared metadata, but extra metadata will considerably expand the sequence’s potential applications.
  • Data about the sample type, how the sequence was obtained, links to other sequenced viruses, patient travel history, and demographic or clinical information should all be included in metadata where available.
  • When any information is shared, it is important that patient anonymity is protected.
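
As a minimal sketch of the last point, the Python snippet below removes direct identifiers from a sample record before it is shared. The field names (`patient_name`, `hospital_number` and so on) are hypothetical and are not taken from any guideline; each laboratory would substitute its own schema. Removing direct identifiers is only a first step, since combinations of remaining fields may still identify a patient in small populations.

```python
# Hypothetical field names: adapt these to your laboratory's own metadata schema.
DIRECT_IDENTIFIERS = {"patient_name", "hospital_number", "date_of_birth", "address"}


def anonymise_record(record):
    """Return a copy of a sample record with direct identifiers removed."""
    return {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS
    }


# Illustrative record only; the values are made up.
record = {
    "patient_name": "Example Patient",
    "hospital_number": "H-0001",
    "collection_date": "2021-03-14",
    "location": "Europe/United Kingdom/England/Cambridge",
    "host": "human",
}

shared = anonymise_record(record)
print(sorted(shared))
```

The original record is left unchanged, so the laboratory keeps its full internal copy while only the reduced record is released.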

WHO guidelines provide recommendations on how to format the data before sharing. The table below gives some examples of sample-specific metadata formats:

| Metadata type | Recommended format (if applicable) |
| --- | --- |
| Date of sample collection | YYYY-MM-DD |
| Location | Continent/country/region/city |
| Host | For example, human or mouse |
| Patient age | For humans, give an age in years (e.g. 65) or age with the unit if under 1 year (e.g. 1 month, 7 weeks) |
| Sex | Male, female or unknown |
| Additional host information | No standard format – for animals, this may include context, such as “domestic – farm”, “domestic – household”, “wild”, etc. |
| Travel history | No standard format – travel history in the 14 days preceding symptom onset should be obtained from patients where possible |
| Cluster or isolate name | No standard format |
| Date of symptom onset | YYYY-MM-DD |
| Symptoms | No standard format |
| Clinical outcome if known | No standard format |
| Specimen source, sample type | No standard format – examples: “sputum”, “blood”, “serum”, “saliva”, “stool”, “nasopharyngeal swab” |
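
The recommended formats above can be checked automatically before submission. The following Python sketch is one possible way to do this, not an official validator: the field names (`collection_date`, `symptom_onset_date`, `location`, `patient_age`, `sex`) and the accepted age units are assumptions made for illustration.

```python
import re
from datetime import date

ALLOWED_SEX = {"male", "female", "unknown"}
# Assumed pattern: whole years ("65") or a count with a unit under 1 year ("7 weeks").
AGE_PATTERN = re.compile(r"\d+( (day|week|month)s?)?")


def is_valid_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # e.g. 2021-02-30 has the right shape but is not a date
        return False
    return True


def validate_metadata(record):
    """Return a list of problems found in one sample record (empty if none)."""
    problems = []
    for field in ("collection_date", "symptom_onset_date"):
        value = record.get(field)
        if value is not None and not is_valid_date(value):
            problems.append(f"{field}: expected YYYY-MM-DD, got {value!r}")
    location = record.get("location")
    if location is not None:
        # Continent/country/region/city: at most four non-empty levels.
        levels = location.split("/")
        if len(levels) > 4 or any(not level.strip() for level in levels):
            problems.append(f"location: expected Continent/country/region/city, got {location!r}")
    age = record.get("patient_age")
    if age is not None and not AGE_PATTERN.fullmatch(age):
        problems.append(f"patient_age: expected years or a value with unit, got {age!r}")
    sex = record.get("sex")
    if sex is not None and sex.lower() not in ALLOWED_SEX:
        problems.append(f"sex: expected male, female or unknown, got {sex!r}")
    return problems


good = {"collection_date": "2021-03-14", "location": "Europe/United Kingdom/England/Cambridge",
        "patient_age": "65", "sex": "Female"}
bad = {"collection_date": "14/03/2021", "patient_age": "7 weeks", "sex": "F"}
print(validate_metadata(good))
print(validate_metadata(bad))
```

Missing fields are skipped rather than reported, because most of these metadata items are to be shared only where available.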

Additional information regarding the sequencing data:

| Metadata type | Recommended format (if applicable) |
| --- | --- |
| Sequencing technology | No standard format – ideally, this should include the laboratory approach and sequencing platform (e.g. “Metagenomics on Illumina HiSeq 2500” or “ARTIC PCR primer scheme on ONT MinION”) |
| Assembly method, consensus generation method | No standard format |
| Minimum sequencing depth required to call sites during consensus sequence generation | e.g. 20x |
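
To illustrate what a minimum sequencing depth means in practice, the toy Python sketch below masks consensus positions covered by fewer reads than the threshold with `N`, a common convention for uncalled sites. It assumes the per-position depths are already available as a list; the function name `mask_low_depth` and the example sequence are made up for illustration, and real pipelines would normally rely on dedicated consensus-calling tools.

```python
MIN_DEPTH = 20  # the "20x" example threshold mentioned above


def mask_low_depth(consensus, depths, min_depth=MIN_DEPTH):
    """Replace bases at positions with depth below min_depth with 'N'."""
    if len(consensus) != len(depths):
        raise ValueError("consensus and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


# Positions 3 and 5 have depths of 12 and 0, so they are masked.
masked = mask_low_depth("ACGTAC", [35, 50, 12, 20, 0, 41])
print(masked)  # ACNTNC
```

Reporting the threshold alongside the sequence lets other users interpret runs of `N` as low coverage rather than as genuine sequence features.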

Further information: Seminar: What are the gaps in knowledge and skills among researchers in LMICs to share and use COVID-19 data

What do you think about the seven principles of best practice listed above? Are any of them more important than others? Do your views depend on the context in which the samples are being collected and used? Please use the discussion area to share your views.

This article is from the free online course Making sense of genomic data: COVID-19 web-based bioinformatics.
