
Generating public health reports

Article discussing how to generate informative reports

Genomic epidemiology can assist Infection Prevention and Control (IPC) teams by linking cases that might otherwise be missed, or by providing the resolution needed to break outbreaks down into distinct subgroups (Figure 1). Such subgroups may point more readily towards sources of spread, such as common treatment regimens, interactions with particular individuals (e.g. healthcare workers or visitors), or shared locations within a ward.

Illustration representing viral infection in a hospital. Viral variants are represented by purple, red and green. Ward X has cases of the purple and green variants, while wards Y and Z have cases of the red variant only and of all three variants, respectively. WGS was used to confirm which variants were present in each ward.


Figure 1 – Infections with three different SARS-CoV-2 variants across three wards. Whole genome sequencing (WGS) can identify the distinct variants circulating in each ward, as well as closely linked cases that may be part of shared infection chains. Created with

Tools for reporting

A number of freely available tools have been used by members of COG-UK to analyse SARS-CoV-2 genomic data for epidemiological assessment. One simple approach is to plot phylogenetic trees that show the genetic distance (e.g. the number of nucleotide positions that differ) between samples. Cases of the same variant cluster together on shared branches, while different variants sit on distinct branches of the tree (Figure 2).
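To illustrate how genetic distance becomes a tree, the sketch below counts SNP differences between aligned sequences and then repeatedly joins the closest groups using UPGMA, a simple distance-based method chosen here only for brevity. It assumes a roughly constant rate of change and is not the method used by COG-UK pipelines, which typically rely on more sophisticated (e.g. maximum-likelihood) tree-building tools. The function names and toy sequences are hypothetical, and ambiguous bases (N) and gaps are assumed to be ignored when counting differences.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ.

    Ambiguous bases (N) and alignment gaps (-) are skipped, as they carry
    no reliable information about the true base at that position.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def upgma(seqs: dict[str, str]) -> str:
    """Build a rooted UPGMA tree from aligned sequences and return it as Newick.

    Branch lengths are in SNPs; each merge sits at half the (average)
    distance between the two groups being joined.
    """
    # Each cluster id maps to (Newick subtree, number of leaves, height)
    clusters = {name: (name, 1, 0.0) for name in seqs}
    dist = {
        frozenset((a, b)): float(snp_distance(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }
    next_id = 0
    while len(clusters) > 1:
        # Merge the two closest clusters
        pair = min(dist, key=dist.get)
        d_ab = dist.pop(pair)
        a, b = sorted(pair)
        (tree_a, size_a, h_a), (tree_b, size_b, h_b) = clusters.pop(a), clusters.pop(b)
        height = d_ab / 2
        new_id = f"_node{next_id}"  # hypothetical internal-node label
        next_id += 1
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the distances from its two children
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((new_id, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        clusters[new_id] = (subtree, size_a + size_b, height)
    (tree, _, _), = clusters.values()
    return tree + ";"
```

With three toy sequences where A and B differ by one SNP and C is further away, A and B are joined first and C attaches nearer the root, mirroring how cases of the same lineage group together in Figure 2.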

Illustration representing a hypothetical phylogenetic tree showing the separate clusters of Alpha (pink), Delta (green) and Omicron (blue) SARS-CoV-2 lineages.


Figure 2 – Example phylogenetic tree showing similarities and differences between sequenced cases of SARS-CoV-2. Cases with the same lineage, and thus fewer differences in genomic sequence, cluster together.

In addition, tools such as A2BCovid, OutBreaker2 and TransCluster can be used to identify closely linked cases by combining the sampling dates, the mutation rate of the virus and the genomic similarity between pairs of cases.
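Each of these tools implements its own statistical model, which is not reproduced here. As a much simpler illustration of weighing genomic similarity against time, the sketch below assumes that mutations accumulate as a Poisson process at a fixed rate and asks whether the observed SNP distance between two cases is unusually large for the time separating them. The clock rate of about 30 substitutions per genome per year (roughly 1 × 10⁻³ per site per year across a ~30 kb genome), the 10-day allowance for unsampled evolution and the 0.05 cut-off are all illustrative assumptions, not defaults of any of the tools named above.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def plausibly_linked(
    snps: int,
    days_apart: int,
    subs_per_genome_per_year: float = 30.0,  # assumed clock rate (illustrative)
    ancestor_days: int = 10,  # hypothetical allowance for unsampled evolution time
    alpha: float = 0.05,  # hypothetical cut-off
) -> bool:
    """Flag two cases as plausibly linked if their SNP distance is not
    unusually high for the elapsed time under a simple Poisson model."""
    elapsed_days = abs(days_apart) + ancestor_days
    expected_snps = subs_per_genome_per_year * elapsed_days / 365
    return poisson_sf(snps, expected_snps) >= alpha
```

For example, two cases sampled a week apart with one SNP between them remain plausibly linked, whereas ten SNPs accumulated over three days would be very unlikely under this model.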

The first two tools also allow users to incorporate epidemiological data, such as the ward locations of patients throughout their stay, so that the most likely transmission events within an outbreak can be reconstructed. In addition, COG-UK developed the Cluster Investigation and Virus Epidemiology Tool (CIVET) to allow researchers to visualise new sequences alongside previous cases from across the UK.
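As a small illustration of the kind of epidemiological data involved, the sketch below finds pairs of patients whose stays on the same ward overlapped in time. The `WardStay` record and its fields are hypothetical; real tools define their own input formats, and this check is only a building block, not a transmission reconstruction.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class WardStay:
    """One continuous stay by a patient on a ward (hypothetical record format)."""
    patient: str
    ward: str
    start: date
    end: date  # inclusive


def shared_ward_contacts(stays: list[WardStay]) -> list[tuple[str, str, str]]:
    """Return sorted (patient_a, patient_b, ward) tuples for patients whose
    stays on the same ward overlapped by at least one day."""
    contacts = set()
    for s1, s2 in combinations(stays, 2):
        if s1.patient == s2.patient or s1.ward != s2.ward:
            continue
        # Two inclusive date ranges overlap if each starts before the other ends
        if s1.start <= s2.end and s2.start <= s1.end:
            a, b = sorted((s1.patient, s2.patient))
            contacts.add((a, b, s1.ward))
    return sorted(contacts)
```

Pairs returned here could then be compared with pairs that are genomically close, so that cases supported by both lines of evidence are prioritised for investigation.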

Such tools rapidly provide essential epidemiological context for new cases and can therefore help to build a better understanding of outbreaks. However, using them requires in-depth computational experience and bioinformatics expertise, so the methods and their assumptions must be clearly documented in reports to stakeholders.

Report writing

To ensure that complex genomic data are of maximum benefit to IPC teams, reports must be succinct, conveying the key information quickly and clearly to stakeholders who may not be experts.

Reports must be generated rapidly to be of most use, and should focus on the information most relevant to IPC teams. This may include the variants currently circulating or known to have circulated previously, novel introductions of previously unseen variants, detection of known variants of concern (VOCs), and clustering of cases to highlight potentially linked cases (see below).
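Several of these report items reduce to simple set operations on lineage assignments. The sketch below, using hypothetical function and argument names, counts the lineages in the current batch, flags lineages not seen in earlier batches as novel introductions, and flags any that appear on a supplied VOC list. The VOC list here is an illustrative assumption; in practice it would follow current public health designations.

```python
from collections import Counter


def summarise_lineages(
    current: list[str], previously_seen: set[str], vocs: set[str]
) -> dict:
    """Summarise lineage assignments for one reporting period."""
    counts = Counter(current)
    return {
        # How many samples of each lineage were sequenced this period
        "circulating": dict(sorted(counts.items())),
        # Lineages never seen in earlier reporting periods
        "novel_introductions": sorted(set(counts) - previously_seen),
        # Lineages on the supplied variants-of-concern list
        "vocs_detected": sorted(set(counts) & vocs),
    }
```

For a batch containing B.1.1.7, B.1 and B.1.620, where only B.1 and B.1.1.7 had been seen before and B.1.1.7 is on the VOC list, B.1.620 would be reported as a novel introduction and B.1.1.7 as a detected VOC.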

Consistent, regular reporting is important so that IPC teams can respond rapidly to the information. Reporting templates, or reports generated programmatically with tools such as R Markdown, help to ensure consistency and can easily be extended with additional analyses as requirements develop. An example cluster plot from a weekly report by the University of Portsmouth sequencing team is shown in Figure 3.
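R Markdown is an R tool, but the underlying idea of filling a fixed template with fresh results each week can be sketched in any language. The example below uses Python's standard-library `string.Template`; the report wording, field names and `render_report` function are hypothetical and show only how a template keeps the layout identical from one report to the next.

```python
from string import Template

# Hypothetical report layout; each $field is filled in for every new report
REPORT_TEMPLATE = Template(
    "Weekly SARS-CoV-2 sequencing report: $week\n"
    "Samples sequenced: $n_samples\n"
    "Lineages circulating: $lineages\n"
    "Novel introductions: $novel\n"
    "VOCs detected: $vocs\n"
)


def render_report(
    week: str, lineage_counts: dict[str, int], novel: list[str], vocs: list[str]
) -> str:
    """Fill the report template, writing 'none' for empty sections."""
    lineages = ", ".join(
        f"{name} (n={n})" for name, n in sorted(lineage_counts.items())
    )
    return REPORT_TEMPLATE.substitute(
        week=week,
        n_samples=sum(lineage_counts.values()),
        lineages=lineages or "none",
        novel=", ".join(novel) or "none",
        vocs=", ".join(vocs) or "none",
    )
```

Because `substitute` raises a `KeyError` if any field is missing, an incomplete report fails loudly rather than being sent out with gaps.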

Example image of R Markdown output. Text at the top of the image, headed "Cohort transmission clustering", explains that the transcluster package in R was used to infer transmission between samples in order to assess whether they represent a true transmission cluster. The package takes as input the number of SNP differences between samples and estimates the likelihood that they are separated from one another by a set number of transmission events (T). The plot, titled "Transmission based clusters for T = 1 using region", shows clustered circles representing samples, coloured by SARS-CoV-2 lineage: mustard = B.1, green = B.1.1.7 and blue = B.1.620.


Figure 3 – Example of R Markdown reporting of clusters identified using TransCluster. Potentially linked cases identified as clusters were also listed in a table to allow easy tracing by hospital staff.
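TransCluster defines clusters by the inferred number of transmission events rather than a raw SNP cut-off. As a simpler stand-in, the sketch below groups samples into clusters whenever a chain of pairwise SNP distances falls at or below a threshold (single-linkage clustering via union-find) and returns rows suitable for a cluster table. The two-SNP threshold and the `cluster_N` naming are hypothetical choices for illustration only.

```python
from itertools import combinations


def snp_threshold_clusters(
    distances: dict[tuple[str, str], int], samples: list[str], max_snps: int = 2
) -> list[tuple[str, list[str]]]:
    """Group samples linked by chains of pairwise distances <= max_snps.

    Returns (cluster_id, members) rows; singletons are omitted.
    """
    parent = {s: s for s in samples}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        d = distances.get((a, b), distances.get((b, a)))
        if d is not None and d <= max_snps:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: dict[str, list[str]] = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    clusters = sorted((sorted(g) for g in groups.values() if len(g) > 1), key=lambda g: g[0])
    return [(f"cluster_{i}", members) for i, members in enumerate(clusters, start=1)]
```

Each returned row can be written straight into a report table, e.g. `for cid, members in rows: print(cid, ", ".join(members))`, giving hospital staff a list of sample identifiers to trace for each cluster.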

© COG-Train
This article is from the free online course A Practical Guide for SARS-CoV-2 Whole Genome Sequencing

