What is viral whole-genome sequencing

Article explaining how to apply viral whole-genome sequencing

Targeted approaches for viral WGS and comparison with metagenomics

Although a sample may contain billions of copies of a pathogen genome, viral genomes are substantially smaller than the human genome, so host nucleic acid vastly outweighs viral nucleic acid in an extract.
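
A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below uses hypothetical round numbers (one million SARS-CoV-2 genomes alongside 10,000 human cells) and approximate genome sizes. It counts only host genomic DNA; host RNA, which would lower the viral fraction further for an RNA virus, is ignored.

```python
# Back-of-the-envelope estimate of the viral share of nucleic acid in an extract.
# All sample values are hypothetical; genome sizes are approximate.

HUMAN_DIPLOID_BP = 6.2e9  # approx. bases of genomic DNA per diploid human cell
SARS_COV_2_BP = 3.0e4     # approx. SARS-CoV-2 genome length (~30 kb)


def viral_fraction(viral_copies: float, viral_genome_bp: float,
                   host_cells: float,
                   host_bp_per_cell: float = HUMAN_DIPLOID_BP) -> float:
    """Fraction of all bases in the extract that come from the virus.

    Simplification: only host genomic DNA is counted; host RNA is ignored.
    """
    viral_bases = viral_copies * viral_genome_bp
    host_bases = host_cells * host_bp_per_cell
    return viral_bases / (viral_bases + host_bases)


if __name__ == "__main__":
    # Hypothetical sample: 1,000,000 viral genomes and 10,000 human cells.
    frac = viral_fraction(1e6, SARS_COV_2_BP, 1e4)
    print(f"viral fraction of bases: {frac:.1e}")  # fewer than 1 in 1,000 bases
```

Even with a million viral genomes in this hypothetical extract, fewer than 1 in 1,000 bases are viral, so most reads from an untargeted library would be human.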

If the virus of interest is known in advance, targeted sequencing approaches can be used to overcome this. In targeted approaches, the viral DNA or RNA is selectively amplified or enriched before sequencing, so the data generated are highly specific: as much as 99% of the sequence can correspond to the organism of interest. This reduces sequencing costs while also providing more detailed, higher-resolution information, which allows genotypes and resistance mutations to be identified and supports transmission analysis.
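
The cost benefit of a high on-target fraction can be sketched by estimating how many reads are needed to reach a chosen mean depth over the viral genome. This minimal example assumes perfectly uniform coverage and ignores duplicate and low-quality reads; the depth, the read length and the 0.001 on-target fraction used for an untargeted library are hypothetical, while 0.99 echoes the upper figure quoted above.

```python
import math


def reads_needed(mean_depth: float, genome_bp: int, read_len: int,
                 on_target: float) -> int:
    """Total reads required for the on-target reads to reach `mean_depth`.

    Uses depth = on_target_reads * read_len / genome_bp and assumes
    perfectly uniform coverage (real libraries need a safety margin).
    """
    if not 0 < on_target <= 1:
        raise ValueError("on_target must be in the range (0, 1]")
    on_target_reads = mean_depth * genome_bp / read_len
    return math.ceil(on_target_reads / on_target)


if __name__ == "__main__":
    # Hypothetical run: 100x mean depth over a ~30 kb genome with 150 nt reads.
    for label, frac in [("targeted", 0.99), ("untargeted", 0.001)]:
        print(label, reads_needed(100, 30_000, 150, frac))
```

Under these assumptions the targeted library needs roughly 20,000 reads and the untargeted one roughly 20 million, a thousandfold difference that translates directly into sequencing cost.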

There are two main targeted approaches for viral WGS: amplicon sequencing and bait or probe capture.

Amplicon sequencing

This is a PCR approach in which the entire pathogen genome is amplified as overlapping PCR fragments (amplicons) before library preparation. It can be quicker and cheaper than bait/probe capture and is ideal for sequencing small viral genomes, such as the ~11 kb Zika virus genome and the ~30 kb SARS-CoV-2 genome. Tools such as Primal Scheme design the primers, using the target viral genome sequence (or sequences) as the reference. Depending on the size of the viral genome, several multiplex PCR reactions (i.e. reactions containing more than one primer pair) are used to amplify the genome, and the products are then pooled for sequencing. For viruses with RNA genomes, the RNA is first reverse transcribed into single-stranded cDNA before the PCR step.
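
The idea of overlapping amplicons split across two pools (Figure 1a) can be illustrated with a simple tiling layout. This is a hypothetical sketch, not Primal Scheme's algorithm: it only places fixed-length amplicons with a chosen overlap and alternates them between pools, whereas real primer-design tools must also choose primer sequences that work together in a multiplex reaction. The function `tile_amplicons`, the amplicon length and the overlap are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Amplicon:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    pool: int   # 1 or 2


def tile_amplicons(genome_len: int, amplicon_len: int, overlap: int) -> list[Amplicon]:
    """Tile overlapping amplicons across a genome, alternating pools 1 and 2.

    Neighbouring amplicons overlap, so they go into different pools: in a
    single tube, the primers facing each other inside the overlap would also
    amplify a short product spanning just that overlap. Keeping the overlap
    below half the amplicon length means amplicons in the same pool never touch.
    """
    if genome_len <= 0:
        raise ValueError("genome_len must be positive")
    if not 0 < overlap < amplicon_len / 2:
        raise ValueError("overlap must be positive and less than half of amplicon_len")
    step = amplicon_len - overlap
    amplicons: list[Amplicon] = []
    start = 0
    while True:
        i = len(amplicons)
        # The final amplicon is simply truncated at the genome end in this sketch.
        end = min(start + amplicon_len, genome_len)
        amplicons.append(Amplicon(f"amp{i + 1}", start, end, 1 if i % 2 == 0 else 2))
        if end == genome_len:
            return amplicons
        start += step


if __name__ == "__main__":
    # Hypothetical scheme: ~400 nt amplicons overlapping by 75 nt on a ~30 kb genome.
    scheme = tile_amplicons(30_000, 400, 75)
    for pool in (1, 2):
        print(f"pool {pool}: {sum(a.pool == pool for a in scheme)} amplicons")
```

Each pool on its own leaves gaps, but together the two pools cover the whole genome, which is why their products are combined before or after barcoding.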

Figure 1 – (a) Schematic showing the regions amplified in pools 1 (upper track) and 2 (lower track), and the intended overlap between pools (as determined in Step 1). (b) Products generated by PCR in Step 9 from pools 1 (left tube) and 2 (right tube) for the hypothetical scheme shown in (a). (c) In Step 12A(ii), the input amount is normalised based on the number of samples and the scheme length; pool 1 and 2 products can be pooled at this stage (shown) or kept separate if you wish to barcode them individually. In Step 12A(iv), products for each sample are then barcoded by ligation of a unique barcode. In Step 12A(vi), all barcoded products are pooled together before sequencing adaptor ligation, yielding a sequenceable library. Step numbers refer to the source protocol. Source: Nature Protocols

Bait or probe capture

In the bait/probe capture approach, the sample is enriched for viral nucleic acid during library preparation. After fragmentation, and often after the first stage of library preparation, the viral nucleic acid is hybridised to small (~100-120 nt) DNA or RNA fragments, the probes or baits, which are complementary to the viral genome sequence. Because the probes are labelled with biotin, the probe-bound viral fragments can then be captured on a solid phase such as streptavidin-coated beads, which are washed to deplete human and other non-target DNA. After a second round of PCR to amplify the captured sequences, the sample is sequenced. Although more expensive than amplicon sequencing, bait capture is ideal for poor-quality samples (such as extracts from formalin-fixed, paraffin-embedded (FFPE) tissue), as the baits can hybridise to degraded fragments of varying lengths. It is also well suited to highly diverse viral genomes, as the longer capture probes tolerate more sequence mismatches than the PCR primers used in amplicon sequencing.
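
Two ideas from the paragraph above can be illustrated in a few lines: tiling overlapping ~120 nt baits across a reference, and why a long probe tolerates mismatches better than a short primer. This is a hypothetical sketch; `tile_probes`, the 2x tiling step and the 24 nt primer length are assumptions. Percent identity is only a rough proxy for hybridisation, which in practice also depends on melting temperature and on where mismatches fall (mismatches at a primer's 3' end are especially disruptive).

```python
from __future__ import annotations


def tile_probes(reference: str, probe_len: int = 120, step: int = 60) -> list[str]:
    """Cut overlapping probe-length windows from a reference sequence.

    With step = probe_len / 2, most bases are covered by two probes.
    Strand orientation and probe chemistry are ignored in this sketch.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if len(reference) <= probe_len:
        return [reference]
    starts = list(range(0, len(reference) - probe_len + 1, step))
    if starts[-1] + probe_len < len(reference):
        starts.append(len(reference) - probe_len)  # extra probe to cover the end
    return [reference[s:s + probe_len] for s in starts]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # The same 3 mismatches are a much smaller share of a 120 nt probe
    # than of a hypothetical 24 nt primer.
    print(percent_identity("A" * 120, "CCC" + "A" * 117))  # 97.5
    print(percent_identity("A" * 24, "CCC" + "A" * 21))    # 87.5
```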

Choosing the appropriate targeted approach requires consideration of factors including the quality of the nucleic acid, the diversity of the viral genome, the amount of virus in the sample, cost, laboratory expertise, and the availability of reagents and sequencing platforms. Figure 2 outlines the two targeted methods alongside untargeted metagenomic sequencing, and Table 1 summarises the advantages and disadvantages of each. As an example, the usefulness of each of these approaches for generating full-length hepatitis C virus genomes has been evaluated and compared (see References).

Figure 2 – Overview of viral WGS approaches. All specimens originally comprise a mix of host (in blue) and pathogen (in red) DNA sequences. For pathogens that have RNA genomes, RNA in the sample is converted into complementary DNA (cDNA) before PCR and library preparation. Direct metagenomic sequencing provides an accurate representation of the sequences in the sample, but at high cost for sequencing, data analysis and storage. PCR amplicon sequencing uses many discrete PCR reactions to enrich the viral genome, which substantially increases the workload for large genomes but decreases costs. Target enrichment sequencing uses virus-specific oligonucleotide probes bound to a solid phase, such as beads, to enrich the viral genome in a single reaction; this reduces workload but increases the cost of library preparation compared with PCR. Source: Nature Reviews Microbiology

Table 1 – Advantages and disadvantages of different viral sequencing methods. Source: Nature Reviews Microbiology

Metagenomic sequencing
Advantages:
- Simple, cost-effective sample preparation
- Can sequence novel or poorly characterised genomes
- Effective in ‘fishing’ approaches to identifying a potential underlying pathogen
- Fewer PCR cycles are required, so few amplification mutations are introduced
- Preservation of minor variant frequencies reflects in vivo variation
- No primer or probe design required, which enables a rapid response to novel pathogens or sequence variants
Disadvantages:
- High sequencing cost to obtain sufficient data
- Relatively low sensitivity to the target pathogen
- Coverage is proportional to viral load
- High proportion of non-pathogen reads increases computational challenges
- Incidental sequencing of human and off-target pathogens raises ethical and diagnostic issues

PCR amplification sequencing
Advantages:
- Tried-and-trusted, well-established methods and trained staff
- Highly specific: most sequencing reads will be pathogen-specific, which decreases sequencing costs
- Highly sensitive, with good coverage even at low pathogen load
- Relatively straightforward design and application of new primers for novel sequences
Disadvantages:
- Labour-intensive and difficult to scale for large genomes
- Iterating standard PCRs across large genomes requires high sample volume
- PCR reactions are subject to primer mismatch, particularly in poorly characterised or highly diverse pathogens, or those with novel variants
- Limited ability to sequence novel pathogens
- High number of PCR cycles may introduce amplification mutations
- Uneven amplification of different PCR amplicons may influence minor variant and haplotype reconstruction

Target enrichment sequencing
Advantages:
- Single-tube sample preparation that is suited to high-throughput automation and the sequencing of large genomes
- Higher specificity than metagenomics decreases sequencing costs
- Overlapping probes increase tolerance for individual probe mismatches
- Fewer PCR cycles (than PCR amplification) limit the introduction of amplification mutations
- Preservation of minor variant frequencies reflects in vivo variation
Disadvantages:
- High cost and technical expertise needed for sample preparation
- Unable to sequence novel pathogens; requires well-characterised reference genomes for probe design
- Sensitivity is comparable to PCR, but coverage is proportional to pathogen load: low pathogen load yields low or incomplete coverage
- Cost and time to generate new probe sets limit a rapid response to emerging and novel viruses
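
The trade-offs in Table 1, together with the factors listed earlier, can be condensed into a rough rule of thumb. The sketch below is an illustration only: the yes/no flags and their order of priority are simplifying assumptions, and it leaves out cost, laboratory expertise and reagent or platform availability, which in practice are weighed alongside these factors. `SampleProfile` and `suggest_approach` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SampleProfile:
    """Hypothetical yes/no summary of a sample; real cases are rarely this clear-cut."""
    reference_available: bool  # well-characterised genome(s) for primer/probe design
    degraded: bool             # e.g. fragmented nucleic acid from FFPE tissue
    highly_diverse: bool       # target may mismatch primers designed on references
    low_viral_load: bool
    large_genome: bool


def suggest_approach(s: SampleProfile) -> tuple[str, str]:
    """Return (approach, reason), checking criteria in a fixed priority order."""
    if not s.reference_available:
        return "metagenomic", "no primer or probe design needed for novel viruses"
    if s.degraded or s.highly_diverse:
        return "target enrichment", "probes bind degraded fragments and tolerate mismatches"
    if s.low_viral_load:
        return "amplicon", "PCR gives good coverage even at low pathogen load"
    if s.large_genome:
        return "target enrichment", "single-tube capture scales to large genomes"
    return "amplicon", "quicker and cheaper for small, well-characterised genomes"


if __name__ == "__main__":
    # Hypothetical example: a low-load, good-quality sample of a well-characterised virus.
    profile = SampleProfile(reference_available=True, degraded=False,
                            highly_diverse=False, low_viral_load=True,
                            large_genome=False)
    print(suggest_approach(profile))
```

Placing degraded or diverse samples ahead of viral load is a judgement call: Table 1 notes that capture gives low or incomplete coverage at low pathogen load, so a degraded, low-load sample has no clear winner in this scheme.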

References

CoronaHiT: high-throughput sequencing of SARS-CoV-2 genomes

Norovirus whole genome sequencing by SureSelect target enrichment: a robust and sensitive method

Clinical and biological insights from viral genome sequencing

Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples

Comparison of Next-Generation Sequencing Technologies for Comprehensive Assessment of Full-Length Hepatitis C Viral Genomes

© COG-Train
This article is from the free online course A Practical Guide for SARS-CoV-2 Whole Genome Sequencing (FutureLearn).