DNA Sequencing in the NGS era

DNA sequencing technologies: Illumina, PacBio and Oxford Nanopore

NGS refers to a wide range of cutting-edge sequencing techniques used by various platforms

High-throughput sequencing techniques have made it possible to analyse DNA and RNA sequences quickly, and this has transformed the way genomics is studied. One of the best-known NGS platforms is Illumina sequencing, which produces short reads using reversible dye terminators and has become popular because of its accuracy and scalability. Roche 454 was known for producing longer reads than other platforms of its time, which made it useful for de novo genome assembly. Ion Torrent's PGM and Proton sequencers used semiconductor-based technology to detect the hydrogen ions released during DNA synthesis, providing quick and affordable sequencing. Roche 454 has since been discontinued, and Ion Torrent is now far less widely used.

Table 1: Sequencing Technologies (table showing the main sequencing technologies and their characteristics)

Overview of NGS Technologies

1. Illumina

Illumina is a short-read sequencer that uses the principle of Sequencing By Synthesis (SBS) with reversible dye terminators, which allow single bases to be identified as they are incorporated into growing DNA strands. The DNA template is held in a fixed position on a solid surface so that the reactions can be controlled and tracked. In each cycle, all four fluorescently labeled nucleotides (A, C, G and T) are added together. Each carries a reversible terminator, so only the one nucleotide complementary to the next unpaired base on the template is incorporated, and extension then pauses. Unincorporated nucleotides are washed away, and the fluorescent signal of the incorporated base is imaged to identify it. The dye and the terminator are then cleaved off so that the next cycle can begin. Repeating this cycle reads the DNA sequence one base at a time.
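
The cycle-by-cycle idea behind SBS can be sketched in a few lines of Python. This is only a toy model, not Illumina's software: the dye colours in `DYE_COLOUR` and the function name `sequence_by_synthesis` are hypothetical, strand orientation is ignored, and every incorporation is assumed to be error-free.

```python
# Toy model of sequencing by synthesis (SBS); an illustration, not a real base caller.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colours, chosen only for this example.
DYE_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}


def sequence_by_synthesis(template, cycles):
    """Return the read built by extending one base per cycle along the template."""
    read = []
    for position in range(min(cycles, len(template))):
        # Only the nucleotide complementary to the template base is incorporated.
        incorporated = COMPLEMENT[template[position]]
        # Detection step: the instrument observes the signal of the incorporated label.
        observed_colour = DYE_COLOUR[incorporated]
        # Base calling: translate the observed signal back into a base.
        read.append(COLOUR_TO_BASE[observed_colour])
    return "".join(read)


print(sequence_by_synthesis("TACGGA", cycles=4))  # ATGC
```

The number of cycles sets the read length, which is one way to see why reads from this kind of chemistry are short and of fixed length.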

Scheme of the sequencing process, beginning with DNA fragmentation and [adapter](/courses/bioinformatics-for-biologists-analysing-and-interpreting-genomics-datasets/1/steps/1813191) attachment. Fragments are then PCR-amplified and sequenced on a solid surface with labeled nucleotides. Complementary incorporation emits a signal, allowing sequence determination. Image courtesy of Illumina, Inc.

Table 2: Illumina Comparison (table comparing the different types of Illumina sequencing)

2. PacBio

Pacific Biosciences (PacBio) is a long-read sequencer that uses the Single Molecule Real-Time (SMRT) principle. In this method, a DNA polymerase molecule is fixed to the bottom of a nanowell called a Zero-Mode Waveguide (ZMW). The ZMW is an optical nanostructure designed so that only the one dye-linked nucleotide being incorporated by the polymerase is excited at any given time. The template is double-stranded DNA closed into a circle by adapters ligated at both ends. The polymerase binds this circular template and replicates it, and because the polymerase has a long lifespan it can travel around the circle several times, sequencing both strands repeatedly. The result is a continuous long read (CLR) made of concatenated copies of the template separated by adapter sequences. PacBio software identifies the adapters, cuts the CLR into its individual copies, and aligns these copies to build a highly accurate circular consensus sequence (CCS).
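
The consensus step behind CCS can be illustrated with a simple majority vote. The sketch below is a deliberate simplification and not PacBio's algorithm: it assumes the copies have already been cut at the adapters, are aligned, have equal length, and contain substitution errors only. The function name `consensus` and the example reads are hypothetical.

```python
from collections import Counter


def consensus(subreads):
    """Majority-vote consensus over equal-length, pre-aligned copies of one template."""
    if not subreads:
        raise ValueError("need at least one subread")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("this toy model assumes equal-length subreads")
    called = []
    for column in zip(*subreads):
        # Pick the most frequent base at this position across all copies.
        base, _count = Counter(column).most_common(1)[0]
        called.append(base)
    return "".join(called)


# Hypothetical copies of the same template; three of them carry one error each.
passes = [
    "ACGTTGCA",
    "ACGTAGCA",  # substitution at index 4
    "ACCTTGCA",  # substitution at index 2
    "ACGTTGCA",
    "ACGTTGGA",  # substitution at index 6
]
print(consensus(passes))  # ACGTTGCA
```

Because the errors fall at different positions in different copies, each one is outvoted, which is the intuition for why more passes around the circle give a more accurate consensus.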

NB: In other systems, the fluorescent label is attached to the base of the nucleotide. In SMRT technology, the fluorescent label is attached to the phosphate chain, so it is cleaved off when the nucleotide is incorporated, and the released labeled pentaphosphate diffuses away quickly.

3. Oxford Nanopore Technologies (ONT)

Oxford Nanopore Technologies (ONT) is another long-read sequencer. ONT uses flow cells that contain an array of tiny holes, called nanopores, embedded in an electrically resistant membrane. The membrane is immersed in a conducting fluid and a voltage is applied across it, so that an ionic current flows through each nanopore. ONT sequencing works by measuring the minute changes in this current as a nucleotide (or DNA strand) passes through the pore.

Two proteins are involved. The upper protein handles the single-stranded DNA (ssDNA), while the second protein forms the nanopore within the membrane. The second protein also contains an adaptor molecule that regulates the speed at which DNA passes through the nanopore. As the DNA moves through the nanopore, each individual base obstructs the flow of ions to a different degree, leading to distinctive disruptions in the ionic current.

In Fig 2b, after library construction, a motor protein attached to the sequencing adapter recruits the nucleic acid to the nanopore and unzips the dsDNA. A constant voltage is applied across the nanopore, and sequencing relies on the changes in ionic current that occur as the nucleic acid passes through it. The sequence is determined by matching these current variations to the nucleotides passing through the nanopore.
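
The idea of matching current changes to nucleotides can be sketched as a nearest-level lookup. This is a toy model under strong assumptions, not how ONT base callers work: the current levels in `LEVELS_PA` are invented for illustration, the trace is assumed to be already segmented into one averaged measurement per base, and each base is assumed to affect the current independently of its neighbours. In practice the signal depends on several bases in the pore at once, and base calling relies on trained models.

```python
# Hypothetical mean current levels (picoamps) per base; invented for illustration.
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_base(current_pa):
    """Return the base whose reference level is closest to the measurement."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current_pa))


def basecall(trace_pa):
    """Call one base per measurement in an already-segmented current trace."""
    return "".join(call_base(value) for value in trace_pa)


# A noisy, already-segmented trace: one averaged measurement per base.
trace = [81.2, 123.9, 108.4, 96.1, 79.5]
print(basecall(trace))  # ATGCA
```

When the noise in a measurement approaches the gap between two reference levels, the wrong base is called, which gives some intuition for the higher per-base error rates of raw long reads.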

Fig 2a and Fig 2b: Workflow for PacBio and ONT sequencing respectively (Eric et al 2022, attached at the end of this step). PacBio relies on real-time fluorescence during DNA replication, employing DNA polymerase in optical nanostructures (ZMWs) to replicate one strand. With a long polymerase lifespan, both strands can be sequenced, resulting in concatenated copies separated by adapters (CLR). ONT sequencing uses a motor protein to guide nucleic acids into a nanopore, where changes in electrical conductivity reveal the sequence.

Comparison of short read sequencers with long read sequencers

| Short read sequencers (Illumina) | Long read sequencers (PacBio & ONT) |
| --- | --- |
| They offer low error rates per base and good precision | Long reads typically have greater error rates per base than short reads, although accuracy has recently increased |
| Short-read sequencers have a high throughput, since they can produce many reads in a single run | Long-read sequencers produce longer reads but have a lower throughput than short-read sequencers |
| Short-read sequencers generate reads that are typically between 50 and 300 base pairs (bp) in length | Long-read sequencers generate substantially longer reads, frequently between a few thousand and tens of thousands of base pairs or longer |
| Compared to long-read sequencers, short-read sequencers often offer cheaper per-base sequencing costs | Long-read technologies have greater per-base sequencing costs than short-read technologies |
| De novo genome assembly with short reads can be difficult for repetitive regions and complex genomes | Long-read sequencers are excellent for de novo genome assembly, because they make it possible to resolve complex regions and repetitive sequences |
| A large range of bioinformatics tools and resources is readily available for analysing short-read data | Because long reads have greater error rates, analysing them can be more difficult, but specialised software and pipelines are available |
| They are suitable for whole-genome sequencing, exome sequencing, RNA-seq, ChIP-seq, and amplicon sequencing, among other applications | They are especially useful for metagenomics, epigenetics, structural variation detection, full-length transcriptome sequencing, and complex genome sequencing applications |
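
The point about repetitive regions comes down to simple arithmetic: a single read can only place a repeat unambiguously if it covers the whole repeat plus some unique sequence on both sides. The sketch below illustrates this under stated assumptions. The function name `spans_repeat`, the 20 bp flank requirement, and the 5,000 bp repeat are hypothetical values chosen for illustration; the read lengths are taken from the ranges in the comparison above.

```python
def spans_repeat(read_length, repeat_length, min_flank=20):
    """True if one read can cover the repeat plus unique flank on both sides."""
    return read_length >= repeat_length + 2 * min_flank


repeat_bp = 5000  # hypothetical repeat length
for name, read_bp in [("short read", 150), ("long read", 15000)]:
    print(name, spans_repeat(read_bp, repeat_bp))
# short read False
# long read True
```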
   

Discussion point:

Short reads vs Long reads

There is an opinion that:

‘Illumina’s dominance of the sequencing market has meant that the vast majority of the data that has been generated so far is based on short reads. Having a large number of short reads is a good fit for a number of applications, such as detecting single-nucleotide polymorphisms in genomic DNA and counting RNA transcripts. However, short reads alone are insufficient in a number of applications, such as reading through highly repetitive regions of the genome and determining long-range structures’.
This quote is from a longer article in GEN (Genetic Engineering & Biotechnology News).

Questions:

What is your opinion about short vs long reads? Which type of application are you looking for? Do leave your comments in the discussion area below.

© Wellcome Connecting Science
This article is from the free online course Bioinformatics for Biologists: Analysing and Interpreting Genomics Datasets

Created by
FutureLearn - Learning For Life
