DNA Sequencing in the NGS era
Next-generation sequencing (NGS) refers to a range of high-throughput sequencing techniques implemented on various platforms.
These techniques make it possible to analyse DNA and RNA sequences quickly, and they have transformed the way genomics is studied. One of the best-known NGS platforms is Illumina sequencing, which produces short reads using reversible dye terminators and has gained popularity because of its accuracy and scalability. Roche 454 was known for generating longer reads, which made de novo genome assembly possible. Ion Torrent's Proton and PGM sequencers used semiconductor-based technology to detect the hydrogen ions released during DNA synthesis, providing quick and affordable sequencing. Roche 454 has since been discontinued, and the Ion Torrent PGM and Proton are now rarely used.
Table 1: Sequencing Technologies
Overview of NGS Technologies
1. Illumina
Illumina is a short-read sequencer that works on the principle of Sequencing By Synthesis (SBS). It relies on reversible dye terminators, which allow single bases to be identified as they are incorporated into growing DNA strands. The DNA template is held in a fixed position on the flow cell so that the reactions can be controlled and imaged. In each cycle, all four nucleotides (A, C, G and T) are added together; in the original four-colour chemistry, each carries a different fluorescent dye as well as a reversible terminator. Only the nucleotide complementary to the next unpaired base on the template is incorporated, and the terminator prevents any further extension during that cycle. Unincorporated nucleotides are washed away, the flow cell is imaged, and the colour of the fluorescent signal identifies the base. The dye and terminator are then cleaved so that the next cycle can begin. Repeating this cycle reads the template one base at a time.
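The cycle of incorporation, imaging and cleavage can be sketched as a toy Python model. This is only an illustration under simplifying assumptions: the dye colours in `DYE` are invented, strand direction is ignored, and no errors are modelled. It is not a description of Illumina's software.

```python
# Toy model of sequencing by synthesis with reversible dye terminators.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colours, chosen only for illustration.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(template, cycles):
    """Return the read produced by `cycles` rounds of incorporate, image, cleave."""
    read = []
    for position in range(min(cycles, len(template))):
        # 1. Incorporate: only the complementary nucleotide pairs with the
        #    template, and its terminator stops extension for this cycle.
        incorporated = COMPLEMENT[template[position]]
        # 2. Image: the instrument observes the colour of the attached dye.
        observed_colour = DYE[incorporated]
        # 3. Call the base from the colour; the dye and terminator would now
        #    be cleaved so that the next cycle can start.
        read.append(COLOUR_TO_BASE[observed_colour])
    return "".join(read)


print(sequence_by_synthesis("TACGGT", cycles=4))  # prints ATGC
```

The number of cycles sets the read length, which is one reason Illumina reads are short: each additional base requires another full round of chemistry and imaging.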
Image Courtesy of Illumina, Inc.
Table 2: Illumina Comparison
2. PacBio
Pacific Biosciences (PacBio) is a long-read sequencer that uses the Single Molecule Real-Time (SMRT) principle. In this method, a single DNA polymerase molecule is attached to the bottom of a nanowell called a Zero-Mode Waveguide (ZMW). The ZMW is an optical nanostructure designed so that only the one dye-linked nucleotide being incorporated by the polymerase is excited at any given time. The polymerase binds a circularised DNA template (double-stranded DNA capped at both ends with adapters) and replicates it. Because the polymerase has a long lifespan, it can travel around the circle repeatedly, so both strands are sequenced several times, producing a continuous long read (CLR) of concatenated copies separated by adapters. PacBio software identifies the adapters, cuts the CLR into individual copies, and aligns these copies to create a highly accurate circular consensus sequence (CCS).
NB: In other systems, the fluorescent label is attached to the base of the nucleotide. In SMRT technology, the fluorescent label is attached to the phosphate chain, so it is released when the nucleotide is incorporated, and the released labelled pentaphosphate diffuses away quickly.
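The idea behind the consensus step can be illustrated with a minimal Python sketch. It assumes the copies have already been cut out, oriented and aligned to the same length, and it uses a simple per-position majority vote; the function name `circular_consensus` and the example reads are hypothetical, and PacBio's own software uses a more sophisticated model that also handles insertions and deletions.

```python
from collections import Counter


def circular_consensus(subreads):
    """Majority-vote consensus over equal-length, already-aligned subreads."""
    if not subreads:
        raise ValueError("need at least one subread")
    length = len(subreads[0])
    if any(len(subread) != length for subread in subreads):
        raise ValueError("this sketch assumes pre-aligned subreads of equal length")
    consensus = []
    for column in zip(*subreads):
        # The most frequent base at this position is taken as the consensus.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five made-up passes over the same molecule, each with at most one random error.
passes = ["ACGTTAGC", "ACGTAAGC", "ACCTTAGC", "ACGTTAGC", "ACGTTAGG"]
print(circular_consensus(passes))  # prints ACGTTAGC
```

Because the errors in individual passes fall at different positions, each one is outvoted by the other copies, which is why more passes around the circle give a more accurate consensus.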
3. Oxford Nanopore Technologies (ONT)
Oxford Nanopore Technologies (ONT) offers another long-read sequencer. ONT uses flow cells that contain an array of tiny holes (nanopores) embedded in an electro-resistant membrane. The membrane is immersed in a conducting fluid and a voltage is applied across it. Sequencing works by measuring the minute changes in electric current through a nanopore as a nucleotide (or DNA strand) passes through it.
The upper protein is responsible for handling single-stranded DNA (ssDNA) while the second protein serves a critical function of creating a nanopore within a membrane. Additionally, the second protein contains an adaptor molecule that regulates the speed at which DNA passes through the nanopore. As the DNA moves through the nanopore, each individual base obstructs the flow to varying degrees, leading to distinctive disruptions in the ionic current.
In Fig 2b, after library construction, a motor protein attached to the sequencing adapter recruits the nucleic acid to the nanopore and unzips the dsDNA. A steady voltage is applied across the nanopore, and sequencing relies on the changes in ionic current generated as the nucleic acid passes through it. The sequence is determined by matching these current variations to the nucleotides passing through the nanopore.
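The step of matching current variations to nucleotides can be sketched in Python as a nearest-level lookup. This is a deliberately simplified illustration: the current levels in `LEVELS_PA` are invented, and the sketch assumes one clean measurement per base. In practice several bases influence the signal at once, and production basecallers rely on trained models rather than a lookup table.

```python
# Hypothetical mean current levels in picoamps, invented for illustration only.
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_bases(current_trace):
    """Assign each measured current value to the base with the nearest level."""
    calls = []
    for value in current_trace:
        nearest = min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - value))
        calls.append(nearest)
    return "".join(calls)


trace = [81.2, 123.9, 108.5, 96.1, 79.4]  # made-up, noisy measurements
print(call_bases(trace))  # prints ATGCA
```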
Fig 2a and Fig 2b: Workflow for PacBio and ONT sequencing respectively (Eric et al 2022, attached at the end of this step).
Comparison of short read sequencers with long read sequencers
Short read sequencers (Illumina) | Long read sequencers (PacBio & ONT) |
---|---|
They offer low error rates per base and good precision | Although long reads typically have greater error rates per base than short reads, accuracy has recently increased. |
Short-read sequencers have a high throughput since they can produce many reads in a single run | Long-read sequencers produce longer reads but have a lower throughput than short-read sequencers |
Short-read sequencers generate reads that are typically between 50 and 300 base pairs (bp) in length | Long-read sequencers generate substantially longer reads, frequently between a few thousand and tens of thousands of base pairs or longer |
Compared to long-read sequencers, short-read sequencers often offer cheaper per-base sequencing costs | Long-read technologies have greater per-base sequencing costs than short-read technologies |
De novo genome assembly with short reads can be difficult for repetitive regions and complex genomes | Long-read sequencers are excellent for de novo genome assembly because they make it possible to resolve complex regions and repetitive sequences |
A large range of bioinformatics tools and resources is readily available for analysing short reads | Because long reads have greater error rates, analysing them can be more difficult; however, specialised software and pipelines are available |
They are suitable for whole-genome sequencing, exome sequencing, RNA-seq, ChIP-seq, and amplicon sequencing, among other applications | They are especially useful for metagenomics, epigenetics, structural variation detection, full-length transcriptome sequencing, and complex genome sequencing applications |
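The point about repetitive regions in the table can be made concrete with a small Python check. It rests on a simplifying assumption: a single read resolves a repeat only if it covers the whole repeat plus some unique sequence on each side. The function name `spans_repeat`, the 50 bp flank and the 6,000 bp repeat are hypothetical values chosen for illustration.

```python
def spans_repeat(read_length, repeat_length, flank=50):
    """True if one read can cover the repeat plus `flank` unique bases on each side."""
    return read_length >= repeat_length + 2 * flank


# Read lengths taken from the ranges quoted in the table above.
for name, read_length in [("short read", 150), ("long read", 15_000)]:
    print(name, spans_repeat(read_length, repeat_length=6_000))
# prints:
# short read False
# long read True
```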
Discussion point:
Short reads vs Long reads
There is an opinion that:
‘Illumina’s dominance of the sequencing market has meant that the vast majority of the data that has been generated so far is based on short reads. Having a large number of short reads is a good fit for a number of applications, such as detecting single-nucleotide polymorphisms in genomic DNA and counting RNA transcripts. However, short reads alone are insufficient in a number of applications, such as reading through highly repetitive regions of the genome and determining long-range structures’.
This quote is from a longer article in GEN (Genetic Engineering & Biotechnology News).
Questions:
What is your opinion about short vs long reads? Which type of application are you looking for? Do leave your comments in the discussion area below.
Bioinformatics for Biologists: Analysing and Interpreting Genomics Datasets