A primer on genome sequencing
The genome of an organism consists of one or more stretches of DNA, which can be thought of as strings of the letters A, T, G and C. A particular string of DNA letters is called a sequence. DNA letters are also known as bases. Because DNA is a double helix, consisting of pairs of bases, the length of a DNA sequence is described as a number of base pairs, or bp for short. The process of determining these sequences is called genome sequencing. The perfect genome sequencing technology would enable us to identify the order of letters in a chromosome all in one go. However, this is not currently possible.
Diagram of DNA Double Helix and Sequence Courtesy: National Human Genome Research Institute
Instead we are only able to determine shorter sequences, called reads. The length and number of these reads varies with different technologies. The first technology used to sequence bacterial genomes was Sanger sequencing. Watch this video to see how the technology works. It produces 500bp reads, but is expensive and laborious. Illumina’s sequencing-by-synthesis approach can produce reads of 75-250bp much more cheaply. Watch this video explaining the technology. This technology also produces a large number of reads, perhaps 500 million at a time. The Pacific Biosciences Single-Molecule Real Time (SMRT) technology produces reads of 5000bp-40000bp. The number of reads is smaller - around 1 million. Different technologies are useful for different applications. In particular SMRT sequencing is most useful for producing new genome reference sequences from scratch, whereas sequencing-by-synthesis is most useful for looking at variation between genomes (resequencing).
To get a good understanding of what a whole genome looks like we need to start with many copies of the bacterial genome (i.e. lots of bacteria). If we only had one genome, there would be many parts of the genome that would not be captured, because some pieces are lost by chance during preparation for sequencing. There are also usually a small number of errors in each read. Therefore, to be sure of producing an accurate final sequence, we need to sample each part of the genome several times.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences