A primer on genome sequencing

In this article we discuss the key concepts in genome sequencing.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences
The genome of an organism consists of one or more stretches of DNA, which can be thought of as strings of the letters A, T, G and C. A particular string of DNA letters is called a sequence. DNA letters are also known as bases. Because DNA is a double helix, consisting of pairs of bases, the length of a DNA sequence is described as a number of base pairs, or bp for short. The process of determining these sequences is called genome sequencing. The perfect genome sequencing technology would enable us to identify the order of letters in a chromosome all in one go. However, this is not currently possible.
Figure: Diagram of DNA double helix and sequence. Courtesy: National Human Genome Research Institute
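To make the idea of a sequence as a string of letters concrete, here is a minimal Python sketch. It treats a sequence as an ordinary string, measures its length in base pairs, and works out the letters on the paired strand using the standard pairing rule (A with T, G with C). The names `COMPLEMENT`, `is_valid_dna`, `length_bp` and `complement_strand` are made up for this illustration and do not come from any sequencing library.

```python
# Base-pairing rule for the two strands of the double helix: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_valid_dna(sequence):
    """Return True if the string contains only the letters A, T, G and C."""
    return set(sequence) <= set(COMPLEMENT)


def length_bp(sequence):
    """Length of the sequence in base pairs (one pair per letter)."""
    return len(sequence)


def complement_strand(sequence):
    """Return the letters that pair with each base on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in sequence)


if __name__ == "__main__":
    sequence = "ATGCGTTA"  # hypothetical example sequence
    print(sequence, "is valid DNA:", is_valid_dna(sequence))
    print("length:", length_bp(sequence), "bp")
    print("paired strand:", complement_strand(sequence))
```

A real bacterial genome is handled in the same way, only the string is millions of letters long.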
Instead, we can only determine shorter sequences, called reads. The length and number of these reads vary between technologies. The first technology used to sequence bacterial genomes was Sanger sequencing, which produces 500bp reads but is expensive and laborious. Illumina’s sequencing-by-synthesis approach produces reads of 75-250bp much more cheaply, and in very large numbers: perhaps 500 million reads at a time. The Pacific Biosciences Single-Molecule Real Time (SMRT) technology produces much longer reads of 5000-40000bp, but fewer of them: around 1 million. Different technologies are useful for different applications. In particular, SMRT sequencing is most useful for producing new genome reference sequences from scratch, whereas sequencing-by-synthesis is most useful for looking at variation between genomes (resequencing).
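The trade-off between read length and read number can be checked with simple arithmetic: the total amount of sequence produced is roughly the number of reads multiplied by the read length. The Python sketch below applies this to the approximate figures quoted above. The function name `yield_range_bp` is invented for this example, and the calculation assumes every read falls within the stated length range, which is a simplification.

```python
def yield_range_bp(num_reads, min_read_bp, max_read_bp):
    """Return (lowest, highest) total bases for reads within a length range."""
    return num_reads * min_read_bp, num_reads * max_read_bp


if __name__ == "__main__":
    # Approximate figures taken from the text above.
    technologies = {
        "sequencing-by-synthesis": (500_000_000, 75, 250),
        "SMRT": (1_000_000, 5_000, 40_000),
    }
    for name, (num_reads, shortest, longest) in technologies.items():
        low, high = yield_range_bp(num_reads, shortest, longest)
        # Report in gigabases (1 Gbp = 1,000,000,000 bp).
        print(f"{name}: {low / 1e9:g} to {high / 1e9:g} Gbp at a time")
```

On these figures the short-read technology still produces more sequence in total, even though each individual read is far shorter.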
To get a good understanding of what a whole genome looks like, we need to start with many copies of the bacterial genome (i.e. lots of bacteria). If we had only one copy, many parts of the genome would not be captured, because some pieces are lost by chance during preparation for sequencing. There are also usually a small number of errors in each read. Therefore, to be sure of producing an accurate final sequence, we need to sample each part of the genome several times.
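The effect of sampling each part of the genome several times can be explored with a toy simulation. The Python sketch below draws reads from random positions in a made-up genome, introduces occasional errors, and then takes a majority vote at each position. It is only an illustration under strong assumptions: the genome is a random string, every read's true position is known (real reads have to be aligned or assembled first), and the 1% error rate is an arbitrary choice. All function names are invented for this example.

```python
import random
from collections import Counter

BASES = "ACGT"


def random_genome(length, rng):
    """Build a made-up genome as a random string of bases."""
    return "".join(rng.choice(BASES) for _ in range(length))


def sample_reads(genome, read_length, num_reads, error_rate, rng):
    """Return (start, read) pairs; each base is miscalled with prob. error_rate."""
    reads = []
    for _ in range(num_reads):
        start = rng.randint(0, len(genome) - read_length)
        read = []
        for base in genome[start:start + read_length]:
            if rng.random() < error_rate:
                # Replace the true base with one of the three other bases.
                base = rng.choice([b for b in BASES if b != base])
            read.append(base)
        reads.append((start, "".join(read)))
    return reads


def consensus(genome_length, reads):
    """Majority vote at each position; 'N' marks positions no read covers."""
    votes = [Counter() for _ in range(genome_length)]
    for start, read in reads:
        for offset, base in enumerate(read):
            votes[start + offset][base] += 1
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)


def compare(genome, called):
    """Count positions that are uncovered and positions called incorrectly."""
    uncovered = called.count("N")
    wrong = sum(1 for g, c in zip(genome, called) if c != "N" and c != g)
    return uncovered, wrong


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs give the same result
    genome = random_genome(5_000, rng)
    for num_reads in (25, 100, 1000):
        reads = sample_reads(genome, 100, num_reads, 0.01, rng)
        uncovered, wrong = compare(genome, consensus(len(genome), reads))
        average = num_reads * 100 / len(genome)
        print(f"average {average:g} reads per base: "
              f"{uncovered} uncovered, {wrong} wrong")
```

With only a few reads, large stretches of the genome are never sampled and a single miscalled base goes uncorrected. As the number of reads per position rises, gaps become rare and errors are outvoted by the correct reads.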
This article is from the free online course Bacterial Genomes: Disease Outbreaks and Antimicrobial Resistance.
