Genome reference sequences and resequencing
Genome assembly
When sequencing an entirely new species, there is no pre-existing, similar sequence to guide us in determining how the genome is organised or where genes are located. The goal is to produce what is known as a ‘reference genome sequence’. To produce a reference sequence, we use a process termed assembly. Assembly is the process of putting together individual sequencing reads, much like a puzzle, to make longer stretches of sequence called contigs. Contigs are assembled using computer programs that look for overlaps between sequencing reads. The best possible result is a single sequence that represents the entire bacterial chromosome. In reality, we cannot always complete the puzzle, and some pieces remain that we cannot put together. The image below shows how genome sequencing is used either to produce reference genome assemblies or to identify differences between closely related bacteria (resequencing).
Figure: Assembly and resequencing. © Adam Reid 2017
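The idea of looking for overlaps between reads can be sketched in a few lines of Python. This is a toy illustration only, not how production assemblers work: it assumes error-free reads from a single strand, it ignores reads contained entirely inside other reads, and the names `overlap`, `greedy_assemble` and `min_len` are hypothetical ones chosen for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the list of contigs left when no overlap of at least
    min_len bases remains. Assumption: reads are error-free and all
    come from the same strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

If a read that overlaps nothing else is added to the list, the function returns more than one contig, which mirrors the unfinished puzzle described above.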
Genome annotation
Once a reference assembly has been produced, the next step is to find the genes, a step known as genome annotation. There are computer programs designed to do this. They find parts of the genome sequence which look like they might encode protein sequences. This is important because it identifies the functional toolkit of the bacterium. It also helps us understand differences between bacterial genomes. When we look at changes in the genome that might be related to antimicrobial resistance, for example, we can identify which genes are involved.
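One simple way to find stretches that might encode protein is to scan for open reading frames: a start codon followed, in the same reading frame, by a stop codon. The sketch below is a simplified assumption about how such a search could look. It checks only the forward strand and only the start codon ATG, whereas real annotation programs also examine the reverse strand and use statistical models. The names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, sequence) for each forward-strand open reading frame.

    start is 0-based and end is exclusive. min_codons counts the stop
    codon as well. Assumption: only ATG is treated as a start codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate gene
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # look for the next start codon in this frame
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 'ATGAAATTTGGGTAA')]
```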
Different sequencing technologies have different roles
Over time, more genome sequences have become available for different species. Developments in genome sequencing technology have also made sequencing easier and cheaper. Some new genome sequencing technologies allow us to improve reference sequences because they have much longer sequencing reads (e.g. PacBio and Oxford Nanopore can produce 10,000-50,000 bases), which help to join contigs together and produce more complete reference sequences. Other technologies produce relatively short reads (Illumina produces 75-150 bases) but generate a very large number of them at low cost. The ability to produce lots of short reads has allowed us to identify similarities and differences between bacteria of the same species cheaply and efficiently (resequencing). This has been crucial in helping us track the spread of bacterial disease.
Resequencing
Sequencing a new example of a species for which a reference genome already exists is known as resequencing. Discovering what makes the new bacterial genome different from the reference is much easier than assembling a new reference genome. We already know what the genome generally looks like and the location of the genes. Instead of assembling a new genome sequence, we use read mapping. For each sequencing read from the new bacterium, we look for the most similar part of the reference genome and place it there. We can then look for differences between the mapped reads and reference genome, which represent mutations in the genome of the new bacterium.
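Read mapping and the search for differences can be illustrated with a small Python sketch. It is a deliberately naive version that rests on several assumptions: each read is compared against every position in the reference, no insertions or deletions are allowed, and a single mismatching read is enough to report a difference. Real mapping tools use indexed searches and weigh evidence from many reads. The names `map_read`, `call_differences` and `max_mismatches` are hypothetical.

```python
def map_read(read, reference):
    """Return (position, mismatches) for the best ungapped placement of read."""
    best_pos, best_mm = None, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for r, w in zip(read, window) if r != w)
        if mm < best_mm:  # keep the most similar location (first one on ties)
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def call_differences(reads, reference, max_mismatches=2):
    """Return sorted (reference_position, reference_base, read_base) tuples.

    Reads whose best placement has more than max_mismatches mismatches
    are treated as unmapped and skipped.
    """
    differences = set()
    for read in reads:
        pos, mm = map_read(read, reference)
        if pos is None or mm > max_mismatches:
            continue
        for offset, base in enumerate(read):
            ref_base = reference[pos + offset]
            if ref_base != base:
                differences.add((pos + offset, ref_base, base))
    return sorted(differences)


reference = "ATGGCGTACGTTAGCCTGAA"
# Reads from a made-up bacterium carrying a G-to-A change at position 9 (0-based)
reads = ["CGTACATT", "ACATTAGC", "GCCTGAA"]
print(call_differences(reads, reference))  # [(9, 'G', 'A')]
```

Two of the three reads cover the changed base and both report the same difference, while the third matches the reference exactly.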
Bacterial Genomes: Disease Outbreaks and Antimicrobial Resistance