In this video, I’m going to show you an example of how we can use genome sequencing data of different bacterial strains to answer a question of importance to human health. Chlamydia trachomatis is one of the most prevalent human pathogens in the world, causing a variety of infections. It is the leading cause of sexually transmitted infections with an estimated 131 million new cases each year. It is also the leading cause of preventable infectious blindness with tens of millions of people thought to have active disease. A recently isolated Swedish Chlamydia trachomatis strain, known as “NV,” caused a European health alert in 2006. During this time it became the dominant strain circulating in some European countries and began to spread worldwide.
The reason for this was that it evaded detection by the widely used PCR-based diagnostic test. In this part of the course, we will align the Illumina sequencing reads from the NV strain against the reference sequence called “L2”. During the course of this exercise, we will discover the reason why the test failed. Artemis is a DNA viewer and annotation tool that is free to download and use from the Sanger Institute. The programme allows the user to view a range of files, from simple sequence files such as FASTA to EMBL and GenBank entries, as well as the results of sequence analyses.
Artemis gives you two views of the same genome region, so you can zoom in to inspect the DNA sequence and zoom out to view genes or even an entire chromosome, all within one screen. This might look very dry, but you are looking at the code of life: all the information needed to build a living thing, in this case a bacterium. And in fact, we know where all the really important parts are: the protein coding genes that make up most of the genome. However, we don’t know what most of them are for. This is the case for all genomes, including the human genome. There is a lot of work left to do in figuring out the functions of genes.
DNA is a double helix where the two strands contain the same information but in complementary code. So where you see an “A” on one strand, there is always a “T” on the other. Where you see a “G” on one strand, there is a “C” on the other. Here, the top strand is called the forward strand, and the bottom strand the reverse strand. Most bacterial chromosomes are circular, so there is no real beginning or end. The fact that we have a start to the chromosome here is just a convention. Now, you can see where the genes are. Some genes are on one strand and some on the other.
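The base pairing just described can be sketched in a few lines of Python. This is only an illustration, not something Artemis exposes: the function names `complement` and `reverse_complement` and the six-base example sequence are made up for this sketch.

```python
# Illustrative sketch: each base pairs with its partner on the other strand
# (A with T, G with C). The names here are hypothetical, not from Artemis.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base partner strand, in the same left-to-right order."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the reverse strand as it would be read in its own direction."""
    return complement(seq)[::-1]


forward = "ATGGCC"  # made-up example sequence
print(complement(forward))          # TACCGG
print(reverse_complement(forward))  # GGCCAT
```

Because one strand fully determines the other, a gene on the reverse strand can be read from the forward sequence by taking its reverse complement.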
We can use various types of information to figure out where the genes are, and this is called “genome annotation.” You can see that bacterial genomes are made up mostly of genes. The vast majority of these are instructions for making proteins. One gene has the instructions to make one protein. If we zoom out, you can get a feel for the size of the genome and the number of genes. The genes cover both strands fairly evenly. This bacterium has fewer than 1,000 genes, whereas humans have about 20,000. Many bacteria have a few thousand. I hope this gives you a flavour of what a bacterial genome looks like and how genes are arranged in it.
However, our story here actually has nothing much to do with genes. To answer our question about how the NV strain of chlamydia evaded detection, we will look at the differences between the NV and the L2 strains of Chlamydia by comparing sequencing reads from the NV strain to the reference genome made from the L2 strain. We will now load in the sequencing data from the NV strain. What we can see is that for each position in the genome we have lots of reads mapping to it. This means we have sampled each position many times and therefore can be confident about our results.
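To make the idea of “lots of reads mapping to each position” concrete, here is a minimal Python sketch of how read depth could be counted. It assumes each mapped read is simplified to a (start, end) interval on the reference, with the end excluded; the function name `depth_per_position` and the toy reads are invented for illustration and are not how Artemis stores alignments.

```python
def depth_per_position(reads, genome_length):
    """Count how many reads cover each 0-based reference position.

    `reads` is an iterable of (start, end) intervals, end excluded.
    This is a simplification: real alignments also carry gaps and mismatches.
    """
    depth = [0] * genome_length
    for start, end in reads:
        # Clip each read to the reference so out-of-range coordinates are ignored.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


# Four made-up reads on a 10-base reference.
reads = [(0, 5), (2, 7), (3, 8), (6, 10)]
depth = depth_per_position(reads, 10)
print(depth)  # [1, 1, 2, 3, 3, 2, 3, 2, 1, 1]
```

The higher the number at a position, the more independent observations we have of that base.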
If we look at where the reads from NV disagree with the sequence of L2, you can see that there are lots of randomly distributed differences. These are due to random errors in the sequencing process. However, there are other places where all the reads agree on a difference. These are likely to be real differences, which we call “single nucleotide polymorphisms,” or SNPs. These are the differences we most commonly use to compare bacteria from the same species and look for evidence of outbreaks as you will find out later in the course. If we move to the end of the sequence, we can see that the reads stack up higher here than over the rest.
You might expect this to be because there are more copies of this sequence in the NV genome than in the L2 genome. That is true, but this sequence isn’t actually part of the bacterial chromosome. It is the sequence of a plasmid found in Chlamydia, which has been added on the end of the chromosomal sequence. This just makes the data easier to look at all in one go. It doesn’t reflect anything about the real biology. Plasmids are often present in multiple copies in the cell, whereas the chromosome is present in only a single copy. This is why more reads map to the plasmid sequence.
We might assume that, because the read depth is about 100 over the chromosome and 400 over the plasmid, there are four copies of the plasmid floating around in the bacterial cell. What else do you notice about the reads mapping to the plasmid sequence? There is a gap in the reads mapping from the NV strain on to the L2 reference. This suggests that this sequence is missing from the plasmid in the NV strain. It looks as though one half of a gene is missing, or as we say, “deleted” in the NV strain. Why should this be? It is this region of the plasmid that was used to detect whether people were infected with Chlamydia.
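Both observations here, the rough copy number and the gap in coverage, can be sketched from per-position read depths. The Python below is a minimal illustration under stated assumptions: the depth values are invented to mirror the roughly 100 versus 400 figures, the function names `estimate_copy_number` and `find_gaps` are hypothetical, and a depth ratio is only a rough guide to copy number, not a measurement.

```python
from statistics import median


def estimate_copy_number(chrom_depth, plasmid_depth):
    """Rough plasmid copies per chromosome: ratio of median read depths.

    Positions with no reads are left out so that a deleted region
    does not drag the plasmid estimate down.
    """
    covered = [d for d in plasmid_depth if d > 0]
    return median(covered) / median(chrom_depth)


def find_gaps(depth, min_depth=1):
    """Return (start, end) intervals, end excluded, where depth < min_depth."""
    gaps = []
    start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = pos  # a gap opens here
        elif start is not None:
            gaps.append((start, pos))  # the gap closed just before pos
            start = None
    if start is not None:
        gaps.append((start, len(depth)))  # gap runs to the end
    return gaps


# Made-up depths: about 100 over the chromosome, about 400 over the plasmid,
# with a stretch of the plasmid that no NV reads map to.
chrom_depth = [100, 98, 103, 101, 99]
plasmid_depth = [400, 404, 396, 0, 0, 0, 400, 402, 398, 400]

print(estimate_copy_number(chrom_depth, plasmid_depth))  # 4.0
print(find_gaps(plasmid_depth))                          # [(3, 6)]
```

A run of positions with no reads, flanked by well-covered sequence, is the signature of a deletion relative to the reference.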
The NV strain acquired a deletion mutation. The region used for surveillance was lost. So when the diagnostic test was used on people who were infected with this strain of Chlamydia, it appeared that they were not infected. Therefore, they were not treated, and they could go on to infect more people, resulting in this strain becoming more prevalent. This is a fascinating example of evolution driven by human intervention. The way the diagnostic test was designed affected how the bacteria evolved, resulting in a strain that evaded the test. It highlights the limitations of bacterial typing methods that focus on a single region of the genome.
Whole genome sequencing of these strains allows us to discover how the bacterium evaded the diagnostic test and think about how we might design better tests in the future. If we used whole genome sequencing to test for the presence of the bacterium, then there would be no way the bacterium could mutate to evade detection. However, this is not always practical or cost effective. I’ve shown you how we can view genome reference sequences and look for variation between strains using the Artemis genome browser. I hope you now have a better understanding of how we can use genome sequencing data to answer important questions about the spread of infectious diseases.