How do we visualise genomic data? In this article, Dr. Adam Reid introduces the Artemis genome browser.
In this article, we describe how the Artemis genome browser displays genome sequences. A genome browser is a piece of software that allows us to visualise a genome sequence and see the location of the genes. We will also show how it can display sequencing reads from another strain, mapped to the reference, so that differences between strains can be discovered. The example, illustrated below, shows two strains of Chlamydia trachomatis: the L2 reference strain and the New Variant (NV) strain. Chlamydia trachomatis is a major sexually transmitted pathogen. It is the leading cause of preventable infectious blindness.
Artemis has three windows. The first two show the genome, and the third lists the different features on the genome, such as genes. The first genome window does not show the individual ACGT bases of the genome sequence; instead, it gives a more zoomed-out view in which we can see several genes (blue boxes). These genes can be on the forward strand (top half of the window, pointing right) or on the reverse strand (bottom half of the window, pointing left). When genes are transcribed into messenger RNA, those on the forward strand are read from left to right, whereas those on the reverse strand are read from right to left.
Figure: Artemis genome reference
The second genome window shows a closer view than the first. Although we can only see the start of the first gene, we see every ACGT base of this part of the genome. We can also see both the forward and reverse strands. These have different sequences, but they are complementary: where there is an A on one strand, there is always a T on the other, and where there is a G on one strand, there is always a C on the other. This complementarity of the strands was noticed by Watson and Crick when they determined the structure of DNA (https://www.nature.com/articles/171737a0). It means that the genome can be copied by splitting the strands and filling in the complementary bases. The second genome window also shows all the amino acid sequences that could be encoded by the genome, reading it in each of the three reading frames on both the forward and reverse strands. Not all these amino acid sequences are actually made from the genome, but this information helps us to find genes and understand their functions. We can see that the highlighted gene (hemB; surrounded by a red box in both genome windows) is on the forward strand and starts with the DNA sequence ATGACAAGGCTTCCA, which translates into the amino acid sequence MTRLP.
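The strand complementarity and the translation described above can be sketched in a few lines of Python. This is only an illustration, not how Artemis itself is implemented: the function names are hypothetical, the input is assumed to contain only uppercase A, C, G and T, and the standard genetic code is assumed. With that table, the fourth codon of the hemB start sequence (CTT) encodes leucine (L), so the sketch yields MTRLP for ATGACAAGGCTTCCA.

```python
from itertools import product

# Standard genetic code, listed in the conventional order: codons enumerated
# over the bases T, C, A, G (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# A pairs with T, and G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base on the opposite strand at each position."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the opposite strand, written in its own reading direction."""
    return complement(seq)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq (assumes only A, C, G, T)."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame_translations(seq):
    """Translate three reading frames on each strand, keyed '+1'..'+3', '-1'..'-3'."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


hemb_start = "ATGACAAGGCTTCCA"  # start of hemB as quoted in the text
print(complement(hemb_start))          # TACTGTTCCGAAGGT
print(reverse_complement(hemb_start))  # TGGAAGCCTTGTCAT
print(translate(hemb_start))           # MTRLP
```

Calling `six_frame_translations(hemb_start)` returns all six candidate amino acid sequences, which corresponds to the six translation tracks shown in the second genome window; only the `+1` frame matches the real start of the gene here.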
Artemis also allows you to view mapped sequencing reads (image below). In this example, we see Illumina sequencing reads from the Swedish NV strain mapped to the L2 reference introduced above. In the second image, we have zoomed in to see the first two genes in the genome and have loaded the mapped reads from the NV strain. The reads are the blue or green lines above the gene window. A green line means that there are multiple reads with exactly the same sequence, whereas a blue line is a single read. Red marks represent a difference in sequence between a read and the reference sequence. You can see that these differences are generally scattered at random, because the sequencing technology introduces occasional random errors. In most cases these do not cause any problem, because there are plenty of reads without errors. Where red marks stack up and a difference from the reference is found in every read matching that part of the genome, we can be confident that we are seeing a real difference between the L2 and NV strains. This is known as a Single Nucleotide Polymorphism, or SNP (pronounced "snip").
Figure: Artemis mapped reads
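The reasoning above, that isolated mismatches are probably sequencing errors while a mismatch shared by every overlapping read is probably a real SNP, can be expressed as a small Python sketch. It is a simplified illustration under stated assumptions, not the method used by Artemis or by any particular variant caller: the function name `call_snps`, the 0-based read start positions, the ungapped alignments, and the `min_depth` and `min_fraction` thresholds are all hypothetical choices made for this example, and the reads are made-up toy data.

```python
from collections import Counter


def call_snps(reference, reads, min_depth=3, min_fraction=0.9):
    """Report positions where nearly all overlapping reads disagree with the reference.

    reads is an iterable of (start, sequence) pairs, where start is the 0-based
    reference position of the first base of an ungapped read alignment.
    Returns a list of (position, reference_base, read_base, depth) tuples.
    """
    # Count the bases observed at each reference position (a simple "pileup").
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to tell an error from a real difference
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


# Toy data: every read carries A instead of G at position 8 (a stacked difference),
# while the third read also has a single random error at position 5.
reference = "ATGACAAGGCTTCCA"
reads = [
    (0, "ATGACAAGAC"),
    (2, "GACAAGACTT"),
    (4, "CTAGACTTCC"),
    (5, "AAGACTTCCA"),
]
print(call_snps(reference, reads))  # [(8, 'G', 'A', 4)]
```

The lone mismatch at position 5 is outvoted by the three reads that agree with the reference, so only position 8 is reported. Real read mappings also contain insertions, deletions and base quality scores, which this sketch ignores.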
In the following video, we will go over these concepts again and you will find out more about visualisation of bacterial genomes. You will also learn how resequencing was used to discover how the NV strain of Chlamydia evolved to evade detection.