
Welcome to the project

In this Step we will introduce the project tasks and help you prepare for the assignment: a short written report (200 words maximum) on the differences and similarities between two genomes from the genus Mycobacterium.

First, we will learn about the chosen bacterium by reading an introduction to Mycobacterium species and a review article on Mycobacterium leprae. Then you will download the specified annotation files onto your computer.

Let’s learn more about Mycobacterium

Introduction

Mycobacterium organisms are ubiquitous and found in a wide range of environments. However, some species, such as Mycobacterium tuberculosis, cannot live freely in the environment and therefore require a host. These species are pathogenic and cause devastating diseases. In the Steps ahead, you will compare the genomes of two Mycobacterium species known to cause disease in humans: M. tuberculosis and M. leprae.

M. tuberculosis usually infects the lungs and causes tuberculosis, although other organs can be affected. M. leprae is the bacterium that causes leprosy and mainly affects the skin, peripheral nerves and the mucosal surfaces of the upper respiratory tract and eyes. Although these two bacteria belong to the same genus and both can cause disease in humans, their genomes are starkly different. One of the more interesting observations is the difference in their genome size.

The M. tuberculosis genome is approximately 4.4 Mb, whereas M. leprae has a smaller genome of approximately 3.2 Mb. Another interesting observation is that M. leprae has undergone reductive evolution, a process in which genes become non-functional through accumulated mutations. As a result, M. leprae carries non-functional copies of genes that once encoded proteins (and still do in other species). These are collectively known as pseudogenes and are often identified by the presence of premature stop codons.

Normally, we cannot assess gene function just by looking at a genome sequence. Instead, we use a genome comparison approach: to find out which genes are pseudogenes, we compare the genome with that of a closely related species. In this instance that species is M. tuberculosis, which has approximately 3,924 genes and only 6 pseudogenes. By contrast, M. leprae has 1,133 annotated pseudogenes and 1,614 protein-coding genes.
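The following minimal Python sketch illustrates the premature stop codon signal mentioned above. It reads a coding sequence codon by codon and reports any stop codon (TAA, TAG or TGA) that occurs before the final codon. The sketch assumes the sequence is given on the coding strand, already in the correct reading frame, starting at the start codon. The function names are hypothetical. This is not the genome comparison approach used in this course. Knowing the intended reading frame and gene boundaries is exactly the information a closely related genome such as M. tuberculosis supplies.

```python
from typing import List

# Standard bacterial stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_positions(cds: str) -> List[int]:
    """Return 0-based nucleotide positions of in-frame stop codons that occur
    before the final codon (assumes the sequence starts at the start codon)."""
    seq = cds.upper()
    positions = []
    # Start position of the last complete codon, which is expected to be
    # the genuine stop codon and is therefore excluded from the scan.
    last_codon_start = (len(seq) // 3 - 1) * 3
    for i in range(0, last_codon_start, 3):
        if seq[i:i + 3] in STOP_CODONS:
            positions.append(i)
    return positions


def looks_like_pseudogene(cds: str) -> bool:
    """Crude heuristic: an interrupted reading frame hints at a pseudogene."""
    return bool(premature_stop_positions(cds))
```

For example, `premature_stop_positions("ATGAAATGACCCTAA")` returns `[6]`, because the third codon (TGA) would stop translation early. The intact sequence `"ATGAAACCCGGGTAA"` returns an empty list.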

Let’s read the review article about Mycobacterium leprae. Although the entire article is relevant to the tasks ahead, you should concentrate on the following sections:

  • Introduction

  • Genome Biology

  • Physiology & Biochemistry

  • Reductive evolution

  • Pseudogenes, and Transcription of pseudogenes

Getting an idea of what is discussed in these sections will give you a strong start towards answering the questions in subsequent Steps.

You can read the main article here: Singh, P. and Cole, S. T. (2011) Mycobacterium leprae: genes, pseudogenes and genetic diversity.

Finally, to complete your preparation, download the annotation files from the sites below:

Genome sequence and annotation in EMBL format for M. leprae: or can be downloaded from NCBI using the following link:

And the M. tuberculosis annotation file can be found here: or can be downloaded from NCBI using the following link:

You may need to copy and paste the links into your web browser and download the files using the browser’s ‘Save as’ option.
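After downloading the files, you can check their contents by counting what they hold. Below is a minimal standard-library Python sketch for reading an EMBL flat file. It is an illustration, not a full parser; a library such as Biopython's SeqIO, which reads both EMBL and GenBank, is the usual choice. The sketch relies on several assumptions:

- The file follows the usual EMBL layout: feature table lines begin with `FT`, each feature key starts in column 6, and the `SQ` line states the sequence length in BP.
- Pseudogenes are flagged with a `/pseudo` or `/pseudogene=` qualifier. Whether that qualifier sits on `gene` features, `CDS` features or both differs between files, so the sketch reports counts per feature key and does not guess.
- The function name `summarise_embl` is hypothetical.

Files downloaded from NCBI in GenBank format will not be read by this sketch.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# The SQ line states the sequence length, e.g. "SQ   Sequence 3268203 BP; ..."
SQ_LINE = re.compile(r"^SQ\s+Sequence\s+(\d+)\s+BP;")


def summarise_embl(lines: Iterable[str]) -> Dict[str, object]:
    """Count feature keys, pseudogene-flagged features and sequence length
    in an EMBL flat file (a rough sketch, not a full parser)."""
    feature_counts: Counter = Counter()  # features per key (source, gene, CDS, ...)
    pseudo_counts: Counter = Counter()   # features per key with /pseudo or /pseudogene
    length_bp: Optional[int] = None
    current_key: Optional[str] = None
    current_is_pseudo = False

    for line in lines:
        if line.startswith("FT   ") and line[5:6].strip():
            # A feature key starts in column 6, so this line opens a new feature.
            # First record whether the feature just finished was a pseudogene.
            if current_key is not None and current_is_pseudo:
                pseudo_counts[current_key] += 1
            current_key = line[5:].split()[0]
            feature_counts[current_key] += 1
            current_is_pseudo = False
        elif line.startswith("FT ") and current_key is not None:
            # Qualifier (or continuation) line belonging to the current feature.
            qualifier = line[2:].strip()
            if qualifier == "/pseudo" or qualifier.startswith("/pseudogene="):
                current_is_pseudo = True
        else:
            match = SQ_LINE.match(line)
            if match:
                length_bp = int(match.group(1))

    # The last feature has no following key line, so record it here.
    if current_key is not None and current_is_pseudo:
        pseudo_counts[current_key] += 1

    return {
        "feature_counts": feature_counts,
        "pseudo_counts": pseudo_counts,
        "length_bp": length_bp,
    }
```

For example, open the downloaded M. leprae file (saved under whatever name you chose) and pass the file handle to `summarise_embl`. You can then compare the per-key counts and the genome length with the figures quoted earlier in this Step. Small differences are to be expected, because genome annotations are revised over time.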

Please note: our earlier course ‘Bacterial Genomes: From DNA to Protein Function using Bioinformatics’ is a recommended prerequisite. It is open to join and access during the three-week live presentation of this course, and then closes until its next live presentation.

If you have questions or concerns about any of the Steps or tasks, please use the comments area to ask questions, discuss your queries and seek solutions with other learners.


This article is from the free online course:

Bacterial Genomes: Accessing and Analysing Microbial Genome Data

Wellcome Genome Campus Advanced Courses and Scientific Conferences