Genomes are our guides to understanding how life works

Deoxyribonucleic acid (DNA) is a long molecule made up of a repeating string of just four different chemical units, called nucleotides or bases: adenine (A), thymine (T), guanine (G) and cytosine (C).

Diagram of DNA. Courtesy: National Human Genome Research Institute

We can describe any particular DNA molecule using the first letters of these bases. A rather short one might look like this: ATGCGCTGGTAG. This string of letters is called a DNA sequence. Every living thing contains its own unique string (or strings) of DNA. The DNA sequence describes how to make each organism from scratch, although we only partially understand how this happens.
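Because a DNA sequence is just a string over a four-letter alphabet, it can be handled like ordinary text. The short Python sketch below checks that a sequence contains only the letters A, T, G and C and counts how often each base occurs. The function name `base_counts` is made up for this illustration and is not part of any library.

```python
from collections import Counter

# The four DNA bases named in the text.
VALID_BASES = set("ATGC")


def base_counts(sequence):
    """Count each base in a DNA sequence (hypothetical helper for illustration)."""
    seq = sequence.upper()
    unknown = set(seq) - VALID_BASES
    if unknown:
        raise ValueError(f"Unexpected characters in sequence: {sorted(unknown)}")
    return Counter(seq)


# The short example sequence from the text.
example = "ATGCGCTGGTAG"
counts = base_counts(example)
print(len(example))   # 12
print(dict(counts))   # {'A': 2, 'T': 3, 'G': 5, 'C': 2}
```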

Genes are small parts of larger DNA molecules.

Humans have about 25,000 genes, whereas many bacteria have around 4,000. This early result of genome sequencing was surprising: complex organisms like humans were expected to need far more than about six times the number of genes in a bacterium. Why the difference is so small is still not well understood. By comparing the sequences of letters in genes (e.g. TGACAGGTC…), we can see which genes are shared by different species. Surprisingly, some genes are quite similar between us and bacteria. This shows us that different forms of life share a common history in the deep evolutionary past.
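One very simple way to compare two gene sequences is to count the positions at which their letters agree. The sketch below does this for two sequences of equal length. The function name `percent_identity` and the second sequence are invented for this illustration. Real comparisons between species normally rely on sequence alignment tools that also handle inserted or missing letters, which this sketch does not attempt.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with the same base (equal-length sequences only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("This simple sketch only compares sequences of equal length")
    if not seq_a:
        raise ValueError("Sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Two made-up sequences that differ at a single position (C versus T).
gene_one = "ATGCGCTGGTAG"
gene_two = "ATGCGTTGGTAG"
print(round(percent_identity(gene_one, gene_two), 1))  # 91.7
```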

The genetic code

Most genes are codes describing how to make different types of protein. Like DNA, proteins can be thought of as strings of letters. Instead of four bases (A, T, G, C), proteins are built from 20 amino acids. Each group of three bases encodes one amino acid. A four-letter alphabet, such as that found in DNA, can make 64 different three-letter words (4 × 4 × 4), which is more than enough to cover 20 amino acids. This means that a simple DNA code of four different letters can describe the more complicated, 20-letter alphabet of proteins. This is important because, while DNA is a fairly simple molecule that stores information, proteins have to do lots of different things. As an example, the DNA bases ATG encode the amino acid methionine. Take a look at the full genetic code to find out how each of the three-letter DNA codes is used.
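The counting argument above can be checked with a few lines of Python. The sketch below builds the standard genetic code as a lookup table from a compact 64-letter string of one-letter amino acid symbols, listed with the bases in the order T, C, A, G, where `*` marks a stop signal rather than an amino acid. The variable names are chosen for this illustration only.

```python
from itertools import product

# Standard genetic code written compactly: one symbol per three-letter word,
# with the words enumerated in the base order T, C, A, G ('*' means stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

distinct_amino_acids = set(CODON_TABLE.values()) - {"*"}

print(len(CODON_TABLE))           # 64 three-letter words
print(len(distinct_amino_acids))  # 20 amino acids
print(CODON_TABLE["ATG"])         # M (methionine)
```

Because 64 words map onto only 20 amino acids plus a stop signal, most amino acids are encoded by more than one three-letter word.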

Finding and understanding genes

Given the order of DNA bases (A, C, G, T) in a particular genome sequence, we are good at finding the genes. This is especially true for bacteria, which have simpler genes than we do. We know which sets of three nucleotides encode which amino acids (e.g. we know that the nucleotides ATG encode the amino acid methionine). Therefore, for each gene we can predict the order of amino acids in the protein it encodes. What remains difficult is figuring out the role of that protein in the cell. For example, a protein might be involved in making the bacterial cell wall, or it might interfere with our immune systems, preventing us from defending ourselves against infection.
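As a rough illustration of how a gene can be found and its protein predicted, the sketch below looks for the first ATG in a sequence, reads three bases at a time until it reaches a stop signal, and translates the result using the standard genetic code. The function names and the test sequence are invented for this example. Real gene-finding software is considerably more sophisticated: among other things it checks both DNA strands and every reading frame, which this sketch does not.

```python
from itertools import product

# Standard genetic code, bases in the order T, C, A, G ('*' means stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def first_open_reading_frame(dna):
    """Return the stretch from the first ATG to the next in-frame stop, or None."""
    start = dna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(dna) - 2, 3):
        if CODON_TABLE[dna[i:i + 3]] == "*":
            return dna[start:i + 3]
    return None  # no stop signal found in this reading frame


def translate(gene):
    """Translate a gene three bases at a time, stopping at the first stop signal."""
    protein = []
    for i in range(0, len(gene) - 2, 3):
        amino_acid = CODON_TABLE[gene[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sequence: the example from earlier in the article with extra bases around it.
dna = "CCATGCGCTGGTAGAA"
gene = first_open_reading_frame(dna)
print(gene)             # ATGCGCTGGTAG
print(translate(gene))  # MRW (methionine, arginine, tryptophan)
```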

One of the main aims of pathogen genomics is to determine which genes are important for different bacteria to cause disease. By doing this we can better understand how to stop them. By sequencing bacterial genomes, we can identify genes that either cause disease or help the bacteria resist antimicrobial drugs.

This article is from the free online course:

Bacterial Genomes: Disease Outbreaks and Antimicrobial Resistance

Wellcome Genome Campus Advanced Courses and Scientific Conferences