Genomes are our guides to understanding how life works

What do genome sequences mean? How is the blueprint for life encoded in a long string of letters? In this article we explain how DNA is interpreted.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences

Deoxyribonucleic acid (DNA) is a long molecule made up of a repeating string of just four different chemicals (nucleotides or bases). These are adenine (A), thymine (T), guanine (G) and cytosine (C).

Diagram of DNA Courtesy: National Human Genome Research Institute

We can describe any particular DNA molecule using the first letters of these bases. A rather short one might look like this: ATGCGCTGGTAG. This string of letters is called a DNA sequence. Every living thing contains its own unique string (or strings) of DNA. The DNA sequence describes how to make each organism from scratch, although we only partially understand how this happens.

Genes are a small part of larger DNA molecules.

Humans have about 25,000 genes, whereas many bacteria have around 4,000. This early result of genome sequencing was surprising. It was expected that complex organisms, like humans, would need a lot more than six times the number of genes in a bacterium. It is still not well understood why the difference is so small. By comparing the sequences of letters in genes (e.g. TGACAGGTC…), we can see which genes are shared by different species. Surprisingly, some genes are quite similar between us and bacteria. This shows us that different forms of life share a common history in the deep evolutionary past.
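One very simple way to compare two gene sequences is to count the positions at which they carry the same letter. The sketch below assumes the two sequences have already been lined up and have the same length; the function name `percent_identity` and the second example sequence are made up for illustration. Real comparisons first align the sequences, allowing for inserted or missing bases, which this sketch does not attempt.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare the sequences position by position and count the matches.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical pair of sequences differing at a single position.
identity = percent_identity("TGACAGGTC", "TGATAGGTC")
```

Here eight of the nine positions match, so `identity` is a little under 89.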

The genetic code

Most genes are coded instructions for making proteins. Like DNA, proteins can be thought of as strings of letters, but instead of four bases (A, T, G, C) they are built from 20 amino acids. Each set of three bases encodes one amino acid. A four-letter alphabet, such as that found in DNA, can make 4 × 4 × 4 = 64 different three-letter words. This means that a simple DNA code of four different letters can describe the more complicated, 20-letter alphabet of proteins, with words to spare: most amino acids are encoded by more than one three-letter word, and three of the words signal where the protein ends. This is important, because while DNA is a fairly simple molecule that stores information, proteins have to do lots of different things. As an example, the DNA bases ATG encode the amino acid methionine. Take a look at the full genetic code to find out how each of the three-letter DNA codes is used.
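To make the arithmetic concrete, the following sketch builds all 64 three-letter words and uses them to read the short example sequence ATGCGCTGGTAG. It assumes the standard genetic code, written compactly as one single-letter amino acid symbol per word (with `*` marking a stop signal); the names `CODON_TABLE` and `translate` are chosen here for illustration and do not come from any particular library.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G.
# Each character is the one-letter symbol of an amino acid; "*" means stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 4 x 4 x 4 = 64 three-letter words with its amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read a DNA sequence three bases at a time until a stop signal."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break  # a stop word ends the protein
        protein.append(amino_acid)
    return "".join(protein)


protein = translate("ATGCGCTGGTAG")  # "MRW"
```

The example sequence reads as ATG (methionine, M), CGC (arginine, R), TGG (tryptophan, W) and TAG (stop), so the predicted protein is three amino acids long.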

Finding and understanding genes

If we look at the order of DNA bases (A, C, G, T) in a genome sequence, we are good at finding the genes. This is especially true for bacteria, which have simpler genes than we do. Because we know which sets of three nucleotides encode which amino acids (e.g. ATG encodes methionine), we can predict the order of amino acids in the protein that each gene encodes. What remains difficult is working out what role that protein plays in the cell. For example, a protein might be involved in making the bacterial cell wall, or it might interfere with our immune systems, preventing us from defending ourselves against infection.
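A much simplified version of gene finding can be sketched as a search for stretches that begin with ATG and continue, three bases at a time, until a stop signal (TAA, TAG or TGA in the standard genetic code). The function name `find_orfs` and the `min_codons` threshold below are assumptions made for illustration. Real gene-finding tools do considerably more, for example checking both strands of the DNA and using statistical models, so this is only a sketch of the basic idea.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) positions of simple gene candidates.

    A candidate starts at ATG and ends at the first in-frame stop.
    Only the given strand is scanned, in each of its three reading frames.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember where this candidate begins
            elif start is not None and codon in STOP_CODONS:
                # Keep the candidate only if it is long enough.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Hypothetical stretch of DNA with the example gene embedded in it.
dna = "CCATGCGCTGGTAGTT"
candidates = find_orfs(dna)  # [(2, 14)]
```

Slicing the sequence with the reported positions, `dna[2:14]`, gives back ATGCGCTGGTAG, whose amino acid order can then be predicted from the genetic code.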

One of the main aims of pathogen genomics is to determine which genes are important for different bacteria to cause disease. By doing this we can better understand how to stop them. By sequencing bacterial genomes, we can identify genes that either cause disease or help the bacteria resist antimicrobial drugs.

This article is from the free online

Bacterial Genomes: Disease Outbreaks and Antimicrobial Resistance

Created by
FutureLearn - Learning For Life
