
From DNA to Protein

'Technicolour image of a RuvA protein bound to DNA - side view' by John Rafferty.
© Wellcome Genome Campus Advanced Courses and Scientific Conferences

In this article we will describe how a protein sequence is generated from a DNA sequence.

Biological Transcription

In biological systems, the process of transcription ‘transcribes’ (copies) DNA into RNA. It is this RNA molecule that serves as the actual template for the production of proteins. The process of making proteins from an RNA template is called ‘translation’, and it is carried out by the ribosome. The RNA molecule that carries the message from the DNA to the ribosome is called messenger RNA, or mRNA.
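
To make the link between DNA and mRNA concrete, here is a minimal sketch that models transcription purely at the level of sequence. It assumes the DNA is given as the coding (sense) strand, written 5’->3’. Because the mRNA is built complementary to the template strand, its sequence is the same as the coding strand with every T replaced by U. The function names `transcribe` and `reverse_complement` are our own, chosen for illustration, and everything that happens in a real cell (promoters, RNA polymerase, RNA processing) is ignored.

```python
# Complementary DNA base pairs: A-T and C-G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a coding (sense) strand given 5'->3'.

    The mRNA is complementary to the template strand, so it has the same
    sequence as the coding strand with thymine (T) replaced by uracil (U).
    """
    return coding_strand.upper().replace("T", "U")


coding = "ATGCGATCGGACAGTCGAGTCCAGTAGACGATC"
template = reverse_complement(coding)  # the strand the polymerase actually reads

print(transcribe(coding))  # AUGCGAUCGGACAGUCGAGUCCAGUAGACGAUC
# Same mRNA, derived from the template strand instead:
print(reverse_complement(template).replace("T", "U"))
```

Working from the coding strand is a convenience: it lets us read the mRNA sequence directly off the DNA as it is usually written.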

Protein Sequences

The building blocks of proteins are amino acids. Just as DNA sequences are conventionally written from 5’ to 3’, there is a convention that dictates how the string of amino acids in a protein is represented: from the amino or N-terminus to the carboxy or C-terminus. This is the order in which the ribosome synthesises the protein as it reads the messenger RNA (mRNA) from 5’ to 3’; the N-terminus of each incoming free amino acid is chemically attached to the C-terminus of the growing (nascent) protein. It therefore corresponds to the order in which the codons appear in the protein’s DNA blueprint.

Genetic Codes and Codons

Each amino acid is encoded by a group of three nucleotides in the mRNA. Each three-letter word is called a codon, and it “codes” for an amino acid (a few codons instead act as STOP signals). This correspondence between codons and amino acids is known as the genetic code. Different organisms can use slightly different genetic codes, but every code follows the same rule: within a given code, a codon always corresponds to the same amino acid, whereas one amino acid can be encoded by more than one codon. A codon table can be used to decipher the code. Such tables can list either DNA or RNA codons, the only difference being that “T” in the DNA codon table is replaced by “U” in the RNA codon table.

Although this table describes DNA codons, remember that DNA is transcribed into mRNA, which in turn is translated into the amino acids that form proteins. Predicting an amino-acid sequence from a nucleotide sequence using the known genetic code is called conceptual translation.
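
In code, a codon table is simply a lookup from the 64 possible three-letter codons to amino acids. The sketch below builds one for the standard genetic code (NCBI translation table 1) from its compact 64-letter form, in which codons are enumerated with the bases in T, C, A, G order and “*” marks a STOP codon. The variable names are our own; an organism using a variant code would need a different 64-letter string (the bacterial code, NCBI table 11, has the same codon assignments and differs only in its alternative start codons). Counting the entries shows the redundancy described above: several amino acids have more than one codon, but each codon maps to exactly one amino acid or STOP signal.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with the bases
# in T, C, A, G order, third position varying fastest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# DNA codon table: {"TTT": "F", "TTC": "F", ..., "GGG": "G"}
DNA_CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The RNA codon table is identical except that T is written as U
RNA_CODON_TABLE = {
    codon.replace("T", "U"): aa for codon, aa in DNA_CODON_TABLE.items()
}

# Number of codons encoding each amino acid (the code is redundant)
codons_per_aa = Counter(DNA_CODON_TABLE.values())

print(len(DNA_CODON_TABLE))    # 64
print(DNA_CODON_TABLE["ATG"])  # M
print(RNA_CODON_TABLE["UAG"])  # *
print(codons_per_aa["S"], codons_per_aa["M"], codons_per_aa["*"])  # 6 1 3
```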

For the short example DNA sequence below, the amino-acid sequence would be as follows (codon numbers are given for reference only):

Codon number          1    2    3    4    5    6    7    8    9    10   11
Nucleotide sequence   ATG  CGA  TCG  GAC  AGT  CGA  GTC  CAG  TAG  ACG  ATC
Amino-acid sequence   M    R    S    D    S    R    V    Q    -    T    I

Here the 9th codon (TAG) encodes a STOP signal, shown as “-”. Notice that the 3rd and 5th codons (TCG and AGT) are different, yet they both code for serine (S).
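
The conceptual translation above can be reproduced in a few lines of Python. This is a minimal sketch, assuming the standard genetic code and a DNA sequence that is already in the correct reading frame; `translate` is our own illustrative function, not part of any library. It writes STOP codons as “*” (the convention used in NCBI’s codon tables) rather than the “-” used in the table, uses “X” for any codon it cannot look up, and by default carries on past a STOP codon so that its output lines up with the table.

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases in T, C, A, G order; '*' = stop
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, stop_at_stop: bool = False) -> str:
    """Conceptually translate DNA codon by codon, starting at its first base.

    Any incomplete codon left at the end is ignored. If stop_at_stop is True,
    translation ends at the first stop codon, which is not included.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for e.g. codons with N
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGCGATCGGACAGTCGAGTCCAGTAGACGATC"
print(translate(seq))                     # MRSDSRVQ*TI
print(translate(seq, stop_at_stop=True))  # MRSDSRVQ
```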

Reading Frames

As mentioned previously, the genetic code is read as three-letter words (codons). Therefore, for a DNA sequence of known orientation, there are three possible conceptual translations: one starting at the first base, one starting at the second base and one starting at the third base. These are referred to as the three “reading frames”.

ATGCGATCGGACAGTCGAGTCCAGTAGACGATC   nucleotide sequence
 M  R  S  D  S  R  V  Q  -  T  I    1st reading frame
  C  D  R  T  V  E  S  S  R  R      2nd reading frame
   A  I  G  Q  S  S  P  V  D  D     3rd reading frame

In our example, the first reading frame starts with a methionine (M), encoded by the ATG codon. If we instead considered the second reading frame, and therefore started “reading” the code from the second base of the nucleotide sequence, the first amino acid would be a cysteine (C), encoded by the TGC codon. Moreover, if we didn’t know the orientation of the nucleotide sequence, the gene could lie on either strand. As well as translating the forward strand (read 5’->3’), we would then have to translate the reverse strand, that is, the reverse complement of the sequence, also read 5’->3’. This gives an additional three reading frames, six in total.
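
All six reading frames can also be generated programmatically. The sketch below assumes the standard genetic code: it translates the given strand starting at its first, second and third base, then does the same for the reverse complement, which is how the opposite strand is read 5’->3’. The function names and the frame labels “+1” to “-3” are our own choices for illustration, and the code is only a rough stand-in for dedicated tools such as ExPASy Translate.

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases in T, C, A, G order; '*' = stop
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the first base; 'X' for unknown codons."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate the three forward (+) and three reverse (-) reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, sign in ((dna, "+"), (reverse_complement(dna), "-")):
        for offset in range(3):  # start at the 1st, 2nd or 3rd base
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translation("ATGCGATCGGACAGTCGAGTCCAGTAGACGATC")
print(frames["+1"])  # MRSDSRVQ*TI
print(frames["+2"])  # CDRTVESSRR
print(frames["+3"])  # AIGQSSPVDD
```

Printing `frames["-1"]`, `frames["-2"]` and `frames["-3"]` gives the translations of the reverse strand, which you can compare with the output of the ExPASy tool.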

The ExPASy Translate tool

A useful tool for predicting the conceptual translation of a nucleotide sequence is the “ExPASy Translate tool”. This server provides a quick and easy way of finding the amino acid sequence corresponding to a nucleotide sequence in all six possible reading frames. Why not give it a try, and check whether the three amino acid sequences shown as the 1st, 2nd and 3rd reading frames in the figure above are correct? Can you work out the amino acid sequences for the reverse strand?

This article is from the free online course Bacterial Genomes I: From DNA to Protein Function Using Bioinformatics.
