Finding and Annotating Genes in a Genome

Learn more about finding and annotating genes in a genome.

In this course we will focus on the annotation of coding sequences, that is, the regions of the genome that form the template for making proteins. From now on, we will call each of these regions a CDS, which stands for CoDing Sequence.

Annotating a genome means finding the locations of its genes. In our case, we will limit the task to finding the locations of protein-coding regions. How do we find these regions in a sea of just letters – As, Cs, Gs, Ts? All protein-coding regions share some characteristics that we can use to hunt them down.

All CDSs have a START codon and a STOP codon. These signal the beginning and end of a protein-coding region, or CDS. Transcripts are synthesised, and then read by the ribosome, from 5’ to 3’, so the start codon is at the 5’ end of the coding sequence and the stop codon is at the 3’ end. Bacterial genes are encoded all in one go: the nucleotide sequence that carries the information to make a polypeptide is found in a single uninterrupted stretch. By contrast, eukaryotic genes usually have introns, which are non-coding regions interspersed with the coding regions. Luckily for us, we don’t have to worry about these, as bacterial genomes almost never have introns. Optional: learn more about introns by following the link to a Wikipedia article, given below this article.
The total number of nucleotides in a CDS is a multiple of three. Proteins are made of amino acids, and each amino acid is encoded by a group of three nucleotides called a codon.
START and STOP codons are well defined. Bacteria can use more than one codon to start the synthesis of a protein: ATG, which encodes Methionine, and GTG and TTG, which encode Valine and Leucine respectively when they occur inside a CDS but are read as Methionine when they act as the start codon. In eukaryotes, ATG is the start codon in almost all cases. The stop codons are common to bacteria and eukaryotes in the standard genetic code: TAA, TAG and TGA. These codons do not encode an amino acid but signal the end of the protein sequence.
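
The three characteristics above can be turned into a simple check. The sketch below is illustrative only: the function names `split_codons` and `looks_like_cds` are hypothetical, the input is assumed to be a plain DNA string for one strand written 5’ to 3’, and the rule that no stop codon may appear before the final codon is our reading of "a stop codon signals the end of the protein sequence".

```python
# Start and stop codons as listed in the text (bacterial usage).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split a sequence into consecutive groups of three nucleotides."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def looks_like_cds(seq):
    """Return True if seq satisfies the three basic CDS characteristics.

    Assumption: seq is given 5' to 3' on the coding strand.
    """
    seq = seq.upper()
    # Length must be a multiple of three, with room for a start and a stop.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = split_codons(seq)
    # Must begin with a start codon and end with a stop codon.
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    # A stop codon before the last position would end the protein early.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

For example, `looks_like_cds("ATGAAATAA")` passes all three checks, whereas `looks_like_cds("ATGAAATA")` fails because its length is not a multiple of three. Passing this check does not prove that a region is a real gene; it only shows the region is consistent with the characteristics listed here.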

Caution! Start codons are not found exclusively at the start of a CDS. The same codons can also occur within the CDS. This presents a challenge: how do we know which ATG, GTG or TTG marks the true start of the protein? To resolve this we need more information about the actual sequence, or at least about how it compares with other similar sequences (for example, using BLAST). For now, let’s adopt a safe default: if more than one start codon is available, we will choose the one that produces the longest possible CDS.
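
The "longest possible CDS" rule can be sketched as a scan over the three reading frames of one strand. This is a minimal illustration under stated assumptions, not how Artemis or any gene finder works internally: the function name `find_cds_candidates` and the `min_codons` parameter are hypothetical, only the strand given is searched (the reverse strand is ignored), and coordinates are 0-based with an exclusive end.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_cds_candidates(seq, min_codons=2):
    """Return sorted (start, end) pairs for candidate CDSs on the given strand.

    For each stop codon, the most upstream in-frame start codon since the
    previous stop is chosen, which yields the longest possible CDS.
    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    seq = seq.upper()
    candidates = []
    for frame in range(3):
        first_start = None  # most upstream start codon seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if first_start is not None:
                    n_codons = (i + 3 - first_start) // 3
                    if n_codons >= min_codons:
                        candidates.append((first_start, i + 3))
                first_start = None  # begin a new open stretch after the stop
            elif codon in START_CODONS and first_start is None:
                first_start = i  # later in-frame starts are ignored
    return sorted(candidates)


sequence = "CCATGAAAATGCCCTAAGG"
for start, end in find_cds_candidates(sequence):
    print(start, end, sequence[start:end])
```

In the example sequence there are two in-frame ATG codons before the TAA stop. The sketch reports the CDS beginning at the first one, in line with the rule above. A shorter CDS starting at the second ATG may still be the biologically correct one, which is why comparison with similar sequences is needed to confirm the start.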

This content is taken from the Wellcome Connecting Science online course Bacterial Genomes II: Accessing and Analysing Microbial Genome Data Using Artemis.
