RuvA protein molecule bound to DNA (side view)

Finding protein function: Functional annotation

In addition to finding where genes are in the genome, it is possible to find out more about the function of the proteins they encode. Almost universally, sequence similarity tools are used to transfer a function from a protein of known function to an uncharacterised protein, provided the two sequences are similar enough. Using similarity between sequences to infer their function is called homology annotation.
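The idea of transferring a function when two sequences are similar enough can be sketched in a few lines of Python. This is only a toy illustration: the sequences, the function labels, the `transfer_annotation` helper and the 0.8 threshold are all made up for this example, and Python's `difflib` is used as a crude stand-in for a real alignment tool such as BLAST.

```python
from difflib import SequenceMatcher

# Hypothetical toy "database": protein sequence -> known function.
# Both the sequences and the labels are invented for illustration.
KNOWN_PROTEINS = {
    "MKTAYIAKQRQISFVKSHFSRQ": "toy function A",
    "MSDNELLKKAIEGLPGVDACYH": "toy function B",
}


def similarity(a, b):
    """Crude similarity score between 0 and 1 (not a real alignment score)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def transfer_annotation(query, known=KNOWN_PROTEINS, threshold=0.8):
    """Return (function, score) for the best match above the threshold.

    If nothing is similar enough, the query is left as 'hypothetical protein'.
    The threshold of 0.8 is an arbitrary assumption for this sketch.
    """
    best_seq, best_score = None, 0.0
    for seq in known:
        score = similarity(query, seq)
        if score > best_score:
            best_seq, best_score = seq, score
    if best_seq is not None and best_score >= threshold:
        return known[best_seq], best_score
    return "hypothetical protein", best_score


# A query that differs from the first known protein by one residue.
function, score = transfer_annotation("MKTAYIAKQRHISFVKSHFSRQ")
print(function, round(score, 2))
```

Real annotation pipelines use proper alignment statistics rather than a simple ratio, but the decision is the same in spirit: find the best match, then accept its function only if the match is strong enough.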

Initially, CDSs (coding sequences) were marked on the genome semi-automatically and then checked manually to ensure they were ‘true’ genes encoding ‘true’ proteins. Later, as technologies evolved, higher-throughput methods were developed that predicted open reading frames automatically. These predictions were then manually curated using tools such as BLASTx, comparing each newly identified CDS against protein databases such as UniProt or the non-redundant protein database at NCBI.
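To give a feel for what predicting open reading frames involves, here is a minimal Python sketch. It makes several simplifying assumptions: it scans only the forward strand, treats ATG as the only start codon, and uses a hypothetical `find_orfs` helper with an arbitrary minimum length. Real gene finders are considerably more sophisticated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find simple ORFs on the forward strand in all three reading frames.

    Returns a list of (start, end, sequence) tuples, where start is the
    0-based position of the ATG and end is just past the stop codon.
    min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence containing one short ORF (ATG AAA TTT TAG).
for start, end, seq in find_orfs("CCATGAAATTTTAGGG"):
    print(start, end, seq)
```

Each candidate found this way would still need checking, for example by comparing it against a protein database, before it could be called a CDS.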

More modern annotation approaches include online tools such as RAST (Rapid Annotations using Subsystems Technology). As its name suggests, RAST uses a ‘subsystem’ or pathway approach to annotate bacterial or archaeal genomes. In short, it assigns genes to functional roles and groups related roles into subsystems, such as metabolic pathways or transport systems, with the aim of producing a more accurate annotation.
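The grouping of functional roles into subsystems can be pictured with a small Python sketch. This is not RAST's actual algorithm or data: the role names, the subsystem names, the gene identifiers and the `group_by_subsystem` helper are all hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical lookup table: functional role -> subsystem.
ROLE_TO_SUBSYSTEM = {
    "glucokinase": "Glycolysis",
    "pyruvate kinase": "Glycolysis",
    "ABC transporter permease": "Transport systems",
}


def group_by_subsystem(gene_roles, role_to_subsystem=ROLE_TO_SUBSYSTEM):
    """Group genes by the subsystem their functional role belongs to.

    gene_roles maps a gene identifier to its assigned functional role.
    Roles missing from the lookup table are placed under 'Unassigned'.
    """
    groups = defaultdict(list)
    for gene, role in gene_roles.items():
        subsystem = role_to_subsystem.get(role, "Unassigned")
        groups[subsystem].append(gene)
    return dict(groups)


# Toy genome with four annotated genes (identifiers are made up).
genes = {
    "gene_001": "glucokinase",
    "gene_002": "ABC transporter permease",
    "gene_003": "pyruvate kinase",
    "gene_004": "protein of unknown function",
}
print(group_by_subsystem(genes))
```

Seeing genes in the context of a whole pathway makes gaps and inconsistencies easier to spot than looking at each gene in isolation.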

In this course, we will use ready-made annotation files from related bacterial pathogens, and we will focus on comparing them to gain more insight into the biology of bacterial genomes.

In 2008, Aziz et al. published “The RAST Server: Rapid Annotations using Subsystems Technology” (https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-9-75), a paper describing RAST. You can follow the link to read more about RAST and its capabilities.

This article is from the free online course:

Bacterial Genomes: Accessing and Analysing Microbial Genome Data

Wellcome Genome Campus Advanced Courses and Scientific Conferences