Dr. Rob Finn guiding a learner

Automated annotation systems

In this article, you will learn how protein annotation is carried out on a large scale.

Manually annotating protein sequences is very laborious. Given the vast number of protein sequences in databases, it would be almost impossible to annotate all of them by hand.

Instead, computer scientists have designed software pipelines that combine sequence similarity search tools with protein family, domain and feature prediction tools (such as Pfam, which identifies protein families and domains, and Phobius, which predicts transmembrane helices and signal peptides) to automatically predict the putative functions of protein sequences. The results of the individual searches are then combined into a single, simplified result and a short phrase describing the protein.
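The combining step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular pipeline works: the `Hit` record, the feature labels (such as `"transmembrane"`) and the rule used to build the description phrase are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    tool: str     # name of the prediction tool, e.g. "Pfam" or "Phobius"
    feature: str  # what the tool predicted (label format is hypothetical)
    start: int    # first residue of the feature
    end: int      # last residue of the feature


def summarise(hits: list[Hit]) -> str:
    """Merge hits from several tools into one description phrase.

    Hypothetical rule: Pfam domain names are listed in sequence order
    (duplicates removed); Phobius transmembrane predictions are counted
    and appended as a qualifier.
    """
    domains: list[str] = []
    for hit in sorted(hits, key=lambda h: h.start):
        if hit.tool == "Pfam" and hit.feature not in domains:
            domains.append(hit.feature)

    n_tm = sum(1 for h in hits if h.tool == "Phobius" and h.feature == "transmembrane")

    # Fall back to a generic name when no domain was recognised.
    phrase = " and ".join(domains) + "-containing protein" if domains else "Uncharacterized protein"
    if n_tm:
        noun = "helix" if n_tm == 1 else "helices"
        phrase += f", {n_tm} predicted transmembrane {noun}"
    return phrase


hits = [
    Hit("Pfam", "Protein kinase domain", 50, 300),
    Hit("Phobius", "transmembrane", 10, 30),
]
print(summarise(hits))  # Protein kinase domain-containing protein, 1 predicted transmembrane helix
```

Real pipelines weigh evidence from many more sources, but the shape is the same: several independent predictions go in, one short human-readable summary comes out.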

One popular tool for automatic annotation is InterProScan, the software that searches protein sequences against the InterPro database.
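InterProScan can write its results as a tab-separated (TSV) file with one line per match. The sketch below groups those matches into a list of InterPro entries per protein. It assumes the commonly documented column layout (protein accession in the first column, InterPro accession and description in the 12th and 13th); check the column positions against the documentation for the InterProScan version you run.

```python
from __future__ import annotations

from collections import defaultdict

# Zero-based column indices, assuming InterProScan's TSV output layout.
PROTEIN_COL = 0
INTERPRO_ACC_COL = 11
INTERPRO_DESC_COL = 12


def interpro_entries(tsv_text: str) -> dict[str, list[tuple[str, str]]]:
    """Group InterPro entries by protein from InterProScan TSV text."""
    entries: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for line in tsv_text.splitlines():
        row = line.split("\t")
        if len(row) <= INTERPRO_DESC_COL:
            continue  # line lacks the InterPro columns
        protein = row[PROTEIN_COL]
        entry = (row[INTERPRO_ACC_COL], row[INTERPRO_DESC_COL])
        if entry[0] in ("", "-"):
            continue  # member-database match not integrated into InterPro
        # Several member databases can hit the same InterPro entry; keep one copy.
        if entry not in entries[protein]:
            entries[protein].append(entry)
    return dict(entries)
```

Because different member databases often recognise the same family, deduplicating by InterPro entry is one simple way to turn many raw matches into a compact summary.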

Automatic annotation pipelines have the advantage of being fast and systematic. However, their accuracy depends on the quality of the reference databases used for the comparisons: errors in those databases can be propagated to newly annotated sequences. Hence, results from automatic annotation pipelines should be treated with caution. Most databases have a system for indicating how, and how reliably, each entry was annotated. For example, UniProt uses a blue logo to mark entries with automatic annotation (UniProtKB/TrEMBL) and a gold logo to highlight entries that have been manually annotated (UniProtKB/Swiss-Prot).
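When handling many UniProt entries at once, the same distinction can be made programmatically. A minimal sketch, assuming records shaped like the JSON returned by the UniProt REST API, with `primaryAccession` and `entryType` fields; the field names and the exact `entryType` value are assumptions to check against the current API documentation, and the accessions below are placeholders.

```python
from __future__ import annotations

# Assumed value of the "entryType" field for manually reviewed entries.
REVIEWED = "UniProtKB reviewed (Swiss-Prot)"


def split_by_review_status(entries: list[dict]) -> tuple[list[str], list[str]]:
    """Return (reviewed, unreviewed) accession lists.

    Anything not explicitly marked as reviewed is treated as automatic
    annotation, the cautious default.
    """
    reviewed: list[str] = []
    unreviewed: list[str] = []
    for entry in entries:
        bucket = reviewed if entry.get("entryType") == REVIEWED else unreviewed
        bucket.append(entry["primaryAccession"])
    return reviewed, unreviewed


records = [
    {"primaryAccession": "ACC1", "entryType": "UniProtKB reviewed (Swiss-Prot)"},
    {"primaryAccession": "ACC2", "entryType": "UniProtKB unreviewed (TrEMBL)"},
]
print(split_by_review_status(records))  # (['ACC1'], ['ACC2'])
```

Defaulting unknown or missing values to "unreviewed" keeps the check conservative: an entry is only trusted as manually curated when it is explicitly labelled as such.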

