Similar is not the same but it helps!

Similar is not the same but it helps! (A praise for homology annotation)

A variety of cups and mugs with one larger blue jug

In this article you will be introduced to the concept of homology annotation, a way to infer the function of a sequence from its similarity to other sequences.

Imagine that you have never seen a coffee mug before. However, you have drunk water from a tumbler glass, from a paper cup and even from your cupped hands. When you see this “unknown item” (the coffee mug), its form is unfamiliar, with a handle sticking out of one side. Yet from its shape you can infer that it could be used for the same purpose: holding or drinking a liquid. Very intuitive, right?

Assigning the same function to things that look similar is innate to human nature. The same process can be used for protein sequences.

Now imagine that we have a large collection of protein sequences (e.g. 1000), and that we know the function of only 20 of them. We can use the same approach as with the coffee mug to infer the functions of the rest: sequences that look similar have a good chance of doing the same thing.

This content is taken from the Wellcome Connecting Science online course Bacterial Genomes I: From DNA to Protein Function Using Bioinformatics.

Using similarity to infer function in this way is called homology annotation. The principle that makes it possible is that a given sequence, or slight variations of it, is conserved throughout evolution. In general terms, the more similar two sequences are, the more likely they are to be related. Consequently, homology annotation is based on the comparison of DNA or proteins at the sequence level, that is, on comparing the nucleotide or amino acid sequences of potentially related genes or proteins.
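To make "comparison at the sequence level" concrete, here is a minimal sketch in Python that computes the percent identity of two sequences that have already been aligned. The function name and the toy sequences are invented for illustration, and the definition used here (identical residues divided by the number of gap-free columns) is only one of several conventions in use. It is not the method of any particular tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free alignment columns where both residues are identical.

    Assumes the two sequences are already aligned (same length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # gap columns are ignored in this convention
        compared += 1
        if res_a == res_b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Toy amino acid sequences (made up): one substitution in ten positions.
print(percent_identity("MKTAYIAKQR", "MKTAHIAKQR"))  # 90.0
```

In practice the alignment itself is produced by a dedicated program; this sketch only shows how a similarity score can be read off once the residues have been lined up.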

Protein sequences that confer function are often found in blocks of conservation called protein domains. These regions have a defined three-dimensional structure or motif (shape) that can function and evolve independently from the rest of the protein sequence. Domains are found in proteins throughout nature, and any given protein can contain more than one. Using motif similarity to infer function relies on the principle that when two proteins share a conserved function, their overall similarity at the amino acid level may be lost, but their protein domains must remain conserved.
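The idea that a short conserved block can survive while the rest of the sequence diverges can be sketched with a simple pattern search. The example below is an illustration only: the pattern is loosely modelled on the P-loop (Walker A) consensus, the three sequences are made up, and the function name is hypothetical. Real domain databases rely on more sensitive statistical models than a single regular expression.

```python
import re

# Assumed toy pattern: A or G, any four residues, then G, K, and S or T.
MOTIF = re.compile(r"[AG].{4}GK[ST]")


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (start position, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(sequence.upper())]


# Made-up sequences: the first two differ almost everywhere except the motif.
proteins = {
    "protein_1": "MSTLAGPSGSGKSTLLR",
    "protein_2": "MQWERAVDNTGKTVIP",
    "protein_3": "MKLVINDEPQRW",
}

for name, seq in proteins.items():
    print(name, find_motif(seq))
# protein_1 [(5, 'GPSGSGKS')]
# protein_2 [(5, 'AVDNTGKT')]
# protein_3 []
```

Here protein_1 and protein_2 share very few residues overall, yet both carry the motif, which is the kind of evidence that domain-based annotation builds on.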

However, there are exceptions to every rule: two sequences or motifs that are similar to each other can have different roles in different organisms, or even in different compartments of the same cell. It is therefore important to remember that a function inferred from similarity is only a prediction. This is why protein names and functional descriptions are often accompanied by the words “putative” or “potential”. To be certain of the function of a protein, it must be confirmed by experiment.
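The whole workflow, from a small set of sequences with known function to hedged labels for unknown ones, might be sketched as follows. Everything specific here is an assumption made for illustration: the function name, the toy sequences and functions, the 0.8 cutoff, the fallback label "hypothetical protein", and the use of Python's difflib.SequenceMatcher as a crude stand-in for a proper alignment-based similarity score.

```python
from difflib import SequenceMatcher


def transfer_annotation(query: str, known: dict[str, str],
                        min_similarity: float = 0.8) -> str:
    """Label a query with the function of its most similar known sequence.

    `known` maps sequence -> function. The label is prefixed with "putative"
    because it is inferred from similarity, not confirmed by experiment.
    """
    best_function, best_score = None, 0.0
    for seq, function in known.items():
        # ratio() is in [0, 1]; a rough similarity proxy, not a real alignment.
        score = SequenceMatcher(None, query, seq, autojunk=False).ratio()
        if score > best_score:
            best_function, best_score = function, score

    if best_function is None or best_score < min_similarity:
        return "hypothetical protein"  # assumed fallback label
    return f"putative {best_function}"


# Made-up reference set of sequences with "known" functions.
known = {
    "MKTAYIAKQRQISFVK": "kinase",
    "MSDNELLKRWWAGGTP": "transporter",
}

print(transfer_annotation("MKTAYIAKQRQISFVR", known))  # putative kinase
print(transfer_annotation("PPPPPPPPPPPPPPPP", known))  # hypothetical protein
```

The cutoff is the important design choice: set it too low and unrelated sequences inherit a function they do not have, set it too high and genuinely related sequences are left unannotated.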

For a more in-depth view of how secondary databases can aid the annotation of full genomes, we recommend the review article “Protein function annotation by homology-based inference” by Loewenstein et al.

In the next sections you will learn about tools and databases that use similarity between sequences and motifs to help researchers assign function to proteins.
