Protein functional data

An article describing the main tools applied to infer protein function

Understanding the functional role of an amino acid within a protein and the necessary physicochemical characteristics for that function can help to interpret the probability that a variant will alter function. It can also suggest the molecular mechanism by which the change occurs. The functional roles of amino acids can roughly be divided into five, potentially overlapping, categories:

Structural stability: amino acids can stabilise protein folding via several mechanisms which are dependent upon their positions and chemical environment.

Enzyme active sites: active sites are the regions of enzymes which bind a substrate before it undergoes a chemical reaction. Specific amino acid side chain chemistry is required to facilitate this reaction.

Ligand/nucleic acid binding: ligands often bind proteins in order to mediate a signal via a conformational change. The binding site can be very specific, requiring a particular set of amino acid side chains. Specific binding of DNA or RNA also requires particular amino acid side chains.

Protein-protein interactions: these interactions are critical for many biological processes and are often highly specific, which constrains the amino acids at the interface.

Post-translational modifications (PTMs): chemical modifications which are essential for regulation and interactions. Each type of modification is specific to a single amino acid or a small group of amino acids.

Screenshot of the UniProt feature viewer (https://www.uniprot.org/uniprotkb/Q9BYF1/feature-viewer) for the ACE2 protein. It shows an amino acid sequence and the corresponding position numbers on the protein (343-399). Below the sequence, parallel coloured tracks are aligned to it. Each track represents a different type of protein feature: Molecular processing, Sequence information, Topology, Domains, Sites, PTM, Antigenic sequences, Mutagenesis, Variants, Proteomics, PDBe 3D structure coverage, AlphaFold and Structural features.

Figure 1. UniProt feature viewer image of human Angiotensin-converting enzyme 2 (ACE2) protein positions 343-399. Each line represents different protein features that aid understanding of the functional role of amino acids within the protein: Molecular processing, Sequence information, Topology, Domains, Sites, PTMs, Antigenic sequences, Mutagenesis, Variants, Proteomics, PDBe 3D structure coverage, AlphaFold and Structural features.
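The feature tracks in Figure 1 amount to a lookup from a residue position to the annotations that cover it. The sketch below illustrates that lookup under stated assumptions: the records are hand-written toy values (not real ACE2 annotations), their nested layout is assumed to resemble the "features" list of a UniProtKB JSON entry, and `features_at` is a hypothetical helper name.

```python
# Toy records, assumed to mirror the shape of the "features" list in a
# UniProtKB JSON entry. Positions and descriptions are invented.
EXAMPLE_FEATURES = [
    {"type": "Domain", "description": "Example domain",
     "location": {"start": {"value": 10}, "end": {"value": 50}}},
    {"type": "Binding site", "description": "Example ligand",
     "location": {"start": {"value": 30}, "end": {"value": 30}}},
    {"type": "Glycosylation", "description": "Example PTM",
     "location": {"start": {"value": 75}, "end": {"value": 75}}},
]


def features_at(features, position):
    """Return the features whose [start, end] range covers `position`."""
    hits = []
    for feature in features:
        start = feature["location"]["start"]["value"]
        end = feature["location"]["end"]["value"]
        # Skip features with an unknown boundary.
        if start is None or end is None:
            continue
        if start <= position <= end:
            hits.append(feature)
    return hits


if __name__ == "__main__":
    for hit in features_at(EXAMPLE_FEATURES, 30):
        print(hit["type"], "-", hit["description"])
```

A variant at a position covered by several feature types (here a domain and a binding site) has more than one candidate functional role, which matches the overlapping categories listed above.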

For some functions, the molecular consequence can be interpreted very directly. For example, at a phosphorylation site, a change from serine, threonine or tyrosine to any other amino acid would generally abolish phosphorylation at that position. Phosphorylation is a reversible post-translational modification in which a phosphate group is added to an amino acid residue. This alters the structural conformation of the protein and can, for example, activate or deactivate it. For other functions, such as protein-protein interactions, it is less clear whether a single amino acid change would significantly alter specificity. Both the direct consequence of an amino acid change on a function and the importance of that function to the protein overall are often reflected in inter-species conservation. Well-defined functional regions are often conserved across diverse human populations, and so the molecular interpretation of variation within them is often similar.
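The phosphorylation example can be written as a small rule. This is a minimal sketch, not a validated predictor: `phosphosite_consequence` and its return strings are hypothetical, the caller is assumed to know from annotation whether the position is a phosphosite, and, as an assumption that goes beyond the text, a swap between two acceptor residues is reported as uncertain rather than as a loss.

```python
# One-letter codes of the residues named in the text as phosphate acceptors.
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}


def phosphosite_consequence(ref, alt, is_known_phosphosite):
    """Classify a substitution at a position that may be a phosphosite.

    ref, alt: one-letter amino acid codes before and after the change.
    is_known_phosphosite: True if annotation marks this position as
    phosphorylated (supplied by the caller, not inferred here).
    """
    ref, alt = ref.upper(), alt.upper()
    if not is_known_phosphosite or ref not in PHOSPHO_ACCEPTORS:
        return "not applicable"
    if alt == ref:
        return "unchanged"
    if alt in PHOSPHO_ACCEPTORS:
        # Assumption: another acceptor residue might still be modified.
        return "acceptor retained; effect uncertain"
    return "phosphorylation lost"


if __name__ == "__main__":
    print(phosphosite_consequence("S", "A", True))
    print(phosphosite_consequence("S", "T", True))
```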

In addition to inference based on conservation and population frequencies, the probability that a variant will affect function can be derived experimentally via mutagenesis experiments. These can be very targeted experiments on small numbers of amino acids implicated in disease, the results of which are extracted from the literature by resources such as UniProt. Alternatively, there are much larger experiments such as multiplexed assays of variant effect (MAVEs), which probe every possible change in a protein sequence (Figure 2) and are available via MaveDB. The readout from a MAVE may be general, such as cell viability, or function-specific, such as binding affinity. It should be noted that scores are assay-specific and can only be interpreted by referring to the relevant manuscript.

Screenshot of MaveDB output for the ACE2 binding site. It shows rows of coloured squares in shades from dark to light red and dark to light blue. Each square represents one protein variant and its colour represents the score of that variant. Dark red represents highly positive scores, while dark blue represents highly negative scores.

Figure 2. MAVE of ACE2 binding to SARS-CoV-2 spike protein, MaveDB urn:mavedb:00000069-a-1, positions 343-399, the same region as Figure 1. Each square represents one protein variant and its colour represents the score of that variant. In this assay, dark red represents highly positive scores, which indicate enrichment, while dark blue represents highly negative scores. Enrichment scores are a proxy for relative binding to the SARS-CoV-2 spike protein.
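A heatmap like Figure 2 is a grid of scores indexed by position and substituted amino acid. The sketch below builds such a grid from a scores table. It assumes a CSV with `hgvs_pro` and `score` columns, in the style of a MaveDB score file; the rows shown are invented, `score_grid` is a hypothetical helper, and the numbers carry no biological meaning.

```python
import csv
import io
import re

# Invented rows; assumed column names are hgvs_pro and score.
EXAMPLE_CSV = """hgvs_pro,score
p.Ala10Gly,-1.5
p.Ala10Val,0.2
p.Lys11Glu,-2.0
p.Lys11Arg,1.1
p.Lys11=,0.0
"""

# Single substitutions in three-letter HGVS form, e.g. p.Ala10Gly.
# Note that a stop-gain written as p.Ala10Ter would also match.
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def score_grid(csv_text):
    """Return {position: {substituted residue: score}} for substitutions."""
    grid = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = SUBSTITUTION.match(row["hgvs_pro"])
        # Skip synonymous or unparsed variants and missing scores.
        if match is None or row["score"] in ("", "NA"):
            continue
        position = int(match.group(2))
        alt = match.group(3)
        grid.setdefault(position, {})[alt] = float(row["score"])
    return grid


if __name__ == "__main__":
    for position, scores in sorted(score_grid(EXAMPLE_CSV).items()):
        print(position, scores)
```

The sign and scale of each score mean whatever the assay defines them to mean, so a grid like this is only interpretable alongside the relevant manuscript.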

Experiments such as MAVEs, usually performed by researchers, can be used to identify mutations associated with disease. However, the result may be experiment- or cell type-specific and may not translate directly to an effect in the whole organism, or indeed in every individual in a diverse population. The most reliable approach for interpreting variants with experimental evidence is to combine the data with functional annotations, evolutionary inferences, and population and clinical data, which can be done with tools such as ProtVar from EMBL-EBI.

In the ACMG/AMP classification framework, this data is incorporated under functional data using the criterion BS3 (well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing) or PS3 (well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product). When applying these criteria, it is important to consider whether the assay relates to the mechanism of disease. For example, if the disease being considered is caused by loss of protein function, then the assay must assess loss of protein function. When assessing publications on the functional consequences of variants, ask whether the experiments are well designed to address the key questions:

  • Is the cell line relevant to the function of the protein?
  • Is the correct tissue sampled and studied?
  • Is the animal model suitable for the gene and the phenotype in question?
  • Were there sufficient controls for the methods?
  • Were rescue experiments conducted?
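The relationship between the assay checks and the two criteria can be sketched as a simple gate. This is a deliberate simplification for illustration only: `AssayEvidence`, its fields and `functional_criterion` are hypothetical names, the checks are collapsed into three yes/no answers, and a real classification relies on expert review of the study rather than on a boolean rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssayEvidence:
    """Yes/no summary of a functional study (hypothetical structure)."""
    well_established: bool           # accepted, reproducible assay
    matches_disease_mechanism: bool  # e.g. loss-of-function assay for a loss-of-function disease
    adequate_controls: bool          # sufficient controls for the methods
    shows_damaging_effect: bool      # assay result for the variant


def functional_criterion(evidence: AssayEvidence) -> Optional[str]:
    """Suggest PS3, BS3 or no functional criterion for one study."""
    usable = (
        evidence.well_established
        and evidence.matches_disease_mechanism
        and evidence.adequate_controls
    )
    if not usable:
        # The study does not support applying either criterion.
        return None
    return "PS3" if evidence.shows_damaging_effect else "BS3"


if __name__ == "__main__":
    study = AssayEvidence(True, True, True, shows_damaging_effect=True)
    print(functional_criterion(study))
```

The point of the gate is that the assay result is considered last: a damaging result from an assay that does not match the disease mechanism yields no criterion at all.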

Sometimes, a functional assay may be a simple assessment of enzyme activity (often available in the clinic) as in metabolic disorders. The ClinGen Sequence Variant Interpretation Working Group has published recommendations for the application of the functional evidence criteria.

© Wellcome Connecting Science
This article is from the free online course Interpreting Genomic Variation: Overcoming Challenges in Diverse Populations.
