
Protein domains and hotspots

This article introduces the basic concepts of protein domains and hotspots.

Changing an amino acid side chain in a protein changes its shape and chemistry, which may have functional consequences. Each amino acid has an individual role that depends on its position in the protein. The effect of a variant therefore depends on two things: the physicochemical difference between the reference and the new side chain, and how relevant that difference is to the residue's role.
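
As a rough illustration of "physicochemical difference", the sketch below compares a reference and an alternate residue using two simple properties: the Kyte-Doolittle hydropathy index and approximate side-chain charge. This choice of properties is an assumption made for illustration; the article does not prescribe a measure, and real predictors combine many more features. The helper name side_chain_change is hypothetical.

```python
# Kyte-Doolittle hydropathy index: positive = hydrophobic, negative = hydrophilic
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate side-chain charge at physiological pH (histidine treated as neutral)
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def side_chain_change(ref: str, alt: str) -> dict:
    """Hypothetical helper: crude physicochemical difference for a missense change ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    return {
        "hydropathy_delta": round(KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref], 2),
        "charge_delta": CHARGE.get(alt, 0) - CHARGE.get(ref, 0),
    }


print(side_chain_change("L", "P"))  # {'hydropathy_delta': -5.4, 'charge_delta': 0}
print(side_chain_change("E", "K"))  # {'hydropathy_delta': -0.4, 'charge_delta': 2}
```

These numbers say nothing about position on their own: the same hydropathy or charge change may be tolerated at a surface loop and damaging at a buried or catalytic residue, which is why the residue's role has to be considered alongside the size of the change.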

Generally, amino acids work in combination with others in their local environment to carry out a specific function. Such units of protein structure can often fold and function independently of the rest of the protein and are self-stabilising (Figure 1). These units are termed protein domains. There are multiple methods for defining protein domains, but all are based to some degree on sequence or structure conservation. The commonly used domain definitions, such as Pfam and CATH, are all available via the EMBL-EBI InterPro website.
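
Once domain boundaries have been taken from a resource such as InterPro, checking whether a variant falls in a domain is an interval lookup. The sketch below assumes 1-based, inclusive residue coordinates and returns every matching domain, because different domain definitions can overlap. The domain names and boundaries are hypothetical placeholders, not real annotations.

```python
from typing import List, NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive


def domains_at(position: int, domains: List[Domain]) -> List[str]:
    """Return every annotated domain covering a residue (definitions can overlap)."""
    return [d.name for d in domains if d.start <= position <= d.end]


# HYPOTHETICAL boundaries for illustration; real ones would come from InterPro (Pfam, CATH, ...)
domains = [
    Domain("DomainA", 60, 120),
    Domain("DomainB", 125, 220),
    Domain("DomainC", 245, 500),
]

print(domains_at(150, domains))  # ['DomainB']
print(domains_at(230, domains))  # [] -> inter-domain region
```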

Representation of the 3D structure of the tyrosine kinase protein Lyn. The three protein domains (red, orange and blue) are composed of coiled alpha-helices and pleated beta-sheets connected by curved lines (loops).

Figure 1. Tyrosine-protein kinase Lyn AlphaFold structure AF-P07948-F1 coloured by InterPro domain: red, SH3 domain; blue, SH2 domain; orange, protein kinase domain. Rendered and coloured in PyMOL.

Because domains have specific and highly evolved functional roles, the same domain can be found in many different proteins, and domains form one of the basic units of structural evolution. The order and position of domains in a protein determine how their individual contributions affect overall protein function. The conservation of domain structure and function between proteins can help predict the roles and structures of unannotated proteins, even across species. Transferring knowledge between instances of the same domain can also help interpret the mechanism by which a variant may affect a protein, for example by disrupting wild-type domain function.

Domains often represent the most functionally important elements of a protein and are less tolerant of change than inter-domain regions. This high conservation means that variation in domain regions is more likely to disrupt the protein and to be pathogenic to the organism. It also means that domain sequences vary less between diverse populations than inter-domain regions of proteins or intergenic regions, so interpretations of variant effects transfer more readily between populations.
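
Conservation of this kind is commonly quantified per position from a multiple sequence alignment of homologous proteins. One widely used measure, assumed here rather than taken from the article, is the Shannon entropy of each alignment column: 0 bits means every sequence has the same residue, and higher values mean more variability. The toy alignment below is hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters ('-')."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return float("nan")
    n = len(residues)
    probs = [count / n for count in Counter(residues).values()]
    return sum(-p * math.log2(p) for p in probs)


def per_position_entropy(alignment: List[str]) -> List[float]:
    """Entropy for each column of an alignment whose rows all have the same length."""
    length = len(alignment[0])
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# HYPOTHETICAL toy alignment of four homologous segments
alignment = ["MKVLA", "MKILG", "MKVLS", "MRVLT"]
print([round(h, 2) for h in per_position_entropy(alignment)])
# [0.0, 0.81, 0.81, 0.0, 2.0]
```

Low-entropy stretches that coincide with annotated domains are the pattern the paragraph describes; in practice the alignment would come from an alignment tool, and sequence weighting or gap handling may change the values.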

Hotspots are regions that are abnormally enriched in mutations. Where a region is functionally intolerant of change, these hotspots can mark locations of disease association. Hotspots can be clustered in sequence, as in genomic interpretation, or spatially clustered in the folded protein, where the residues involved may not be close in sequence space. Variants of uncertain significance located close to characterised pathogenic variants are more likely to be associated with disease. Proximity can also suggest a common molecular consequence or organismal phenotype. In proteins, hotspots often occur in functional domains, so a hotspot can indicate a functional domain even if that domain has not yet been characterised. Because hotspots in proteins are more likely to lie in functional domains, they are also less variable between diverse populations than more isolated variants.
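
The two kinds of clustering described above can be sketched with simple counts. The first function scans sliding windows along the sequence; the second asks whether a variant of uncertain significance lies within a distance cut-off of any known pathogenic residue in 3D. The window size of 10 residues, the 8 angstrom radius, the positions and the coordinates are all hypothetical choices for illustration; coordinates would normally be taken from a structure or model, for example C-alpha atoms of an AlphaFold prediction.

```python
import math
from typing import List, Sequence, Tuple


def window_counts(positions: Sequence[int], protein_length: int, window: int = 10) -> List[Tuple[int, int]]:
    """(window_start, variant_count) for every run of `window` consecutive residues (1-based)."""
    counts = []
    for start in range(1, protein_length - window + 2):
        end = start + window - 1
        counts.append((start, sum(start <= p <= end for p in positions)))
    return counts


def near_pathogenic(vus_xyz, pathogenic_xyz, radius: float = 8.0) -> bool:
    """True if any pathogenic residue lies within `radius` angstroms of the VUS residue in 3D."""
    return any(math.dist(vus_xyz, p) <= radius for p in pathogenic_xyz)


# HYPOTHETICAL pathogenic variant positions in a 100-residue protein
pathogenic_positions = [12, 14, 15, 17, 80]
best = max(window_counts(pathogenic_positions, 100), key=lambda wc: wc[1])
print(best)  # (8, 4): residues 8-17 hold four variants

# HYPOTHETICAL residue coordinates in angstroms
vus = (0.0, 0.0, 0.0)
pathogenic = [(5.0, 3.0, 4.0), (20.0, 0.0, 0.0)]
print(near_pathogenic(vus, pathogenic))  # True: the first residue is about 7.1 angstroms away
```

Raw counts like these do not say whether a cluster is unusual; that requires comparing against an expectation, which is what quantitative hotspot methods add.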

In the ACMG/AMP classification framework, hotspot evidence is incorporated under functional data through criterion PM1: located in a mutational hot spot and/or critical and well-established functional domain (e.g. active site of an enzyme) without benign variation. MutScore and the DECIPHER protein browser (e.g. for MYH7) are useful tools for assessing the clustering of mutations in genes. This criterion should be applied with care, and quantitative approaches to defining hotspot regions should ideally be used.
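
One generic quantitative approach, sketched here as an assumption and not as the method used by MutScore, DECIPHER or the ACMG/AMP guidelines, is a one-sided binomial test: under a null model in which pathogenic variants fall uniformly along the protein, how surprising is the number observed in a pre-specified region? The sketch also counts benign variants in the region, reflecting the "without benign variation" part of PM1. All positions are hypothetical.

```python
from math import comb
from typing import Dict, Sequence


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_enrichment(
    pathogenic: Sequence[int],
    benign: Sequence[int],
    start: int,
    end: int,
    protein_length: int,
) -> Dict[str, float]:
    """Test whether a pre-specified region holds more pathogenic variants than its length predicts."""
    p_region = (end - start + 1) / protein_length  # null: variants fall uniformly along the protein
    n = len(pathogenic)
    k = sum(start <= pos <= end for pos in pathogenic)
    return {
        "pathogenic_in_region": k,
        "expected": n * p_region,
        "p_value": binomial_upper_tail(k, n, p_region),
        "benign_in_region": sum(start <= pos <= end for pos in benign),
    }


# HYPOTHETICAL data: 20 pathogenic variants in a 500-residue protein, 12 of them in residues 101-150
pathogenic = list(range(101, 113)) + [10, 30, 200, 250, 300, 350, 400, 450]
benign = [20, 300, 480]
print(region_enrichment(pathogenic, benign, 101, 150, 500))
```

The uniform null is a simplification: mutation rates and ascertainment vary along a gene, and if many candidate windows are scanned rather than one region fixed in advance, the p-values need a multiple-testing correction.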

© Wellcome Connecting Science
This article is from the free online course Interpreting Genomic Variation: Overcoming Challenges in Diverse Populations.

Created by
FutureLearn - Learning For Life
