Genetic variations
What is a variant and what are the types of genetic variations?
Genetic variants are differences in the DNA sequences of individuals within a population. They can be germline or somatic. Germline variants occur in all cells including germ cells and can be inherited from one’s parents or arise spontaneously (de novo). Somatic variants arise during an individual’s lifetime in tissues other than the germ cells and are not passed to subsequent generations. In this course, we will focus on germline variations.
Genetic variations encompass changes at the nucleotide level (affecting one or more bases), differences in the number of copies of specific DNA segments, and gains or losses of part or all of a chromosome. Collectively, these variations contribute to the diversity of genotypes and phenotypes seen among individuals. They influence an individual’s physical and behavioural characteristics and health status.
Genetic variations can be described at the level of the DNA sequence and by their consequences for the mRNA and protein. The nomenclature used at these levels is therefore interconnected. The figure below provides a list of the different types of variants and how they are categorised. The variants are then described in the text following the figure.
Variations described at the genomic level
Figure 1. Types of DNA sequence variations.
Small indels
Short for “insertion-deletions”, these involve the addition or removal (or both) of a few nucleotides (typically more than one and fewer than 50 bases) within a DNA or RNA sequence.
Single Nucleotide Variations (SNV)
Single nucleotide variations involve changes at a single nucleotide within the DNA sequence, encompassing substitutions, insertions, or deletions.
- Single nucleotide insertion: Addition of a single nucleotide into a DNA or RNA sequence.
- Single nucleotide deletion: Removal of a single nucleotide from a DNA or RNA sequence.
- Substitution: Replacement of one nucleotide with another in a DNA or RNA sequence. These are the most common variations observed.
Aneuploidy
Refers to an abnormality in the number of chromosomes (loss or gain of an entire chromosome).
Large structural variants
Structural variants are genetic alterations that cause changes in the structure, organisation, or arrangement of larger DNA segments.
- Inversions: A genetic rearrangement where a segment of DNA is reversed in orientation within the chromosome with two breakpoints.
- Translocations: Rearrangement of genetic material between non-homologous chromosomes.
- Repeat expansions: An increase in the number of copies of a short repeated DNA sequence within a gene (short tandem repeats usually comprise motifs of 1-6 base pairs), which can lead to certain genetic disorders.
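As a rough illustration of what a repeat expansion looks like in sequence data, the sketch below counts the longest uninterrupted run of a motif. The function name `longest_repeat_run` and both toy sequences are invented for this example; they are not taken from any real gene, and the counts carry no clinical meaning.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = f"(?:{re.escape(motif.upper())})+"
    runs = re.findall(pattern, sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Toy sequences invented for illustration (not from a real gene).
typical_allele = "GCC" + "CAG" * 5 + "TTA"
expanded_allele = "GCC" + "CAG" * 12 + "TTA"

print(longest_repeat_run(typical_allele, "CAG"))
print(longest_repeat_run(expanded_allele, "CAG"))
```

Real repeat genotyping has to cope with sequencing errors and interrupted repeats, which this sketch ignores.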
Copy Number Variations (CNV)
Copy number variations are genetic alterations that involve the duplication or deletion of segments of DNA (typically 50 bases or more), resulting in changes in copy numbers of specific regions of the genome.
- Copy number deletion: Loss of a portion of a DNA segment, resulting in fewer copies of a specific genetic sequence.
- Copy number amplification: Increase in the number of copies of a specific genetic sequence within the DNA. This includes duplications, triplications and higher amplifications.
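The size boundaries quoted in this section (a single base, fewer than 50 bases, 50 bases or more) can be turned into a small sketch. The function `classify_by_size` and its labels are hypothetical, and it assumes alleles are written without a shared anchor base, so an empty string means "nothing here". A replacement of several bases by the same number of bases is treated as a combined deletion and insertion.

```python
def classify_by_size(ref: str, alt: str) -> str:
    """Label a reference/alternate allele pair using the size thresholds in the text."""
    if ref == alt:
        return "no change"
    affected = max(len(ref), len(alt))  # longest allele decides the size class
    if affected == 1:
        if ref and alt:
            return "SNV: substitution"
        if not ref:
            return "SNV: single nucleotide insertion"
        return "SNV: single nucleotide deletion"
    if affected < 50:
        return "small indel"
    return "large variant (50 bases or more)"


print(classify_by_size("A", "T"))
print(classify_by_size("", "G"))
print(classify_by_size("ACG", ""))
print(classify_by_size("A" * 60, ""))
```

Distinguishing a copy number variation from other large structural variants needs more information than allele length, so the sketch stops at the size class.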
Variant descriptors based on the effect on the transcript (mRNA)
Figure 2. Types of RNA sequence variations.
As you are aware, the DNA sequences of a gene are ‘transcribed’ to produce mRNA and the codons in the mRNA are ‘translated’ into a polypeptide chain. Different types of genomic variants can affect transcription and translation as described in Figure 2 and the text below.
Start loss variants
Variation that prevents the initiation of translation at the usual start codon. Occasionally this might result in the use of an alternative start codon for translation.
Synonymous/silent variants
Variation that doesn’t alter the encoded amino acid, often occurring in the third position of a codon (the genetic code is redundant, so several codons that differ only at this position encode the same amino acid).
Missense/nonsynonymous variants
Variation that changes one amino acid to another in a protein.
Stop loss variants
Variation that prevents the termination of translation, resulting in a longer protein.
Nonsense/stop gain variants
Variation that converts a codon for an amino acid into a stop codon, leading to premature termination of the protein.
In-frame deletion
Deletion of nucleotides that maintains the reading frame of a gene (the number of nucleotides deleted is three or a multiple of three).
Frameshift deletion
Deletion of nucleotides that alters the reading frame, often causing a nonfunctional protein.
In-frame insertion
Addition of nucleotides that maintains the reading frame of a gene (the number of nucleotides inserted is three or a multiple of three).
Frameshift insertion
Addition of nucleotides that disrupts the reading frame, typically leading to a nonfunctional protein.
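The transcript-level categories above can be sketched in a few lines, assuming the standard genetic code. The function names `classify_codon_change` and `classify_length_change` are hypothetical helpers written for this illustration, and the sketch looks at one codon or one length at a time rather than a whole transcript.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str, is_start: bool = False) -> str:
    """Name the consequence of replacing one codon with another."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if is_start and ref_codon == "ATG" and alt_codon != "ATG":
        return "start loss"
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense (stop gain)"
    if ref_aa == "*":
        return "stop loss"
    return "missense"


def classify_length_change(bases_changed: int) -> str:
    """An insertion or deletion keeps the frame only if its length is a multiple of three."""
    return "in-frame" if bases_changed % 3 == 0 else "frameshift"


print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
print(classify_codon_change("GAG", "GAA"))  # Glu -> Glu
print(classify_codon_change("CGA", "TGA"))  # Arg -> stop
print(classify_length_change(3), classify_length_change(4))
```

Annotation tools also take splicing, multiple transcripts and strand into account, none of which is modelled here.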
Variant descriptors that consider the effect on protein quantity and its function:
- Amorph (null allele): An allele that results in a complete loss of gene function.
- Hypomorph: An allele that leads to reduced gene function.
- Hypermorph: An allele that results in increased gene activity or expression.
- Antimorph (dominant negative allele): An allele whose product interferes with the function of the wild-type allele, producing a dominant negative effect (which can amount to a loss of function of both alleles).
- Neomorph: An allele that results in a novel activity (a gain of function) not found in the wild-type gene.
Variant nomenclature
Given the number of different types of variants, researchers and clinicians must describe them in a way that avoids any confusion. Therefore, a common ‘language’ or standard is used to describe them. These standards for defining variations found in DNA, RNA and protein sequences have been set by the Human Genome Variation Society (HGVS). The standard comprises three elements: the reference sequence (a standardised representation of the human genome used by researchers when comparing with other genomes, which we will return to in the step What is a human reference genome?), followed by a description of the variant (the location in the coding sequence or genome where the change occurs), and the predicted consequence in parentheses.
For example, consider the variant NM_004006.2:c.4375C>T p.(Arg1459*). Here, NM_004006.2 is the reference sequence in GenBank. c.4375C>T describes the variant at the nucleotide level: a change from C (cytosine) to T (thymine) at position 4375 of the coding sequence. p.(Arg1459*) describes the predicted effect at the protein level: the arginine (Arg) at position 1459 is replaced by a stop codon (*). This change is likely to result in a truncated, non-functional protein.
DNA variants are usually reported with respect to a specific gene based on the coding DNA reference sequence, in which case the variant description starts with ‘c’, as in the example c.4375C>T above. However, variants can also be described with reference to a genomic reference sequence, in which case ‘g’ is used. For example, g.32407761G>A indicates that the original nucleotide guanine (G), at position 32407761 in the reference genome, is replaced by adenine (A). Where a variant is described relative to a non-coding RNA reference sequence, the description starts with ‘n’. For example, NR_002196.1:n.601G>T describes a specific nucleotide change at position 601 in a non-coding RNA sequence, where guanine (G) is replaced by thymine (T). This alteration might affect the function of the non-coding RNA, depending on its role and the significance of the altered nucleotide within its sequence.
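To make the parts of a description concrete, here is a minimal sketch that splits a simple substitution into its elements. The regular expression and the name `parse_substitution` are assumptions made for this illustration. They cover only single-base substitutions with a ‘c’, ‘g’ or ‘n’ prefix, not the full HGVS standard.

```python
import re

# Optional reference sequence, coordinate prefix, position, then "ref>alt".
SUBSTITUTION = re.compile(
    r"^(?:(?P<reference>[A-Z]{2}_\d+\.\d+):)?"
    r"(?P<prefix>[cgn])\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

COORDINATE_SYSTEMS = {
    "c": "coding DNA reference sequence",
    "g": "genomic reference sequence",
    "n": "non-coding reference sequence",
}


def parse_substitution(description: str) -> dict:
    """Split a simple substitution description into its named parts."""
    match = SUBSTITUTION.match(description.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    parts["coordinates"] = COORDINATE_SYSTEMS[parts["prefix"]]
    return parts


for text in ("NM_004006.2:c.4375C>T", "g.32407761G>A", "NR_002196.1:n.601G>T"):
    print(parse_substitution(text))
```

A pattern match of this kind only checks the shape of the description. It cannot tell whether the stated reference base is actually present at that position.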
You can see further examples of different types of genetic variations (e.g. CNVs, SNVs, described earlier in this step) in the table available for download below, alongside the standard nomenclature, the predicted change and resulting disease. For example, NM_000518.5:c.20A>T p.(Glu7Val) describes a variant in which adenine (A) is replaced by thymine (T) at nucleotide position 20 of the coding sequence of the transcript NM_000518.5. This nucleotide change results in the substitution of glutamic acid (Glu) with valine (Val) at position 7 of the protein sequence. This is a missense variant, as it changes a single amino acid in the beta-globin protein (encoded by the HBB gene), and it causes sickle cell disease.
You can use the VariantValidator tool to check whether you have described your variant of interest correctly.
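The link between the nucleotide change c.20A>T and the protein change Glu7Val can be worked through in code. In the sketch below, the first 27 bases of the HBB coding sequence are written from memory and should be treated as an assumption to be checked against the NM_000518.5 record. The function `predict_protein_change` and the small codon lookup are hypothetical and cover only the codons in this fragment.

```python
# Assumed start of the HBB coding sequence (9 codons); verify against NM_000518.5.
CDS_START = "ATGGTGCATCTGACTCCTGAGGAGAAG"

# Three-letter amino acid names for the codons that appear in this example only.
CODON_NAMES = {
    "ATG": "Met", "GTG": "Val", "CAT": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}


def predict_protein_change(cds: str, position: int, ref: str, alt: str) -> str:
    """Apply a substitution at a 1-based coding position and describe the codon change."""
    index = position - 1
    if cds[index] != ref:
        raise ValueError(f"expected {ref} at c.{position}, found {cds[index]}")
    codon_number = index // 3 + 1
    start = (codon_number - 1) * 3
    ref_codon = cds[start:start + 3]
    alt_cds = cds[:index] + alt + cds[index + 1:]
    alt_codon = alt_cds[start:start + 3]
    return f"p.({CODON_NAMES[ref_codon]}{codon_number}{CODON_NAMES[alt_codon]})"


print(predict_protein_change(CDS_START, 20, "A", "T"))
```

Position 20 falls in the seventh codon (bases 19 to 21), which is why a change at nucleotide 20 is reported at amino acid 7.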
Interpreting Genomic Variation: Overcoming Challenges in Diverse Populations