
Using HGVS nomenclature for DNA sequence changes

In this video, Dr Maria Jackson explains the basics of the HGVS nomenclature system, a universal language for describing DNA sequence changes.
MARIA JACKSON: Last week, you learned about ISCN, the universal system for describing human karyotypes and alterations identified in them. In this video, we’ll look at how small changes in the DNA are described, together with the predicted effect on the encoded protein. The Human Genome Variation Society has generated a universal system for describing these changes, known as HGVS nomenclature. It’s important that scientists across the world are all using the same system because this makes sharing of information easier and also means that databases of variants can be compiled and searched. The HGVS nomenclature system is described in full on a website. Here, we will look at some of the most commonly used descriptions.
Let’s say we want to describe some variants for the sequence shown here, which represents the start of the coding DNA sequence for a gene together with the amino acids that are coded. Here is a variant which changes one nucleotide, an A to a T. And the amino acid that’s coded has now changed from Gln, or glutamine, to Leu, or leucine. In order to describe this, the first thing we need is a way of numbering the individual nucleotides and amino acids. By convention, nucleotide numbering in the coding sequence begins at the A of the ATG, which corresponds to the start codon AUG in the RNA. Likewise, amino acid numbering begins with the methionine, Met, encoded by that start codon.
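Because each codon is three nucleotides long, the numbering convention above lets you work out which amino acid a coding DNA position belongs to. As a minimal sketch, assuming c.1 is the A of the ATG start codon and considering only positions inside the coding sequence, the hypothetical Python helper below converts a position c.N into the codon (and therefore amino acid) number, plus the position of that nucleotide within its codon:

```python
def codon_for_c_position(c_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon) for a coding DNA position c.N.

    Assumes c.1 is the A of the ATG start codon, so codon 1 (the initiating
    methionine) spans c.1-c.3, codon 2 spans c.4-c.6, and so on.
    """
    if c_pos < 1:
        raise ValueError("only positions from c.1 onwards are handled in this sketch")
    codon_number = (c_pos - 1) // 3 + 1       # also the amino acid (p.) position
    position_in_codon = (c_pos - 1) % 3 + 1   # 1, 2 or 3
    return codon_number, position_in_codon


# The three nucleotide positions used in the examples in this video.
for pos in (5, 9, 10):
    codon, base = codon_for_c_position(pos)
    print(f"c.{pos} -> codon {codon}, base {base} of that codon")
```

So c.5 falls in codon 2 (the glutamine), c.9 in codon 3 and c.10 at the first base of codon 4. Positions before the ATG and positions within introns use further HGVS conventions that this sketch does not attempt to handle.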
To describe the nucleotide change in HGVS, we start with “c.”, which indicates that we are giving a coding DNA sequence description. Next, we give the nucleotide position number. For this substitution, we then write the original nucleotide, A, followed by a greater-than symbol (>) to indicate that the A has changed, and finally the new nucleotide, T. So “c.5A>T” says that in the coding DNA sequence, at position five, A has been changed to T. Now, we look at describing the predicted change to the protein. Here, the nomenclature starts with “p.” to indicate that we are describing the protein. Next comes the three-letter name of the original amino acid, which here is Gln for glutamine.
Then comes the position of that glutamine in the protein, which is position number two, and finally the new amino acid, which here is leucine, Leu, in the three-letter code, giving “p.Gln2Leu”. Here is another change, this time at nucleotide nine, where T has been substituted by A. So the coding DNA nomenclature is “c.9T>A”. In the protein, we see that the codon for cysteine at position three has been altered to a stop codon. This is written as “p.Cys3*”, where the star indicates termination of the protein sequence. Here is a final example. In this case, there is a deletion of the A at position 10 in the nucleotide sequence.
So the coding sequence nomenclature is “c.10del”, where “del” stands for deletion; the deleted nucleotide can also be noted, as in “c.10delA”. The change to the protein is more complex. We can see that the change affects the threonine, Thr, at position four. So the nomenclature starts “p.Thr4”, followed by the new amino acid coded at that position, which is histidine, His.
However, all the downstream codons are now changed, since this deletion has caused a shift in the reading frame. Thus, “fs” is added to the end to denote the frameshift, giving “p.Thr4Hisfs”. Obviously, it is important to know which gene, DNA sequence or protein is being referred to, and that should be noted too. This system is very complex and we’ve only looked at a few simple examples. But it’s very important that this system is in place because many human DNA variants are seen only rarely. Having a common language to describe variants allows scientists from around the world to share information about the cases they have seen, and thus promotes greater understanding of how genome variants affect health.
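The three worked examples in this video (a missense change, a stop change and a single-nucleotide deletion causing a frameshift) can be reproduced with a short sketch. The coding sequence used here is hypothetical: it was constructed only so that positions 5, 9 and 10 match the variants described, and it is not the sequence shown on screen. The function names are illustrative, not part of any HGVS tool, and the sketch covers only single-nucleotide substitutions and single-nucleotide deletions within the coding sequence, using the standard genetic code:

```python
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter amino acid codes; "*" marks termination.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}


def translate(cds: str) -> str:
    """Translate every complete codon, reading from c.1 (the A of ATG)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def describe_substitution(cds: str, pos: int, ref: str, alt: str) -> tuple[str, str]:
    """Return (c., p.) descriptions for a single-nucleotide substitution at c.pos."""
    if cds[pos - 1] != ref:
        raise ValueError(f"reference base at c.{pos} is {cds[pos - 1]}, not {ref}")
    mutant = cds[:pos - 1] + alt + cds[pos:]
    codon = (pos - 1) // 3 + 1  # amino acid position affected
    old_aa = translate(cds)[codon - 1]
    new_aa = translate(mutant)[codon - 1]
    if new_aa == old_aa:
        protein = f"p.{THREE_LETTER[old_aa]}{codon}="  # silent change
    else:
        protein = f"p.{THREE_LETTER[old_aa]}{codon}{THREE_LETTER[new_aa]}"
    return f"c.{pos}{ref}>{alt}", protein


def describe_single_base_deletion(cds: str, pos: int) -> tuple[str, str]:
    """Return (c., p.) descriptions for deleting the one nucleotide at c.pos."""
    mutant = cds[:pos - 1] + cds[pos:]
    old_protein, new_protein = translate(cds), translate(mutant)
    for i, (old_aa, new_aa) in enumerate(zip(old_protein, new_protein)):
        if old_aa != new_aa:
            if new_aa == "*":
                # A stop at the very first changed residue is written as a stop change.
                return f"c.{pos}del", f"p.{THREE_LETTER[old_aa]}{i + 1}*"
            # First changed residue, then the new residue, then "fs" for frameshift.
            return f"c.{pos}del", f"p.{THREE_LETTER[old_aa]}{i + 1}{THREE_LETTER[new_aa]}fs"
    raise ValueError("no amino acid change found in the translated region")


cds = "ATGCAGTGTACATCCGGATAA"  # hypothetical: Met-Gln-Cys-Thr-Ser-Gly-stop
print(describe_substitution(cds, 5, "A", "T"))  # ('c.5A>T', 'p.Gln2Leu')
print(describe_substitution(cds, 9, "T", "A"))  # ('c.9T>A', 'p.Cys3*')
print(describe_single_base_deletion(cds, 10))   # ('c.10del', 'p.Thr4Hisfs')
```

The protein descriptions follow the short forms used in the video. The full HGVS recommendations go further, for example allowing “Ter” in place of “*”, giving the position of the new stop codon for a frameshift, and placing predicted protein changes in parentheses, so any real description should be checked against the HGVS website.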

Just as alterations in chromosome structure and number are described using the International System for Human Cytogenetic Nomenclature (ISCN), changes to the DNA sequence are also described by an internationally agreed system, known as HGVS nomenclature.

If you visit the HGVS nomenclature website you will see that the system is very detailed and complex, but it is essential to have such a common language so that scientists across the world can each understand what the others have reported.

In other parts of this course you will see HGVS nomenclature used to describe variants in DNA sequences – when you see the nomenclature in a video or article you might want to pause and review the meaning to help familiarize yourself with the HGVS system.

This article is from the free online course Understanding Genetic Disorders: How DNA Influences Health.

Created by
FutureLearn - Learning For Life
