What are clades, lineages and variants?

Article introducing the definitions of clades, lineages and variants for SARS-CoV-2
© COG-Train

Pathogen genomics helps us to track the spread of an outbreak and to identify significant changes in the genome of a pathogen. This, in turn, helps us to identify the emergence of new variants or lineages, both nationally and internationally.

Phylogenetic analysis, supported by web-based tools such as Pangolin and Nextstrain, has been extremely important for genomic surveillance during the pandemic. We use it to distinguish one transmission chain from another (e.g. a local nosocomial outbreak from a school outbreak). To do this, however, we first needed categories into which the observed diversity could be classified. For SARS-CoV-2, these categories are clades and lineages.

So, what are clades, lineages and variants?

A clade is a very broad way of grouping SARS-CoV-2 isolates, so it gives us a sense of diversity patterns over long periods, such as years. To be designated as a clade, a group of samples must also reach a minimum size and persist over time.
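To make "minimum size and persistence" more concrete, here is a minimal Python sketch that checks whether a candidate group of samples stays above a frequency threshold for enough consecutive months. The thresholds (20% of samples, two consecutive months), the `MonthlyCount` structure and the `meets_clade_criteria` function are hypothetical placeholders chosen for illustration; check Nextstrain's own documentation for the criteria it actually applies.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not taken from the
# official rules of Nextstrain or any other naming system.
MIN_FREQUENCY = 0.20        # share of all sequenced samples in a month
MIN_PERSISTENCE_MONTHS = 2  # consecutive months at or above MIN_FREQUENCY


@dataclass
class MonthlyCount:
    month: str        # e.g. "2021-01"
    group_count: int  # samples belonging to the candidate group
    total_count: int  # all samples sequenced that month


def meets_clade_criteria(counts: list[MonthlyCount],
                         min_frequency: float = MIN_FREQUENCY,
                         min_months: int = MIN_PERSISTENCE_MONTHS) -> bool:
    """Return True if the group stays at or above min_frequency for at least
    min_months consecutive months (a toy 'size and persistence' check)."""
    run = 0  # length of the current streak of months above the threshold
    for c in counts:
        frequency = c.group_count / c.total_count if c.total_count else 0.0
        run = run + 1 if frequency >= min_frequency else 0
        if run >= min_months:
            return True
    return False


# Hypothetical monthly counts for one candidate group.
counts = [
    MonthlyCount("2021-01", 50, 1000),   # 5%
    MonthlyCount("2021-02", 250, 1000),  # 25%
    MonthlyCount("2021-03", 400, 1000),  # 40%
]
print(meets_clade_criteria(counts))  # True
```

A group that is briefly common and then disappears would fail this check, which is the intuition behind requiring persistence before naming a clade.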

The lineage rules for SARS-CoV-2 were proposed by Rambaut and colleagues. Lineages help us to describe the diversity we observe while the pandemic is happening, and each lineage attempts to capture an epidemiologically significant event. For example, a new lineage may be assigned if a specific group shows increased transmission compared with another group. Unlike clades, lineages have no criteria for minimum size or persistence, so a lineage may contain a small number of isolates or a very large number. This helps explain why there are over 1,000 Pango lineages, compared with fewer than 20 clades.
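Because lineages have no minimum size, the same set of samples usually splits into many more lineages than clades. The minimal Python sketch below shows this by counting the categories under each scheme. The sample IDs and the clade and lineage labels are made up for illustration and are not real Nextstrain clade or Pango lineage names.

```python
from collections import Counter

# Toy data: each hypothetical sample carries a coarse clade label and a
# finer lineage label. All labels are invented for illustration.
samples = [
    {"id": "s1", "clade": "C1", "lineage": "L.1"},
    {"id": "s2", "clade": "C1", "lineage": "L.1.1"},
    {"id": "s3", "clade": "C1", "lineage": "L.1.2"},
    {"id": "s4", "clade": "C1", "lineage": "L.1.2"},
    {"id": "s5", "clade": "C2", "lineage": "M.1"},
    {"id": "s6", "clade": "C2", "lineage": "M.2"},
]


def group_sizes(records, key):
    """Count how many samples fall into each category for the given key."""
    return Counter(r[key] for r in records)


clades = group_sizes(samples, "clade")
lineages = group_sizes(samples, "lineage")

print(len(clades), dict(clades))      # 2 {'C1': 4, 'C2': 2}
print(len(lineages), dict(lineages))  # 5 {'L.1': 1, 'L.1.1': 1, 'L.1.2': 2, 'M.1': 1, 'M.2': 1}
```

The six samples fall into two clades but five lineages, and several lineages hold a single sample, since nothing requires a lineage to reach a minimum size.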

Lastly, a virus variant carries mutations with biological significance (for example, mutations associated with a virulence factor). The most significant variants are currently known by the WHO labels Alpha, Beta, Gamma, Delta and Omicron.

Below are two screenshots from Nextstrain. The platform is updated frequently, so the results may look slightly different when you view it yourself.

Screenshot of the Nextstrain tool showing a phylogenetic tree of SARS-CoV-2 lineages coloured by Pango lineage

Figure 1 – Example results from Nextstrain

Screenshot of the Nextstrain tool showing a phylogenetic tree of SARS-CoV-2 lineages coloured by clade

Figure 2 – Example results from Nextstrain


1) Do you think the results in Figure 1 show the colour coding of clades or of lineages? (Hint: look at the “Color By” setting in the far-left panel to see how the tree was coloured.)
2) Do you think the results in Figure 2 show the colour coding of clades or of lineages?
3) Which produces more colours: colouring by clade or colouring by lineage?
4) Based on your answer to Q3 and the discussion above, why do you think that is?
5) Look at the colour legend to the left of the phylogenetic tree in Figure 2. You will see small boxes with colours ranging from purples to blues, yellows, reds and then greys. What do you notice about their names?
6) What does this mean?
7) How are the types of variants determined and classified?

This article is from the free online course Making sense of genomic data: COVID-19 web-based bioinformatics.

Created by
FutureLearn - Learning For Life
