
Introduction to Phylogenetics

Article introducing concepts for phylogenetics
© COG-Train

In this step we will introduce basic concepts of phylogenetics to better understand SARS-CoV-2 evolution and the emergence of new variants.

In biology, evolution is the process by which modifications and adaptations are passed on from organisms to their future generations. In the context of SARS-CoV-2, mutations in the spike gene are examples of such changes. A mutation is a genetic variation, such as a change of a nucleotide (e.g. an adenine substituted by a cytosine). A group of three nucleotides (a codon) encodes an amino acid, the subunit of proteins, so a nucleotide change can lead to an amino acid substitution. For instance, the D614G mutation of the SARS-CoV-2 Spike protein means that the original virus had an aspartate (D) at the 614th residue (amino acid) of the Spike and that the new variant now has a glycine (G) in its place. Deletions or insertions of nucleotides can also alter the protein.
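To make the link between a nucleotide change and an amino acid substitution concrete, here is a minimal Python sketch. It builds the standard genetic code table and names a substitution in the same style as D614G. The function names and the example codons (GAT and GGT, which encode aspartate and glycine in the standard code) are chosen for illustration and are not taken from the course material.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    sequence = sequence.upper()
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


def name_substitution(ref_codon, alt_codon, residue_number):
    """Name an amino acid substitution, e.g. 'D614G' (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return f"{ref_aa}{residue_number}{alt_aa}"


# Illustrative codons: a single A-to-G change in the second codon position.
print(name_substitution("GAT", "GGT", 614))
```

A single nucleotide difference between the two codons is enough to change the encoded amino acid from D to G. Not every nucleotide change does this: GAT and GAC, for example, both encode aspartate.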

Phylogenetic inference applies models and algorithms to reconstruct the evolutionary history of a group of organisms. It evaluates genetic changes (e.g. mutations) to infer how similar the organisms are and how they are related. These relationships are visualised through cladograms and phylogenetic trees, which are graphical representations of the phylogenetic inference. A cladogram is a simple representation of the hypothetical ancestry of a group of organisms, whereas a phylogenetic tree uses data analysis to infer the actual similarity among these organisms.
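As a rough illustration of what "evaluating genetic changes" can mean, the sketch below counts the positions at which pairs of aligned sequences differ. This is only one very simple measure of similarity; real phylogenetic inference relies on substitution models and dedicated software. The four short sequences are invented toy data, not SARS-CoV-2 genomes.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_differences(alignment):
    """Return {(name1, name2): differences} for every pair of taxa."""
    return {
        (name_a, name_b): count_differences(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Invented toy alignment of four taxa (hypothetical data).
toy_alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTT",
    "D": "TCGAACTT",
}

for pair, differences in pairwise_differences(toy_alignment).items():
    print(pair, differences)
```

In this toy data, A and B differ at one position, as do C and D, while pairs drawn from across the two groups differ at more positions. Patterns of this kind are what tree-building methods use to group taxa.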

Figure 1 depicts a cladogram (in black) of four organisms, or taxa, A to D and their hypothetical evolutionary relationships through time. Each taxon is connected by a branch to a node, which represents a common ancestor. In a phylogenetic tree, the length of a branch is an inference of the evolutionary steps or genetic changes separating the analysed organisms from that ancestor; in a cladogram, only the branching order is informative. A clade is a group of taxa that share a common node (ancestor); in the context of SARS-CoV-2, a clade is also known as a lineage. The root represents the hypothetical last common ancestor of the whole analysed group. The purple and green cladograms are simply different graphical representations of the same inference.

Three cladograms in black, purple and green, respectively. A cladogram is a graphic visualisation that resembles a tree with a root, branches and nodes. Detailed explanation in the body text

Figure 1 – Different graphic representations of a cladogram of four organisms.
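The elements described above (taxon, node, clade and root) can also be expressed as a small data structure. The sketch below assumes a hypothetical branching order in which A groups with B and C groups with D; the topology actually drawn in Figure 1 may differ. Nested tuples stand for nodes, strings stand for taxa, and the outermost tuple is the root.

```python
# Hypothetical topology, not necessarily the one drawn in Figure 1.
# Each tuple is a node (common ancestor); each string is a taxon.
TREE = (("A", "B"), ("C", "D"))


def taxa(node):
    """Return all taxa (leaves) descending from a node."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(taxa(child))
    return found


def clades(node):
    """Return the taxa of every clade, starting from the given node."""
    if isinstance(node, str):
        return []
    found = [taxa(node)]
    for child in node:
        found.extend(clades(child))
    return found


def to_newick(node):
    """Write the branching order in Newick notation (without the final ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


print(clades(TREE))
print(to_newick(TREE) + ";")
```

The first clade listed is the one defined by the root and contains all four taxa. The Newick string is a common plain-text way to write the same branching order, and many tree viewers accept it as input.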

Can you identify all of these elements (branch, node, root, clade and taxon) in the Nextstrain phylogeny for SARS-CoV-2? Post your interpretation and questions in the comments.

This article is from the free online course The Power of Genomics to Understand the COVID-19 Pandemic.

Created by
FutureLearn - Learning For Life
