Introduction to Phylogenetics

Article introducing concepts for phylogenetics
© COG-Train

In this step we will introduce basic concepts of phylogenetics to better understand SARS-CoV-2 evolution and the emergence of new variants.

In biology, evolution is the process by which modifications and adaptations are passed on from an organism to its future generations. In the context of SARS-CoV-2, mutations in the spike gene are examples of such changes. A mutation is a genetic variation, such as a change of a nucleotide (e.g. an adenine substituted by a cytosine). A group of three nucleotides (a codon) encodes an amino acid, one of the subunits of proteins, so a nucleotide change can lead to an amino acid substitution. For instance, the D614G mutation of the SARS-CoV-2 Spike protein means that the original virus had an aspartate (D) at the 614th residue (amino acid) of the Spike protein and that the new variant now has a glycine (G) in its place. Deletions or insertions of nucleotides can also alter the protein.
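To make the link between a nucleotide change and an amino acid substitution concrete, here is a minimal Python sketch. It assumes a deliberately partial codon table and uses the codons GAT and GGT purely for illustration; the function name `describe_substitution` is hypothetical and not part of any library.

```python
# Partial codon table: only the codons needed for this illustration.
# A real analysis would use the full standard genetic code (64 codons).
CODON_TABLE = {
    "GAT": "D",  # aspartate
    "GAC": "D",  # aspartate
    "GGT": "G",  # glycine
    "GGC": "G",  # glycine
}


def describe_substitution(ref_codon, alt_codon, residue_number):
    """Return a label such as 'D614G', or None if the amino acid is unchanged."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        # The nucleotide changed but the encoded amino acid did not.
        return None
    return f"{ref_aa}{residue_number}{alt_aa}"


# Assumed example codons: one nucleotide change (A to G in the middle position).
label = describe_substitution("GAT", "GGT", 614)
# A change in the third position that leaves the amino acid untouched.
silent = describe_substitution("GAT", "GAC", 614)

print(label)
print(silent)
```

The second call shows that not every nucleotide change alters the protein: several codons can encode the same amino acid.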

Phylogenetic inference applies models and algorithms to reconstruct the evolutionary history of a group of organisms. It evaluates genetic changes (e.g. mutations) to estimate the similarities and evolutionary relationships between organisms. These relationships are visualised through cladograms and phylogenetic trees, which are graphical representations of the phylogenetic inference. A cladogram is a simple representation of the hypothetical ancestry of a group of organisms, whereas a phylogenetic tree uses data analysis to infer the actual similarity among these organisms.
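As a rough illustration of how genetic changes can be turned into a measure of similarity, the sketch below counts the differing positions between pairs of aligned sequences. This is only the simplest possible starting point, not the method used by real phylogenetic software, which relies on explicit models of sequence evolution. The sequences are invented toy data and the function name `count_differences` is hypothetical.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy sequences (not real SARS-CoV-2 data).
sequences = {
    "A": "ATGCTAGCTA",
    "B": "ATGCTAGCTG",
    "C": "ATGATAGTTG",
    "D": "TTGATCGTTG",
}


def count_differences(seq1, seq2):
    """Count the positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


# Pairwise distances for every pair of taxa.
distances = {
    (x, y): count_differences(sequences[x], sequences[y])
    for x, y in combinations(sorted(sequences), 2)
}

# The pair with the fewest differences is the most similar pair.
closest_pair = min(distances, key=distances.get)

for pair, distance in distances.items():
    print(pair, distance)
print("Most similar pair:", closest_pair)
```

Under this toy measure, taxa separated by fewer differences would be placed closer together on a tree.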

Figure 1 depicts a cladogram (in black) of four organisms, or taxa (A to D), and their hypothetical evolutionary relationships through time. Each taxon is connected by a branch to a node, which represents a common ancestor. The length of a branch is an inference of the number of evolutionary steps or genetic changes between the analysed organism and that ancestor. A clade is a group of taxa that share a common node (ancestor); in the context of SARS-CoV-2, a clade is also known as a lineage. The root represents the hypothetical last common ancestor of the whole analysed group. The purple and green cladograms are simply different graphical representations of the same inference.

Three cladograms in black, purple and green, respectively. A cladogram is a graphic visualisation that resembles a tree with a root, branches and nodes. Detailed explanation in the body text

Figure 1 – Different graphic representations of a cladogram of four organisms.
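The terms taxon, node, root and clade can also be explored with a small data structure. The Python sketch below represents a tree as nested tuples and lists the clade under each node. The topology is an assumption made for illustration and may not match the branching order drawn in Figure 1; the function names `taxa_below` and `list_clades` are hypothetical.

```python
# Hypothetical topology for four taxa; the branching order in Figure 1 may differ.
# Each tuple is an internal node (a common ancestor); each string is a taxon.
tree = (("A", "B"), ("C", "D"))  # the outermost tuple is the root


def taxa_below(node):
    """Return the sorted list of taxa descending from a node."""
    if isinstance(node, str):
        return [node]
    taxa = []
    for child in node:
        taxa.extend(taxa_below(child))
    return sorted(taxa)


def list_clades(node):
    """Return one clade (list of taxa) per internal node, root first."""
    if isinstance(node, str):
        return []
    clades = [taxa_below(node)]
    for child in node:
        clades.extend(list_clades(child))
    return clades


clades = list_clades(tree)
for clade in clades:
    print(clade)
```

The first clade printed contains every taxon because it hangs from the root; each later clade corresponds to one of the more recent common ancestors.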

Can you identify all of these elements (branch, node, root, clade and taxon) on the Nextstrain phylogeny for SARS-CoV-2? Post your interpretation and questions in the comments.

This article is from the free online course The Power of Genomics to Understand the COVID-19 Pandemic.