
Comparative genomics

Comparative analysis

Understanding how AMR is transmitted among bacterial populations is important from both epidemiological and clinical standpoints.

When identifying AMR mechanisms like those covered in steps 1.7 and 1.8, it is important to appreciate that you are often analysing the data on a sample-by-sample basis (Figure 1). On its own, this kind of analysis cannot tell you where the resistance arose or how it may be circulating within a community. To answer those questions you need to identify relationships within the community, which, for outbreak and transmission investigations, means comparing the sequences and quantifying how closely related they are across the population (Figure 2).

Figure 1 Sample-level analysis. When undertaking a sample-level analysis you can identify characteristics of an individual sequence, such as AMR genes (e.g. the NDM genes covered in step 3.5). This tells you about that specific organism, e.g. whether it is a CPO (carbapenemase-producing organism), but it gives you no information about how these organisms relate to each other.
Figure 2 Comparative analysis. To identify relationships within a population, you need to compare each sequence in the population of interest with every other sequence.
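To make the scale of a comparative analysis concrete, here is a minimal sketch that enumerates every pairwise comparison in a small dataset. The isolate names are hypothetical placeholders; the point is that the number of comparisons grows as n(n - 1)/2 with the number of sequences n, unlike a sample-level analysis, which grows linearly.

```python
from itertools import combinations

# Hypothetical isolate identifiers; any list of sample names would do.
samples = ["isolate_A", "isolate_B", "isolate_C", "isolate_D"]

# Each unordered pair of samples is compared exactly once.
pairs = list(combinations(samples, 2))

n = len(samples)
# The number of pairwise comparisons is n * (n - 1) / 2.
assert len(pairs) == n * (n - 1) // 2

for a, b in pairs:
    print(a, "vs", b)
```

With four isolates this gives six comparisons; with 100 isolates it would already be 4,950.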

Comparative analysis determines the relationships between all sequences in the dataset of interest and can be undertaken by core genome analysis, pan-genome analysis, or both. Core genome analysis is based on genomic features that are shared by all sequences in the analysis group (Figure 3), whereas pan-genome analysis also takes into account regions present in only some of the sequences.

Figure 3 Sequences within an analysis are compared to each other, and regions that are present in all sequences are used to generate a core genome. Regions that are unique to one sequence, or otherwise absent from at least one sequence, are excluded from the core genome; together these regions make up the accessory genome.
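The split described in Figure 3 can be expressed with simple set operations. The sketch below assumes each isolate has already been reduced to a set of detected genes or regions; the isolate names and gene contents are hypothetical toy data, not real results. It uses the strict definition from Figure 3 (present in every sequence); some pan-genome tools relax this to a slightly lower threshold.

```python
# Hypothetical toy data: each isolate maps to the set of genes detected in it.
genomes = {
    "isolate_A": {"gyrA", "rpoB", "dnaK", "blaNDM-1"},
    "isolate_B": {"gyrA", "rpoB", "dnaK", "tetA"},
    "isolate_C": {"gyrA", "rpoB", "dnaK", "blaNDM-1", "sul1"},
}

def split_core_accessory(genomes):
    """Return (core, accessory) gene sets for a dict of isolate -> gene set."""
    gene_sets = list(genomes.values())
    # Core genome: genes present in every isolate.
    core = set.intersection(*gene_sets)
    # Pan-genome: genes present in at least one isolate.
    pan = set.union(*gene_sets)
    # Accessory genome: everything in the pan-genome that is not core.
    accessory = pan - core
    return core, accessory

core, accessory = split_core_accessory(genomes)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

Here the core genome is dnaK, gyrA and rpoB, while blaNDM-1, sul1 and tetA fall into the accessory genome because at least one isolate lacks each of them.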

Typically, the core genome is used because it is often assumed to be what is inherited during an outbreak or transmission event. Core genome-based analysis can include: phylogenetic analysis, in which a phylogenetic tree is built and used to detect the most closely related sequences; core genome MLST (cgMLST); and/or calculation of genomic distances (SNPs or alleles).

Core genome analyses can also be undertaken using a reference genome, which is assumed to represent the hypothetical ancestor of all the sequences in the analysis. The choice of reference genome can dramatically affect the interpretation of a comparative analysis. A reference that is too distantly related can shrink the core genome, reducing the region of each sequence that is available for comparison.
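The effect of a distant reference can be illustrated with a toy example. The sketch assumes each isolate has already been mapped to a reference and turned into a reference-length string in which '-' marks positions the isolate's reads did not cover; the data are invented, not produced by any real mapping tool. The core is taken to be the reference positions called in every isolate, so uncovered positions that differ between isolates quickly add up.

```python
# Toy, invented reference-mapped sequences; '-' marks uncovered positions.
mapped_to_close_ref = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTTC",
    "isolate_C": "ACCTACG-TC",
}
# With a distantly related reference, more regions fail to map in each isolate.
mapped_to_distant_ref = {
    "isolate_A": "AC--ACGT-C",
    "isolate_B": "-CGTA--TTC",
    "isolate_C": "ACC-ACG-T-",
}

def core_positions(mapped):
    """Return the reference positions that are called in every isolate."""
    length = len(next(iter(mapped.values())))
    return [
        i
        for i in range(length)
        if all(seq[i] not in "-N" for seq in mapped.values())
    ]

print(len(core_positions(mapped_to_close_ref)))    # 9
print(len(core_positions(mapped_to_distant_ref)))  # 2
```

The same three isolates share 9 comparable positions against the close reference but only 2 against the distant one, leaving far less of each sequence available for comparison.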

There are also reference-free approaches to core genome analysis, of which cgMLST is an example. These approaches avoid the need to choose an appropriate reference genome, but they have their own challenges and may not be suitable for all pathogens.
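In cgMLST, each isolate is described by an allele number at each core locus, and the distance between two isolates is the number of loci at which their alleles differ. The sketch below assumes profiles have already been called; the locus names and allele numbers are hypothetical. Loci that could not be called (None) are ignored pairwise here, which is only one of several ways tools handle missing data.

```python
from itertools import combinations

# Hypothetical cgMLST profiles: locus name -> allele number (None = not called).
profiles = {
    "isolate_A": {"locus_001": 1, "locus_002": 4, "locus_003": 7, "locus_004": 2},
    "isolate_B": {"locus_001": 1, "locus_002": 4, "locus_003": 9, "locus_004": 2},
    "isolate_C": {"locus_001": 3, "locus_002": 5, "locus_003": 9, "locus_004": None},
}

def allele_distance(p1, p2):
    """Count loci called in both profiles whose allele numbers differ."""
    shared = [
        locus
        for locus in p1
        if locus in p2 and p1[locus] is not None and p2[locus] is not None
    ]
    return sum(1 for locus in shared if p1[locus] != p2[locus])

allele_distances = {
    (a, b): allele_distance(profiles[a], profiles[b])
    for a, b in combinations(sorted(profiles), 2)
}
for pair, d in allele_distances.items():
    print(pair, d)
```

isolate_A and isolate_B differ at a single locus, whereas isolate_A and isolate_C differ at three of the loci they can both be compared on.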

Although the core genome is a useful way to identify and measure relationships within a population or dataset, it can be misleading if the analysis is not well designed. Mobile genetic elements, such as plasmids transferred horizontally by conjugation (step 1.9), may not be part of the core genome. Where many such elements are involved, core genome analyses should be interpreted with care, and additional methods may be needed to trace an outbreak or transmission network, as outlined in step 3.5.
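One simple way to see why care is needed is to look for isolates that are far apart in the core genome yet carry the same AMR gene. That pattern is consistent with the gene moving on a mobile element rather than with recent clonal transmission, although it is only a hint and does not prove either. The sketch below is a hypothetical heuristic, not an established method: the SNP distances, gene sets and the threshold of 20 SNPs are invented for illustration, and real cut-offs are organism- and study-specific.

```python
# Hypothetical inputs: core genome SNP distances and acquired AMR genes.
core_snp_distance = {
    ("isolate_A", "isolate_B"): 4,
    ("isolate_A", "isolate_C"): 2150,
    ("isolate_B", "isolate_C"): 2148,
}
amr_genes = {
    "isolate_A": {"blaNDM-1"},
    "isolate_B": {"blaNDM-1"},
    "isolate_C": {"blaNDM-1", "sul1"},
}

# Illustrative cut-off only; not a recommended value.
SNP_THRESHOLD = 20

def shared_genes_across_distant_pairs(distances, genes, threshold):
    """Return (a, b, shared genes) for pairs distant in the core genome
    that nevertheless share at least one AMR gene."""
    flagged = []
    for (a, b), d in distances.items():
        shared = genes[a] & genes[b]
        if d > threshold and shared:
            flagged.append((a, b, sorted(shared)))
    return flagged

flagged = shared_genes_across_distant_pairs(core_snp_distance, amr_genes, SNP_THRESHOLD)
for a, b, shared in flagged:
    print(f"{a} and {b} differ by many core SNPs but share {shared}")
```

A core genome analysis alone would separate isolate_C from the other two, even though all three carry blaNDM-1; tracing how that gene spread would need the kind of additional methods mentioned above.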

For further discussion of the specific criteria that can affect these studies, check out this useful publication.

© Wellcome Connecting Science
This article is from the free online course Antimicrobial Databases and Genotype Prediction: Data Sharing and Analysis.

