How are AMR genes detected?
We have just covered a range of databases that contain bacterial AMR mechanisms, and you have heard about their strengths and weaknesses. In this section we are going to cover how you might go about accessing these catalogues and leveraging their contents using existing tools.
We now know how genotype information is stored in AMR databases, which allows us to link those genotypes to predicted phenotypes. However, it would be very tedious to do this ourselves: going gene by gene through our genome, looking each one up in the database and checking whether it is linked to AMR. Additionally, the genes in our data may not be a 100% identical match to those in the database. Maybe they share 99% sequence similarity but not every single nucleotide is the same. Does that mean we count it as a match or not?
Thankfully, we don’t have to do this matching ourselves. Many AMR detection tools can undertake this matching for us, indicating the genotypic signatures of AMR we have in our sample(s). At the core of these tools are computer algorithms that analyse a set of genes and search the AMR database for those that share sequence similarities. Similarly, mutations in genes can be identified by comparing our input data against known AMR-associated mutations in the database.
The tools are covered in detail in the next sections but the underlying algorithms are outlined below.
Note: the sections below assume you know a little about genome sequencing, what FastQ files are, and the basics of genome assembly. If you need a refresher on this, you will find links to resources at the bottom of the 'Glossary with links to refresher pre-reading material' step.
Sequence similarity searching is a huge area of bioinformatics research. The four main ways that AMR detection tools may undertake this task are briefly described below:
- Local alignment search tools
- K-mer counting
- Hidden Markov Models
- Read mapping
Not every tool uses every approach, and each approach (and thus every tool) has its advantages and drawbacks. Keep reading to learn more about these underlying algorithms.
Alignment searching
The most straightforward approach to finding a sequence match is to take the input sequence and search a database for the closest matching sequence. This is akin to a game of ‘is this your card’ where someone may think of a card (e.g. the ace of spades) and someone else searches a deck of cards to find the one they are thinking of. However, in that scenario, there are only 52 possible matches. In modern AMR databases there are thousands, maybe millions, of possible matches. Brute force searching would simply take too long.
In 1990 an algorithm called BLAST (Basic Local Alignment Search Tool) was published. This algorithm splits every sequence in the database into smaller stretches of base pairs (called k-mers) and indexes them for fast searching. It does the same with the input sequence, then matches input k-mers against the database k-mers and uses the shared ones as anchors from which to find the closest match. It then returns the closest match along with some statistics, such as the percentage sequence similarity and an e-value, which estimates how many matches this good would be expected by random chance alone (so the lower the e-value, the more convincing the match). For a detailed understanding of BLAST, see these tutorial slides or, for all the details, watch this set of 11 videos.
BLAST revolutionised bioinformatics when it was invented and now sits at the core of many sequence searching tools, including the entirety of the NCBI database. Many AMR detection tools use this algorithm, or others like it, to find sequence similarity matches in the AMR databases and then, based on percentage identity cut-offs that are specific to each AMR-related gene, report back to the user if they find a match or not.
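To make the anchor idea more concrete, here is a minimal Python sketch of 'seed and extend' searching. It is only an illustration under simplifying assumptions, not how BLAST itself is implemented: the seeds must match exactly, the extension stops at the first mismatch (real BLAST scores mismatches and gaps and reports statistics such as the e-value), and the gene names, sequences and function names are all made up for this example.

```python
from collections import defaultdict

def build_index(database, k):
    """Map every k-mer to the (gene name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index

def extend(query, target, q_pos, t_pos, k):
    """Grow an exact k-mer seed left and right while the bases keep matching."""
    left = 0
    while (q_pos - left > 0 and t_pos - left > 0
           and query[q_pos - left - 1] == target[t_pos - left - 1]):
        left += 1
    right = k
    while (q_pos + right < len(query) and t_pos + right < len(target)
           and query[q_pos + right] == target[t_pos + right]):
        right += 1
    return left + right  # length of the exactly matching stretch

def best_hit(query, database, k=4):
    """Return (gene name, match length) for the longest extended seed."""
    index = build_index(database, k)
    best = (None, 0)
    for q_pos in range(len(query) - k + 1):
        for name, t_pos in index.get(query[q_pos:q_pos + k], []):
            length = extend(query, database[name], q_pos, t_pos, k)
            if length > best[1]:
                best = (name, length)
    return best

# Toy database with two invented 'genes'
toy_database = {
    "geneA": "ATGGCTAGCTAGGATCC",
    "geneB": "ATGCCCGGGTTTAAACC",
}
print(best_hit("TTGCTAGCTAGGAAA", toy_database))
```

The index is what avoids the brute-force search: only database positions that share a k-mer with the input are ever examined.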
K-mer counting
In a similar vein, this k-mer approach can also be used for counting matches to find similarities. In essence, the input sequence and database sequences are split into k-mers and the occurrences of the input k-mers in each database sequence are counted. The database sequence with the highest count is the top-scoring candidate and, if its count is over a threshold, it is reported as a match. You can read the full outline of this method in the KMC paper, which is the primary program used by AMR tools for this approach.
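The counting idea can be sketched in a few lines of Python. This is a toy illustration rather than the method of any particular tool: the sequences are invented, and the value of k and the 80% threshold (min_fraction) are arbitrary assumptions chosen for the example.

```python
def kmers(seq, k):
    """Return every overlapping k-mer in a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_match(query, database, k=5, min_fraction=0.8):
    """Count query k-mers found in each database sequence and apply a threshold.

    Returns (best gene name or None, number of shared k-mers).
    """
    query_kmers = kmers(query, k)
    if not query_kmers:  # query shorter than k: nothing to count
        return (None, 0)
    scores = {}
    for name, seq in database.items():
        db_kmers = set(kmers(seq, k))
        scores[name] = sum(1 for km in query_kmers if km in db_kmers)
    best = max(scores, key=scores.get)
    fraction = scores[best] / len(query_kmers)
    if fraction >= min_fraction:
        return (best, scores[best])
    return (None, scores[best])

# Toy database with two invented 'genes'
toy_database = {
    "geneA": "ATGGCTAGCTAGGATCC",
    "geneB": "ATGCCCGGGTTTAAACC",
}
print(kmer_match("GGCTAGCTAGGATC", toy_database))
```

Note that no alignment is computed at all: the decision rests purely on how many k-mers are shared, which is what makes this approach fast.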
Hidden Markov Models
Hidden Markov Models (HMMs) are a type of machine learning that can analyse input data, such as gene sequences, and predict outcomes, specifically whether there is a sequence match. For sequence similarity searching, every gene in the AMR database is converted into an HMM profile. The program HMMER, or others like it, then compares the input sequence against these profiles to see which one is predicted to be the best match. If you want to know the inner workings of HMMER, I suggest you watch this in-depth overview video.
Similar to the BLAST approach described above, the HMM method provides a score indicating how well the sequence matches the HMM profiles of genes in the AMR database. AMR detection tools then apply a score cutoff to determine whether the match is sufficiently close to conclude that a specific gene or mutation is present in the input dataset.
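The scoring idea can be illustrated with a heavily simplified Python sketch. Please treat it as an assumption-laden toy and not as what HMMER does internally: it keeps only the per-position base probabilities of a profile (no insertion or deletion states), it assumes the query is already the same length as the profile, and the example sequences, pseudocount and uniform background frequency are all invented for illustration.

```python
import math

ALPHABET = "ACGT"

def build_profile(aligned_seqs, pseudocount=1.0):
    """Estimate per-position base probabilities from equal-length aligned sequences."""
    profile = []
    for pos in range(len(aligned_seqs[0])):
        column = [seq[pos] for seq in aligned_seqs]
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append(
            {base: (column.count(base) + pseudocount) / total for base in ALPHABET}
        )
    return profile

def log_odds_score(profile, query, background=0.25):
    """Sum, over positions, of log2(profile probability / background probability)."""
    if len(query) != len(profile):
        raise ValueError("this toy scorer needs a query as long as the profile")
    return sum(
        math.log2(profile[i][base] / background) for i, base in enumerate(query)
    )

# Three invented variants of the same short 'gene'
family = ["ATGGCA", "ATGGCT", "ATGACA"]
profile = build_profile(family)

print(log_odds_score(profile, "ATGGCA"))  # family member: positive score
print(log_odds_score(profile, "CCCTTT"))  # unrelated sequence: negative score
```

A positive score means the sequence looks more like the gene family than like random background, and a tool would compare that score against a cutoff before reporting a match.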
Alignment-based AMR detection
The approaches described above are the primary methods used for sequence alignment when searching databases. One tool, AMRFinderPlus, combines BLAST and HMM searches in a complex set of decision trees (remember these from last week?) to state whether a resistance gene is present in the input or not. The full decision tree for this tool is outlined in Figure 1 below. But don’t worry, you don’t have to remember all of this or interact with it. You just need to know what kind of approach an AMR detection tool is using when you select it for your own work.
Figure 1. How AMRFinderPlus finds sequence similarity matches in an AMR database. Reproduced from https://www.nature.com/articles/s41598-021-91456-0/figures/1
Read mapping approach
Another way to detect if a gene or mutation is present in your input data is using read mapping. This approach is much less common than the above alignment-based approaches for AMR detection and is mostly (currently) confined to tools designed for M. tuberculosis or detecting AMR from metagenomic datasets.
Instead of using assembled genomes as input, these methods use sequencing reads from machines such as Illumina or Nanopore (or similar technologies). These reads are then mapped to the sequences in the AMR database, like placing jigsaw pieces against the guide image on the box. If sufficient reads cover the gene (or mutation), then that gene/mutation is reported as detected. This is called ‘reference-based’ mapping, as shown in Figure 2.
Figure 2. Assembly and mapping approaches from raw sequencing reads. For read-based mapping, reads are compared to a reference sequence and, if enough reads cover the sequence, a consensus is produced and the template gene is labelled as present in the input sample. For comparison, de novo assembly uses no reference and instead compares the reads to each other to assemble a consensus. Such assembled consensus sequences are often the input to the alignment-based approaches outlined above.
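The 'enough reads cover the gene' decision can be sketched in Python as follows. This is a toy under stated assumptions, not a real read mapper: reads are placed only where they match the reference exactly (real mappers tolerate mismatches and gaps), each read is placed at its first matching position only, and the thresholds min_depth and min_breadth are arbitrary values invented for the example.

```python
def map_reads(reference, reads):
    """Place each read at its first exact match and return per-base read depth."""
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        if start == -1:  # read does not match this reference
            continue
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth

def gene_present(reference, reads, min_depth=2, min_breadth=0.9):
    """Call the gene present if enough of its bases reach the minimum depth."""
    depth = map_reads(reference, reads)
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(reference) >= min_breadth

# Invented reference gene and simulated error-free 8-base reads
reference = "ATGGCTAGCTAGGATCC"
reads = [reference[i:i + 8] for i in range(0, 10)] * 2

print(map_reads(reference, reads))
print(gene_present(reference, reads))
```

Requiring both a minimum depth and a minimum breadth of coverage guards against calling a gene present on the strength of a handful of stray reads.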
Antimicrobial Databases and Genotype Prediction: Data Sharing and Analysis
