hAMRonization tool

PHA4GE representative talk on the hAMRonization of outputs

In this course, we have focused on the three main tools used to access the three main AMR databases. However, more than 18 open-source AMR detection tools are currently available (more information is given in the link to resources below).

Each of these tools has its own strengths and weaknesses due to differences in underlying databases, search algorithms, and default parameterisations. This means that no single tool is likely to be optimal for all possible AMR analyses you want to run. Determining which tool is most suited to a given use-case generally requires being able to compare (or potentially combine) the outputs of several tools run on your specific data. However, due to limited standardisation between databases and tools, each tool generates differently formatted outputs with inconsistent AMR gene names, terminology, and interpretive data. This means comparing AMR tool results is both challenging and labour-intensive.

Standardising AMR Tool Outputs

The majority of AMR tools generate tab or comma-separated text files with each row representing a potential detected AMR determinant (such as an acquired gene or a variant). Each column or field contains data about those determinants, such as measures of sequence similarity relative to the database sequence (e.g., % identity or coverage), location within the input genome (e.g., a specific contig accession or reading frame identifier), and contextual data from the database (e.g., specific AMR determinant name, associated antimicrobial drug class). The tools can provide different columns and often name them differently (e.g., “Gene Symbol” vs “gene” vs “Best_Hit_ARO”). In addition, the actual data in these fields can be inconsistent between tools depending on the underlying database and formatting choices. However, to begin tackling this deeper problem, we first need to be able to compare these outputs using a single standardised format.
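
To make the column-naming problem concrete, here is a minimal Python sketch. The two report snippets are invented for illustration and are not real tool output; only the headers “Gene Symbol” and “Best_Hit_ARO” come from the examples above, and the remaining headers and all values are assumptions. It shows how code written against one tool's header finds nothing in another tool's report.

```python
import csv
import io

# Two made-up tab-separated reports (illustrative only, not real tool output).
# Both carry a gene name, but under different column headers.
report_a = "Gene Symbol\t% identity\nblaTEM-1\t100.0\n"
report_b = "Best_Hit_ARO\tidentity\nTEM-1\t99.8\n"


def read_rows(text):
    """Parse a tab-separated report into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows_a = read_rows(report_a)
rows_b = read_rows(report_b)

# Code written against one tool's header silently finds nothing in the other.
genes_a = [row.get("Gene Symbol") for row in rows_a]
genes_b = [row.get("Gene Symbol") for row in rows_b]

print(genes_a)  # ['blaTEM-1']
print(genes_b)  # [None]
```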

hAMRonization

The Public Health Alliance for Genomic Epidemiology or PHA4GE (a large international network of scientists trying to develop community standards for more effective use of genomic data in public health) compared and analysed the outputs of 18 open-source AMR gene detection tools. This was used to develop the hAMRonization specification, a common set of output fields each linked to standardised definitions (3). Individual output fields from each tool were then mapped to these standardised fields. For example, the identifier for the contig which contains a given AMR gene is listed under the ‘Contig’ field in RGI, ‘contig_name’ in ResFinder, and ‘contig id’ in AMRFinderPlus. Each of these contains the same information and so can be mapped to a single common ‘Contig ID’ field in the hAMRonization specification (Figure 1). This work also identified a core set of essential fields needed for reproducible AMR analyses in health or research contexts, such as tool version, database version, and input filename.

Figure 1: The PHA4GE hAMRonization specification schematic. On the left, there are 3 distinct fields from the outputs of AMRFinderPlus, ResFinder and RGI. As indicated by shared text colours, these non-standardised fields which contain the same information can be mapped to a single standardised field in the hAMRonization specification on the right.
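
The field mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the hAMRonization tool's actual implementation: the three tool-specific field names and the ‘Contig ID’ target are taken from the text, while the tool keys, the function name, and the sample values are hypothetical.

```python
# Tool-specific name of the contig field, as given in the text.
# The dictionary keys ("rgi", ...) are hypothetical labels for this sketch.
CONTIG_FIELD = {
    "rgi": "Contig",
    "resfinder": "contig_name",
    "amrfinderplus": "contig id",
}

STANDARD_CONTIG_FIELD = "Contig ID"


def harmonise_contig(tool, row):
    """Return a copy of `row` with the tool-specific contig field renamed."""
    source_field = CONTIG_FIELD[tool]
    renamed = {key: value for key, value in row.items() if key != source_field}
    renamed[STANDARD_CONTIG_FIELD] = row[source_field]
    return renamed


# Made-up rows: the same contig reported by three tools under three names.
rows = [
    ("rgi", {"Contig": "contig_1"}),
    ("resfinder", {"contig_name": "contig_1"}),
    ("amrfinderplus", {"contig id": "contig_1"}),
]

harmonised = [harmonise_contig(tool, row) for tool, row in rows]
print(harmonised)
```

After the mapping, every row exposes the contig identifier under the same key, so downstream code no longer needs to know which tool produced it.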

This specification and the associated mappings were used to develop the hAMRonization tool. This command line tool lets you automatically convert the output files of the 18 supported tools to a common standardised report using the hAMRonization specification. When tools do not provide the full set of essential core contextual data, hAMRonization will prompt the user for this data to ensure reproducibility.
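
The idea of requiring essential contextual data can be illustrated with a small Python sketch. It is not the tool's own code: the field names below are hypothetical stand-ins for the tool version, database version, and input filename mentioned earlier, and all values are made up.

```python
# Hypothetical field names standing in for the essential metadata named in
# the text (tool version, database version, input filename).
REQUIRED_FIELDS = ("tool_version", "database_version", "input_file_name")


def missing_metadata(record):
    """List the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def fill_metadata(record, supplied):
    """Return a copy of `record` completed with user-supplied values.

    Raises ValueError if a required field is still missing afterwards.
    """
    completed = dict(record)
    for field in missing_metadata(record):
        if field in supplied:
            completed[field] = supplied[field]
    still_missing = missing_metadata(completed)
    if still_missing:
        raise ValueError("missing required metadata: " + ", ".join(still_missing))
    return completed


# A made-up record from a tool that only reports its own version.
record = {"gene_symbol": "blaTEM-1", "tool_version": "1.0"}
print(missing_metadata(record))  # ['database_version', 'input_file_name']

completed = fill_metadata(
    record,
    {"database_version": "2024-01", "input_file_name": "sample1.fasta"},
)
print(completed)
```

Refusing to produce a record until the missing values are supplied is what keeps every harmonised report traceable to a specific tool version, database version, and input file.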

The hAMRonization tool also supports combining hAMRonised results from multiple tools and samples into a single report to aid comparison and analysis. These reports can optionally be generated as spreadsheets (CSV), JSON, or an interactive navigable HTML format. This means research and public health reference laboratories can easily compare results across tools and change their workflow to use a different tool without having to develop custom code for just that tool (e.g., (4–7)). It also means that the communication and interpretation of results for downstream knowledge users, such as infection prevention and control clinicians, can be done in a consistent standardised manner regardless of which AMR prediction tool was used in the genomic analysis.
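
Once records share a common set of fields, combining them is straightforward. The following Python sketch shows the general idea using only the standard library; it does not reproduce the hAMRonization tool's own report code, and the field names, tool names, and values are all invented for illustration.

```python
import csv
import io
import json

# Made-up harmonised records from two hypothetical tools run on one sample.
tool_a = [{"input_file_name": "sample1", "tool": "tool_a", "gene_symbol": "blaTEM-1"}]
tool_b = [{"input_file_name": "sample1", "tool": "tool_b", "gene_symbol": "TEM-1"}]

# Because the fields are shared, combining results is plain concatenation.
combined = tool_a + tool_b


def to_csv(records):
    """Serialise records to CSV text with one column per field name."""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_report = to_csv(combined)
json_report = json.dumps(combined, indent=2)

print(csv_report)
print(json_report)
```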

However, as mentioned above, while this common format and tooling supports comparison and standardisation efforts, it does not standardise the actual data in the output fields. For example, one tool may report detection of the Gene Symbol “TEM-1” and another “blaTEM-1”. These and other inconsistencies are a result of the differences in the underlying databases.
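
A short Python sketch shows why a shared format alone is not enough. The gene symbols are the ones from the example above; the tool names are made up, and the hand-curated synonym table is a hypothetical workaround introduced here for illustration, not a feature described for the hAMRonization tool.

```python
# Gene symbols reported for the same sample by two hypothetical tools.
tool_a_genes = {"blaTEM-1"}
tool_b_genes = {"TEM-1"}

# Same field, same underlying gene, but the strings do not match.
shared = tool_a_genes & tool_b_genes
print(shared)  # set()

# Hypothetical hand-curated synonym table mapping variants to one spelling.
SYNONYMS = {"blaTEM-1": "TEM-1"}


def canonical(symbol):
    """Return the preferred spelling of a gene symbol, if one is known."""
    return SYNONYMS.get(symbol, symbol)


shared_after = {canonical(g) for g in tool_a_genes} & {canonical(g) for g in tool_b_genes}
print(shared_after)  # {'TEM-1'}
```

A table like this has to be curated by hand against the underlying databases, which is why the naming problem remains even after the output format has been standardised.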

Figure 2: The hAMRonization tool automatically maps the outputs of any of 18 AMR determinant prediction tools listed above to a standardised format using the hAMRonization specification. This means a single unified report can be generated to facilitate tool interoperability, comparison of results, interpretation, and downstream reporting.

For further reading, please see the list of references attached below.

© Wellcome Connecting Science
This article is from the free online

Antimicrobial Databases and Genotype Prediction: Data Sharing and Analysis

Created by
FutureLearn - Learning For Life
