
Input types and output reports

Let’s look at the input data types that the databases and tools described so far require, and at the output reports they generate.

Input data

Many of the databases and tools described here accept FASTA files as input, typically assemblies generated automatically by de novo assembly, and are geared towards identifying whole genes and alleles. Options for generating assemblies from short-read sequences include Shovill, SPAdes and SKESA.
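Before passing an assembly to a detection tool, it can help to check that the FASTA file looks as expected. The following is a minimal sketch in plain Python, not part of any of the tools named here; the function name `summarise_fasta` and the two-contig example are made up for illustration.

```python
from io import StringIO


def summarise_fasta(handle):
    """Return (number of contigs, total bases) for a FASTA stream."""
    n_contigs = 0
    total_bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_contigs += 1  # each header line starts a new contig
        else:
            total_bases += len(line)  # sequence lines may be wrapped
    return n_contigs, total_bases


# Hypothetical two-contig assembly, invented for this example.
example = StringIO(">contig_1\nATGCATGC\nATGC\n>contig_2\nGGCC\n")
summary = summarise_fasta(example)
print(summary)
```

A highly fragmented assembly (many short contigs) is a hint that genes may be split across contigs, which matters when interpreting % coverage later on.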

However, NCBI AMRFinder also has a growing catalog of species-specific point mutations. Recovering point mutations from assemblies can be challenging, particularly in organisms such as Mycobacterium tuberculosis or Neisseria gonorrhoeae, where mutations in relevant genes may occur at differing allele frequencies. In these cases, assembly-based tools may not be appropriate.
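To illustrate why allele frequency matters, here is a deliberately simplified sketch. It assumes a majority-rule consensus at a single position, which is much cruder than what real assemblers do; the read counts and function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(bases):
    """Return the fraction of reads supporting each base at one position."""
    counts = Counter(bases)
    depth = sum(counts.values())
    return {base: count / depth for base, count in counts.items()}


def consensus_base(bases):
    """Return the most common base, as a simple majority-rule consensus would."""
    return Counter(bases).most_common(1)[0][0]


# Hypothetical pileup at one position: 14 reads carry C, 6 reads carry T.
pileup = "C" * 14 + "T" * 6
freqs = allele_frequencies(pileup)
consensus = consensus_base(pileup)
print(freqs)
print(consensus)
```

In this toy case the T allele is supported by 30% of reads, yet the consensus reports only C. A mutation present in a minority of reads can therefore be invisible in an assembly, which is one reason read-based approaches may be preferred for such organisms.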

Output reports

Outputs of AMR gene detection tools can be complex and it is important to understand how to interpret them.

All of the tools described here will provide:

  • Gene and/or protein accession
  • % identity of the match detected
  • % coverage of the match detected (the proportion of the reference gene covered by the match; for example, 75% of the blaOXA-1 gene was present in the sequence)
  • Drug class to which the reported gene is linked
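The fields listed above can be sketched as a small record type. This is an illustrative example only: each tool uses its own column headers and file layout, so the column names, the accession `ACC_0001`, and the numeric values below are all hypothetical.

```python
import csv
from dataclasses import dataclass
from io import StringIO


@dataclass
class AmrHit:
    gene: str
    accession: str
    identity: float  # % identity of the match
    coverage: float  # % of the reference gene covered by the match
    drug_class: str


def percent_coverage(aligned_length, reference_length):
    """Proportion of the reference gene covered by the match, as a percentage."""
    return 100.0 * aligned_length / reference_length


def read_hits(handle):
    """Parse a tab-separated report with the hypothetical headers used below."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        AmrHit(
            gene=row["gene"],
            accession=row["accession"],
            identity=float(row["identity"]),
            coverage=float(row["coverage"]),
            drug_class=row["drug_class"],
        )
        for row in reader
    ]


# Made-up report; real tools name and order their columns differently.
report = StringIO(
    "gene\taccession\tidentity\tcoverage\tdrug_class\n"
    "blaOXA-1\tACC_0001\t99.5\t75.0\tbeta-lactam\n"
)
hits = read_hits(report)
print(hits[0])

# Hypothetical lengths: 600 aligned bases of an 800-base reference gene.
print(percent_coverage(600, 800))
```

When adapting this to a real report, the header names in `read_hits` would need to be replaced with those the tool actually writes.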

When interpreting the results of your analysis, you should first consider the % identity and % coverage of the match. Where a gene has low % coverage, you can use tools such as Bandage to identify possible causes of the incomplete recovery.

| Observation | Potential causes | Follow up |
|---|---|---|
| Low % coverage match (50% to 90%) | Genes did not assemble contiguously. This is an artefact of sequencing technology, particularly when using short reads. | Try an alternative assembler; some assemblers deal with sequencing artefacts differently. You may find the ‘rest’ of the gene on another contig. However, without the intervening sequence, you cannot directly infer gene presence. |
| | Gene truncation can occur and may lead to loss of function of a gene. | If the gene fragment is not at the end of a contig, it can indicate that the gene was truncated in the sample. Functional impact is unclear. |
| | Insertion sequences can interrupt a gene. These can be biologically relevant and can also affect the function of the gene product. | The gene is present on a single contig, but another gene is present within the contig, breaking the AMR gene. Functional impact is unclear. |
| Imperfect % identity (<90% identity) | SNPs can be introduced that change the gene sequence. These changes can be synonymous or lead to amino acid changes, premature stop codons, etc. | CARD provides information on the alignment of the query sequence to the reported gene, which can be useful for determining the potential impact on function. |
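The checks in the table above can be turned into a simple first-pass triage. The sketch below takes its thresholds (50% to 90% coverage, <90% identity) directly from the table; the function name `triage` and the wording of the notes are illustrative, and the notes only point to follow-up work rather than reaching any conclusion about function.

```python
def triage(identity, coverage):
    """Return follow-up notes for one hit, using the thresholds from the table."""
    notes = []
    if 50 <= coverage <= 90:
        notes.append(
            "low coverage: check for fragmented assembly, truncation "
            "or an insertion sequence"
        )
    if identity < 90:
        notes.append(
            "imperfect identity: inspect the alignment for amino acid "
            "changes or premature stop codons"
        )
    # Coverage below 50% is not addressed by the table, so it is not flagged here.
    return notes


# Hypothetical values for three hits: (% identity, % coverage).
partial_hit = triage(99.5, 75.0)
divergent_hit = triage(85.0, 100.0)
clean_hit = triage(100.0, 100.0)
print(partial_hit)
print(divergent_hit)
print(clean_hit)
```

A hit with no notes still needs to be read in its biological context; the function only flags the two situations the table describes.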
© Wellcome Connecting Science
This article is from the free online

Antimicrobial Databases and Genotype Prediction: Data Sharing and Analysis
