
Processing Resistome count data in ResistoXplorer – Part I

In this video, Achal Dhariwal talks about processing resistome count data in ResistoXplorer – Part I.

Alright, now we have uploaded data into ResistoXplorer. Can we start generating some nice figures? Not yet! We first need to process the data that was uploaded, and this is done through data filtration and normalisation.

Resistome data can be affected by many sources of systematic variation, arising at any step from sample preparation to sequencing. The main aim of data filtration and normalisation is to remove or reduce such systematic variability. We will first discuss data filtration, followed by normalisation.

Data Filtration: The objective of data filtering is to remove low-quality and/or uninformative features (ARGs) to improve downstream statistical analysis. For this purpose, ResistoXplorer provides three data-filtering options:

Minimal data filtering removes features (ARGs) that have zero read counts across all samples or that are present in only one sample. Such features are considered artefacts and should be removed from the analysis for both biological and technical reasons.
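As a rough illustration of the rule above, here is a minimal Python sketch. It is not ResistoXplorer's own code: the count table, the ARG names used as keys and the function name `minimal_filter` are all made up for this example.

```python
# Hypothetical toy count table: ARG name -> read counts per sample.
counts = {
    "tetM":   [12, 0, 7, 3],
    "blaTEM": [0, 0, 0, 0],   # zero reads in every sample
    "ermB":   [0, 5, 0, 0],   # detected in a single sample only
    "sul1":   [4, 9, 0, 1],
}


def minimal_filter(table):
    """Keep features detected (count > 0) in at least two samples.

    This drops both all-zero features and features seen in one sample.
    """
    return {
        feature: row
        for feature, row in table.items()
        if sum(1 for c in row if c > 0) >= 2
    }


kept = minimal_filter(counts)
print(sorted(kept))  # ['sul1', 'tetM']
```

Requiring detection in at least two samples covers both cases in one condition, since an all-zero feature is detected in no sample at all.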

Low-count filtering removes features (ARGs) that are present in only a few samples with very low read counts. Such features cannot be distinguished from sequencing errors or low-level contamination, and it is difficult to interpret their significance with respect to the whole community. By default, ResistoXplorer removes these features based on their sample prevalence and abundance level (count). Alternatively, low-abundance features can be removed by setting a minimum count cutoff on their mean or median value.
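The two low-count strategies can be sketched in a few lines of Python. This is only an illustration under assumed settings: the thresholds, the toy counts and the function names `prevalence_filter` and `summary_filter` are hypothetical and are not taken from ResistoXplorer.

```python
from statistics import mean, median

# Hypothetical toy count table: ARG name -> read counts per sample.
counts = {
    "tetM": [12, 0, 7, 3, 10],
    "aadA": [1, 0, 2, 0, 1],    # low counts everywhere
    "vanA": [0, 0, 0, 40, 0],   # one high count, absent elsewhere
}


def prevalence_filter(table, min_count, min_prevalence):
    """Keep features with >= min_count reads in >= min_prevalence of samples."""
    kept = {}
    for feature, row in table.items():
        fraction = sum(1 for c in row if c >= min_count) / len(row)
        if fraction >= min_prevalence:
            kept[feature] = row
    return kept


def summary_filter(table, min_value, stat=mean):
    """Keep features whose mean (or median) count reaches min_value."""
    return {f: row for f, row in table.items() if stat(row) >= min_value}


# Assumed example thresholds: at least 4 reads in at least 40% of samples.
print(sorted(prevalence_filter(counts, min_count=4, min_prevalence=0.4)))  # ['tetM']
print(sorted(summary_filter(counts, min_value=4, stat=mean)))    # ['tetM', 'vanA']
print(sorted(summary_filter(counts, min_value=4, stat=median)))  # ['tetM']
```

In this toy table the mean-based cutoff keeps `vanA` because a single large count inflates its mean, whereas the median and the prevalence rule both drop it. This is one reason the choice of criterion matters.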

Low-variance filtering removes features (ARGs) whose abundance remains nearly constant across all samples or across the experimental conditions. Such features are unlikely to be informative in a comparative analysis. Filtering them out can increase statistical power by reducing the number of tests that must be corrected for during differential analysis. In ResistoXplorer, we can filter low-variance features based on their interquartile range (IQR), standard deviation or coefficient of variation.
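The three variability measures mentioned above can be compared with a short Python sketch. The design here is an assumption made for illustration: features are ranked by the chosen measure and the lowest-ranked fraction is dropped. The toy counts, the `drop_fraction` parameter and all function names are hypothetical.

```python
from statistics import mean, quantiles, stdev

# Hypothetical toy count table: ARG name -> read counts per sample.
counts = {
    "tetM":   [10, 10, 11, 10, 10, 11],  # almost constant across samples
    "blaTEM": [2, 30, 5, 45, 1, 38],
    "ermB":   [0, 8, 1, 12, 0, 9],
    "sul1":   [20, 3, 25, 2, 22, 4],
}


def iqr(row):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(row, n=4)
    return q3 - q1


def cv(row):
    """Coefficient of variation: standard deviation divided by the mean."""
    m = mean(row)
    return stdev(row) / m if m else 0.0


def low_variance_filter(table, measure=iqr, drop_fraction=0.25):
    """Rank features by `measure` and drop the lowest `drop_fraction`."""
    ranked = sorted(table, key=lambda f: measure(table[f]))
    n_drop = int(len(ranked) * drop_fraction)
    return {f: table[f] for f in ranked[n_drop:]}


for measure in (iqr, stdev, cv):
    kept = low_variance_filter(counts, measure=measure)
    print(measure.__name__, sorted(kept))
```

With this toy table all three measures drop `tetM`, the only feature whose counts barely change between samples. On real data the measures can disagree, because the coefficient of variation is scaled by the mean while the IQR and standard deviation are not.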

The low-count and low-variance filtering options are highly recommended for comparative analysis.

We want to hear from you!

If you have previous experience with data filtration, let us know some of the challenges you encountered. How did you solve them? Your experience can help other learners!
This article is from the free online

Exploring the Landscape of Antibiotic Resistance in Microbiomes

Created by
FutureLearn - Learning For Life
