Processing Resistome count data in ResistoXplorer – Part I

In this video, Achal Dhariwal talks about processing resistome count data in ResistoXplorer – Part I

Alright, now we have uploaded data into ResistoXplorer. Can we start generating some nice figures? Not yet! We first need to process the data that was uploaded, and this is done through data filtration and normalisation.

Resistome data can be affected by many sources of systematic variation, arising anywhere from sample preparation to sequencing. The main aim of data filtration and normalisation is to remove or reduce such systematic variability. We will first discuss data filtration, followed by normalisation.

Data Filtration: The objective of data filtering is to remove low-quality and/or uninformative features (ARGs) in order to improve downstream statistical analysis. For this purpose, ResistoXplorer provides three data-filtering options:

Minimal data filtering removes features (ARGs) that have zero read counts across all samples or that are present in only one sample. Such features are considered artefacts and should be removed from the analysis for both biological and technical reasons.
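To make the rule concrete, here is a minimal Python sketch of this kind of filter. It is not ResistoXplorer's own code (the tool is used through its web interface); the function name `minimal_filter`, the dictionary layout (one list of per-sample read counts per ARG) and the example counts are all assumptions made for illustration.

```python
def minimal_filter(counts):
    """Keep only features detected (count > 0) in at least two samples.

    `counts` maps a feature (ARG) name to its read counts, one per sample.
    Hypothetical helper for illustration, not part of ResistoXplorer.
    """
    kept = {}
    for feature, row in counts.items():
        n_present = sum(1 for c in row if c > 0)
        # n_present == 0: all-zero feature; n_present == 1: seen in one sample only
        if n_present >= 2:
            kept[feature] = row
    return kept


# Made-up example table: 4 ARGs measured in 4 samples
counts = {
    "tetM": [12, 0, 7, 3],
    "blaTEM": [0, 0, 0, 0],  # zero everywhere, removed
    "ermB": [0, 5, 0, 0],    # present in a single sample, removed
    "sul1": [1, 2, 0, 0],
}

filtered = minimal_filter(counts)
print(sorted(filtered))  # ['sul1', 'tetM']
```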

Low-count filtering removes features (ARGs) that are present in only a few samples with very low read counts. Such features cannot be distinguished from sequencing errors or low-level contamination, and it is difficult to interpret their significance with respect to the whole community. By default, ResistoXplorer offers an option to remove these features based on their sample prevalence and abundance level (count). Alternatively, these low-abundance features can be removed by setting a minimum count cutoff based on their mean or median value.
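The two approaches described here (prevalence plus count, or a mean/median cutoff) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `low_count_filter`, its parameter names and the threshold values are hypothetical and are not ResistoXplorer's defaults or implementation.

```python
from statistics import mean, median


def low_count_filter(counts, min_count=4, min_prevalence=0.2, method="prevalence"):
    """Drop low-abundance features (hypothetical helper, illustrative thresholds).

    method="prevalence": keep a feature if at least `min_prevalence` (a fraction)
                         of samples have a count >= `min_count`.
    method="mean" / "median": keep a feature if its mean / median count
                         across samples is >= `min_count`.
    """
    kept = {}
    for feature, row in counts.items():
        if method == "prevalence":
            fraction = sum(1 for c in row if c >= min_count) / len(row)
            keep = fraction >= min_prevalence
        elif method == "mean":
            keep = mean(row) >= min_count
        elif method == "median":
            keep = median(row) >= min_count
        else:
            raise ValueError(f"unknown method: {method}")
        if keep:
            kept[feature] = row
    return kept


# Made-up example table: 3 ARGs measured in 5 samples
counts = {
    "tetM": [12, 0, 7, 3, 9],
    "sul1": [1, 2, 0, 0, 1],
    "ermB": [0, 40, 0, 0, 0],
}

print(sorted(low_count_filter(counts, min_count=4, min_prevalence=0.4)))  # ['tetM']
print(sorted(low_count_filter(counts, min_count=4, method="mean")))       # ['ermB', 'tetM']
print(sorted(low_count_filter(counts, min_count=4, method="median")))     # ['tetM']
```

Note how "ermB", which has a single high count, survives the mean-based cutoff but not the prevalence-based or median-based ones. This is one reason the choice of criterion matters.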

Low-variance filtering removes features (ARGs) whose abundance remains nearly constant across all the samples or across the experimental conditions. Such features (ARGs) are unlikely to be informative in a comparative analysis. Filtering out these uninformative features can increase statistical power by reducing the multiple-testing burden during differential analysis. In ResistoXplorer, we can filter low-variance features based on their interquartile range (IQR), standard deviation or coefficient of variation.
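As a rough sketch of how the three variability measures compare, the Python snippet below computes the IQR, standard deviation or coefficient of variation for each feature and drops the least variable ones. Removing a fixed fraction of the lowest-ranked features is an assumption made for this example, as are the names `variability`, `low_variance_filter` and `fraction_to_remove`; none of this is taken from ResistoXplorer's implementation.

```python
from statistics import mean, quantiles, stdev


def variability(row, measure="iqr"):
    """Return one variability score for a feature's counts across samples."""
    if measure == "iqr":
        q1, _, q3 = quantiles(row, n=4, method="inclusive")
        return q3 - q1
    if measure == "sd":
        return stdev(row)
    if measure == "cv":
        m = mean(row)
        # Coefficient of variation = SD / mean; defined as 0 here for all-zero rows
        return stdev(row) / m if m else 0.0
    raise ValueError(f"unknown measure: {measure}")


def low_variance_filter(counts, measure="iqr", fraction_to_remove=0.25):
    """Drop the least variable features (hypothetical helper).

    Features are ranked by the chosen measure and the lowest
    `fraction_to_remove` of them (rounded down) are removed.
    """
    ranked = sorted(counts, key=lambda f: variability(counts[f], measure))
    n_remove = int(len(ranked) * fraction_to_remove)
    dropped = set(ranked[:n_remove])
    return {f: row for f, row in counts.items() if f not in dropped}


# Made-up example table: 4 ARGs measured in 4 samples
counts = {
    "tetM": [12, 0, 7, 3],
    "sul1": [5, 5, 5, 5],   # constant across samples, removed
    "ermB": [0, 40, 2, 9],
    "aadA": [3, 4, 3, 5],
}

print(sorted(low_variance_filter(counts, measure="iqr")))  # ['aadA', 'ermB', 'tetM']
```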

The low-count and low-variance filtering options are highly recommended for comparative analysis.

We want to hear from you!

If you have previous experience with data filtration, let us know some of the challenges you encountered. How did you solve them? Your experience can help other learners!

This article is from the free online course Exploring the Landscape of Antibiotic Resistance in Microbiomes
