Differential abundance testing

One of the main aims of metagenomics-based resistome studies is to identify significant differences in resistance gene abundances between samples. Although this sounds straightforward, several technical challenges remain, most notably due to the characteristics of metagenomic count data. For example, such data are high-dimensional and sparse (they contain many zeros), show highly variable distributions, come from samples with uneven sequencing depths, and are compositional in nature.
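
Two of these characteristics, sparsity and uneven sequencing depth, are easy to check before any testing. The following is a minimal Python sketch on a made-up count table; the gene names, the counts, and the helper functions `sequencing_depths` and `sparsity` are illustrative assumptions and are not part of ResistoXplorer.

```python
# Hypothetical toy count table: keys are resistance genes,
# each list holds the read counts for four samples.
counts = {
    "tetM": [120, 0, 35, 0],
    "blaTEM": [0, 0, 4, 0],
    "ermB": [560, 80, 0, 300],
    "sul1": [0, 15, 0, 0],
}


def sequencing_depths(table):
    """Return the total read count per sample (column sums)."""
    columns = zip(*table.values())
    return [sum(column) for column in columns]


def sparsity(table):
    """Return the fraction of entries in the table that are zero."""
    values = [value for row in table.values() for value in row]
    return sum(1 for value in values if value == 0) / len(values)


depths = sequencing_depths(counts)
zero_fraction = sparsity(counts)
print("depth per sample:", depths)
print("fraction of zeros:", zero_fraction)
```

In this toy table more than half of the entries are zero and the deepest sample has over ten times as many reads as the shallowest one, which is why raw counts cannot be compared directly across samples.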

The general recommendation is therefore either to use statistical models that account for these features of metagenomic data, or to transform the data so that its distribution fits the assumptions of standard tests. In recent years, many novel statistical methods have been developed and made available in the R programming language to handle metagenomic data. For example, the metagenomeSeq R package implements a novel normalization (cumulative sum scaling, CSS) and a statistical model based on the zero-inflated Gaussian (ZIG) distribution to deal with zero inflation or undersampling-related bias. Algorithms developed for RNA-seq data, such as edgeR and DESeq, are also used very often and seem to outperform other methods used for metagenomic data [1, 2]. They fit a generalized linear model and assume that read counts follow a negative binomial distribution to account for the features of count data. However, these methods do not explicitly account for the compositional nature of whole metagenomic sequencing data.
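
To see why a negative binomial model is a natural choice for count data, recall that it allows the variance to exceed the mean: Var = mu + alpha * mu^2, where alpha is the dispersion and alpha = 0 gives the Poisson case. The Python sketch below estimates alpha for one gene with a simple method-of-moments formula. This is only an illustration under stated assumptions: the counts are invented, `moment_dispersion` is a hypothetical helper, and edgeR and DESeq2 use more elaborate dispersion estimators than this one.

```python
from statistics import mean, variance


def moment_dispersion(gene_counts):
    """Method-of-moments estimate of alpha in Var = mu + alpha * mu**2.

    Returns 0.0 when the data are not overdispersed (variance <= mean)
    or when the gene has no reads at all.
    """
    mu = mean(gene_counts)
    var = variance(gene_counts)  # sample variance, n - 1 denominator
    if mu == 0:
        return 0.0
    return max((var - mu) / mu**2, 0.0)


# Hypothetical counts of one resistance gene across six samples
# (assumed to be already adjusted for sequencing depth).
gene_counts = [12, 0, 45, 3, 80, 7]
alpha_hat = moment_dispersion(gene_counts)
print("mean:", mean(gene_counts))
print("variance:", variance(gene_counts))
print("estimated dispersion:", alpha_hat)
```

Here the variance is far larger than the mean, so a Poisson model (which forces the two to be equal) would understate the variability and produce too many false positives.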

Recently, promising Compositional Data Analysis (CoDA) approaches have been proposed. These approaches deal with compositionality by performing statistical tests on the log-ratios of features (genes) rather than on their actual count abundances. For example, ALDEx2 performs parametric or non-parametric statistical tests on log-ratio values from a modeled probability distribution of the data. It returns the expected values of the statistical tests along with effect size estimates. In contrast, ANCOM tests the log-ratio abundance of all pairs of features (genes) for differences in means using non-parametric statistical tests.
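
The idea behind log-ratio methods can be illustrated with the centered log-ratio (CLR) transformation, which expresses each gene relative to the geometric mean of its sample. The Python sketch below is a simplified illustration: the function `clr`, the fixed pseudocount of 0.5 used to handle zeros, and the counts are assumptions made for this example. ALDEx2 itself models the data with a probability distribution rather than adding a fixed pseudocount.

```python
import math


def clr(sample_counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's gene counts.

    The pseudocount is an assumption of this sketch; with pseudocount=0
    the input must not contain zeros, because log(0) is undefined.
    """
    logs = [math.log(count + pseudocount) for count in sample_counts]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [value - center for value in logs]


# Hypothetical sample without zeros, and the same community
# sequenced twice as deeply (all counts doubled).
sample = [120, 8, 35, 4]
deeper = [2 * count for count in sample]

clr_sample = clr(sample, pseudocount=0)
clr_deeper = clr(deeper, pseudocount=0)
print(clr_sample)
print(clr_deeper)
```

Both samples give the same CLR values (up to floating-point rounding), because log-ratios depend only on the proportions between genes and not on the total number of reads. The CLR values of a sample also sum to zero by construction.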

A wide variety of methods is therefore available to detect differentially abundant genes. However, the main question remains: which method is best, or which one should you choose among the different differential abundance analysis methods?

There is no single method that is suitable for all types of metagenomic datasets and questions. In addition, the characteristics of a given study’s data, such as sample or group size, sequencing depth, effect sizes, and gene abundances, significantly affect each method’s outcome and performance. Therefore, different methods are required for different metagenomic datasets and research questions [2, 3, 4].

Recent large-scale benchmarking studies are often constructive and can serve as a guide; they should be read and carefully analyzed when deciding which method to choose for your own data. Additionally, a consensus approach based on multiple differential abundance analysis methods can help ensure robust biological interpretations.
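
One simple way to build such a consensus is to keep only the genes that several methods agree on. The Python sketch below assumes that each method has already been run and that its significant genes have been exported as a set; the gene sets, the function `consensus_genes`, and the `min_methods` threshold are hypothetical choices for illustration, not a feature of any particular tool.

```python
from collections import Counter


def consensus_genes(results, min_methods=2):
    """Return genes called significant by at least `min_methods` methods."""
    votes = Counter(gene for genes in results.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_methods)


# Hypothetical significant genes reported by four methods.
results = {
    "DESeq2": {"tetM", "ermB", "sul1"},
    "edgeR": {"tetM", "ermB"},
    "ALDEx2": {"tetM"},
    "ANCOM": {"tetM", "blaTEM"},
}

consensus = consensus_genes(results, min_methods=3)
print(consensus)
```

Lowering `min_methods` trades robustness for sensitivity: with a threshold of 2 the gene ermB would also be retained, while genes reported by a single method are dropped in both cases.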

As a result, we have implemented multiple standard methods, such as DESeq2 [5], edgeR [6], metagenomeSeq [7], and LEfSe [8], as well as more recent CoDA-based univariate analysis approaches such as ALDEx2 [9] and ANCOM [10], in ResistoXplorer. However, it should be noted that each of these approaches uses its own normalization procedure. For instance, the LEfSe algorithm employs standard non-parametric tests for statistical significance coupled with linear discriminant analysis on total sum scaled (TSS) normalized data in order to assess the effect size of differentially abundant genes. Relative log expression (RLE) normalization is used for DESeq2, and the centered log-ratio (CLR) transformation is applied for ALDEx2.
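
To make the difference between these procedures concrete, the Python sketch below contrasts TSS with a simplified version of RLE (median-of-ratios) size factors. It is an illustration under stated assumptions, not the code used by LEfSe, DESeq2, or ResistoXplorer: the functions `tss` and `rle_size_factors` and the toy table are hypothetical, and genes with a zero in any sample are simply skipped when computing the RLE factors.

```python
import math
from statistics import median


def tss(sample_counts):
    """Total sum scaling: convert one sample's counts to proportions."""
    total = sum(sample_counts)
    return [count / total for count in sample_counts]


def rle_size_factors(table):
    """Median-of-ratios size factor per sample.

    `table` is a list of gene rows, each a list of counts per sample.
    Only genes observed in every sample are used, so that the
    per-gene geometric mean is defined; at least one such gene is needed.
    """
    n_samples = len(table[0])
    usable = [row for row in table if all(count > 0 for count in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical table: sample 2 has exactly twice the counts of sample 1
# for the genes seen in both samples.
table = [
    [10, 20],
    [30, 60],
    [5, 10],
    [0, 4],
]

factors = rle_size_factors(table)
proportions = tss([row[0] for row in table])
print("RLE size factors:", factors)
print("TSS proportions, sample 1:", proportions)
```

Dividing each sample's counts by its size factor puts the two samples on a common scale, whereas TSS rescales every sample to sum to one. Because the procedures differ, the same gene can receive different normalized values depending on the method chosen.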

You can click here to learn more about each method in detail.

OK, now let us look at the video to learn how to perform such analysis in ResistoXplorer.


[1] Pereira M.B., Wallroth M., Jonsson V., Kristiansson E. Comparison of normalization methods for the analysis of metagenomic gene abundance data. BMC Genomics. 2018; 19:274.

[2] Jonsson V., Österlund T., Nerman O., Kristiansson E. Statistical evaluation of methods for identification of differentially abundant genes in comparative metagenomics. BMC Genomics. 2016; 17:78.

[3] Bengtsson-Palme J., Larsson D.G.J., Kristiansson E. Using metagenomics to investigate human and environmental resistomes. J. Antimicrob. Chemother. 2017; 72:2690–2703.

[4] Pérez-Cobas A.E., Gomez-Valero L., Buchrieser C. Metagenomic approaches in microbial ecology: an update on whole-genome and marker gene sequencing analyses. Microbial Genomics. 2020; 6:mgen000409.

[5] Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550.

[6] Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26:139–140.

[7] Paulson J.N., Pop M., Bravo H.C. metagenomeSeq: statistical analysis for sparse high-throughput sequencing. Bioconductor Package. 2013; 1:91.

[8] Segata N., Izard J., Waldron L., Gevers D., Miropolsky L., Garrett W.S., Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011; 12:R60.

[9] Fernandes A.D., Reid J.N., Macklaim J.M., McMurrough T.A., Edgell D.R., Gloor G.B. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome. 2014; 2:15.

[10] Mandal S., Van Treuren W., White R.A., Eggesbø M., Knight R., Peddada S.D. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis. 2015; 26:27663.

This article is from the free online

Exploring the Landscape of Antibiotic Resistance in Microbiomes

Created by
FutureLearn - Learning For Life
