
Normalization and differential analysis


Now we have annotated the ARGs present in the samples. Great! These data can then be analyzed for abundance and diversity, and used to draw comparisons across samples. As we have seen in the previous steps, we have tried to work with standardized methods to minimize the differences between samples in terms of design, collection, extraction, and analysis. However, no matter how well these steps were performed, there will always be inherent differences between samples. To be able to draw comparisons and parallels, a normalization step is therefore necessary.

Imagine that you are comparing two samples that were treated identically throughout the project. For whatever reason, read quality in one of them was much lower, and the low-quality reads had to be removed from the analysis. As a result, you have one sample with 1 000 000 high-quality reads and another with 400 000. Comparing them directly would be challenging and prone to bias, so we need to make them comparable by normalizing to a common standard. One very commonly used method is normalization to the total number of reads, in which ARG counts are expressed relative to the number of reads in each sample. This puts the samples on a common scale, so that samples with more reads do not dominate the analysis.
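To make the arithmetic concrete, here is a minimal Python sketch of normalization to the total number of reads, using the read totals from the example above. The gene names, the ARG counts, the function name, and the choice of expressing results per million reads are all assumptions made for illustration.

```python
def normalize_to_total_reads(arg_counts, total_reads, scale=1_000_000):
    """Express ARG read counts per `scale` reads of a single sample.

    arg_counts  -- dict mapping gene name to raw read count (hypothetical data)
    total_reads -- number of high-quality reads kept for the sample
    scale       -- reporting unit; per million reads is an assumed convention
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {gene: count * scale / total_reads
            for gene, count in arg_counts.items()}


# Hypothetical raw ARG counts for the two samples described in the text.
sample_a = {"tetM": 500, "blaTEM": 200}   # 1 000 000 high-quality reads
sample_b = {"tetM": 200, "blaTEM": 80}    # 400 000 high-quality reads

norm_a = normalize_to_total_reads(sample_a, total_reads=1_000_000)
norm_b = normalize_to_total_reads(sample_b, total_reads=400_000)

print(norm_a)
print(norm_b)
```

In this made-up case the raw counts differ by a factor of 2.5, yet the normalized values are identical, which is the kind of sequencing-depth effect normalization is meant to remove.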

Alternatively, instead of the total number of reads, normalization can use a specific gene or locus in the bacterial genome. Essential housekeeping genes such as gyrB and rpoB are used in this way, and the 16S rRNA gene is also very often used. However, caution is advised: the number of 16S rRNA gene copies varies significantly between bacterial species, so this variation needs to be incorporated into the normalization step.
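One possible way to express this is sketched below in Python: ARG reads are divided by 16S rRNA gene reads after correcting each for gene length, and the result is then scaled by an average 16S copy number to approximate ARG copies per cell. The function names, the 1 500 bp 16S length, the read counts, and the mean copy number of 4 are illustrative assumptions, not values from this course; in practice the copy number would have to be estimated from the taxonomic composition of each sample.

```python
def arg_per_16s_copy(arg_reads, arg_length_bp, rrna_reads, rrna_length_bp=1500):
    """Length-corrected ratio of ARG reads to 16S rRNA gene reads.

    Dividing by gene length keeps long genes from appearing more
    abundant simply because they recruit more reads.
    The default 16S length of 1500 bp is an approximation.
    """
    if min(arg_length_bp, rrna_length_bp) <= 0 or rrna_reads <= 0:
        raise ValueError("lengths and 16S read count must be positive")
    arg_coverage = arg_reads / arg_length_bp
    rrna_coverage = rrna_reads / rrna_length_bp
    return arg_coverage / rrna_coverage


def arg_per_cell(arg_per_16s, mean_16s_copies_per_cell):
    """Convert ARG copies per 16S copy into an estimate of ARG copies per cell.

    Cells are approximated as (16S copies / mean 16S copies per cell),
    so the per-16S ratio is multiplied by the mean copy number.
    """
    return arg_per_16s * mean_16s_copies_per_cell


# Hypothetical numbers for one sample.
ratio = arg_per_16s_copy(arg_reads=120, arg_length_bp=1200, rrna_reads=3000)
per_cell = arg_per_cell(ratio, mean_16s_copies_per_cell=4)

print(f"ARG copies per 16S copy: {ratio:.3f}")
print(f"ARG copies per cell (estimate): {per_cell:.3f}")
```

Two communities with the same ARG-to-16S ratio can differ in their per-cell estimate if their members carry different numbers of 16S copies, which is why the correction matters.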

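Once the data are normalized, differential analysis is the next step. A multitude of software tools are available for this, and which one to choose will depend on the data. Two widely used and reliable tools are DESeq2 and edgeR. Both are designed to take raw counts as input and apply their own normalization internally, so check each tool's documentation before supplying pre-normalized values. You can find the original publications for each in the links below: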

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

Following these steps will provide you with a count table that will be utilized for downstream analyses.
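As a rough illustration of the kind of normalization such tools perform, DESeq2 estimates a size factor for each sample with a median-of-ratios approach. The pure-Python sketch below follows that idea under simplifying assumptions; it is not the DESeq2 implementation, and the function name, gene names, and counts are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a raw count table.

    counts -- dict mapping gene name to a list of raw counts,
              one entry per sample, in the same sample order.
    Genes with a zero count in any sample are skipped, because
    their geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))

    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    if not log_geo_means:
        raise ValueError("no gene has non-zero counts in every sample")

    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to its geometric mean, on the log scale.
        log_ratios = [math.log(counts[gene][j]) - lgm
                      for gene, lgm in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical raw ARG counts for two samples.
raw = {
    "tetM":   [100, 200],
    "blaTEM": [50, 100],
    "ermB":   [10, 80],
    "sul1":   [0, 5],     # skipped: zero count in the first sample
}

size_factors = median_of_ratios_size_factors(raw)
normalized = {gene: [c / f for c, f in zip(row, size_factors)]
              for gene, row in raw.items()}

print(size_factors)
print(normalized["ermB"])
```

Because the median is used, a single gene that differs strongly between samples (here the hypothetical ermB) has little influence on the size factors, so its difference remains visible after normalization.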


Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8. PMID: 25516281; PMCID: PMC4302049.

Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139-140. doi: 10.1093/bioinformatics/btp616. PMID: 19910308; PMCID: PMC2796818.

This article is from the free online

Exploring the Landscape of Antibiotic Resistance in Microbiomes

Created by
FutureLearn - Learning For Life
