Alternatives to BlobToolKit

Table providing alternative tools

BlobToolKit (BTK) is not the only tool you can use for quality assessment (QA) of genome assemblies.

There are a number of tools designed to separate multi-taxa genomic data. Some are specifically aimed at finding contaminants in your sample, whereas others aim to separate metagenomic data that has been intentionally sampled from many taxa at once. Which of these tools you use depends on your research goals and types of data.

One important difference is that some tools analyze your sequenced reads, whereas others are designed for completed assemblies. There are advantages to both of these approaches. If you are simply trying to assemble the genome of a single organism, it might be computationally more efficient to remove contaminating reads at the beginning. This also helps avoid producing scaffolds that might incorporate portions of the contaminating DNA. However, as you will see in this tutorial, removing reads early also means that you cannot further examine co-sequenced taxa, and these could be biologically important.

Some of the available tools for examining contamination include:

blast

blast is probably the best-known tool for identifying DNA and protein sequences, and many tools use blast in their workflow. blast can be used to identify single gene sequences, or to identify all of the predicted genes in a genome. However, one limitation of this approach is that searching many sequences against a large database can be very time-consuming.

Specific tools within the blast suite can be used to identify and remove unwanted sequence. More here: https://www.ncbi.nlm.nih.gov/tools/vecscreen/contam/
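As a rough illustration of how blast results feed into a contamination check, the sketch below reads blast tabular output (the 12 default columns of `-outfmt 6`) and keeps the highest-scoring hit for each query sequence. The contig and subject names are invented for the example, and the function name `best_hits` is hypothetical. It is a minimal sketch, not part of any blast tool.

```python
import csv
import io

# Default column order of blast tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return {query ID: hit} keeping the highest-bitscore hit per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(BLAST_COLUMNS, fields))
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Invented example rows; real input would come from a blast output file.
example = (
    "contig_1\tsubject_A\t98.5\t200\t3\t0\t1\t200\t50\t249\t1e-100\t350\n"
    "contig_1\tsubject_B\t90.0\t180\t18\t0\t1\t180\t10\t189\t1e-60\t220\n"
    "contig_2\tsubject_C\t99.0\t150\t1\t0\t5\t154\t1\t150\t1e-70\t270\n"
)
hits = best_hits(io.StringIO(example))
for query, row in sorted(hits.items()):
    print(query, row["sseqid"], row["bitscore"])
```

Once each sequence has a best hit, the subject identifiers could be mapped to taxa to decide which sequences look like contaminants. That mapping step is left out here because it depends on the database you search.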

Conterminator

Conterminator aims to find incorrectly labelled sequences in large databases, using an all-vs-all comparison. This is very useful for finding contaminants in a reference database before you use it to screen your own samples. More about Conterminator here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02023-1

EukRep

Designed for metagenomic data, EukRep is a useful tool for separating reads between taxa, and especially for separating eukaryotic and prokaryotic species. More here: https://github.com/patrickwest/EukRep

FastQ Screen

This tool is useful for removing a known contaminant, such as human or phiX, from a set of reads in fastq format. More here: https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/

Kraken

Kraken is a fast tool designed to separate reads by taxon prior to assembly. It can be used to remove potential contaminants, or to isolate and further analyse these reads. Custom databases can be constructed to remove specific taxa, or larger searches can be used to find unknown contaminants. More here: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-3-r46
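To make the idea of separating reads before assembly more concrete, here is a minimal sketch that splits read IDs using Kraken-style per-read output. It assumes the standard tab-delimited layout (classification status `C` or `U`, read ID, taxonomy ID, read length, k-mer mappings) with plain numeric taxonomy IDs. The read IDs are invented, and the function name `split_read_ids` is hypothetical rather than part of Kraken.

```python
import io


def split_read_ids(handle, unwanted_taxids):
    """Split read IDs into (kept, flagged) lists.

    A read is flagged only if it is classified ("C") and its taxonomy ID
    is in unwanted_taxids. Unclassified reads are kept.
    """
    kept, flagged = [], []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status == "C" and taxid in unwanted_taxids:
            flagged.append(read_id)
        else:
            kept.append(read_id)
    return kept, flagged


# Invented example: 9606 is the NCBI taxonomy ID for human, 562 for E. coli.
example = (
    "C\tread_1\t9606\t150\t9606:120\n"
    "U\tread_2\t0\t150\t0:120\n"
    "C\tread_3\t562\t150\t562:100 0:20\n"
)
kept, flagged = split_read_ids(io.StringIO(example), unwanted_taxids={"9606"})
print("kept:", kept)
print("flagged:", flagged)
```

Keeping the flagged IDs in their own list, rather than discarding them, reflects the point made earlier: co-sequenced taxa can be biologically important, so you may want to examine those reads later.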

Sourmash

Sourmash uses an alignment-free, k-mer-based approach to identify potential contamination in assembled genomes. Designed for very large datasets, it is good for identifying systematic errors across many assemblies. More here: https://sourmash.readthedocs.io/en/latest/
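The general idea behind alignment-free, k-mer-based comparison can be sketched in a few lines: break each sequence into overlapping words of length k, then measure how many words two sequences share. The example below computes an exact Jaccard similarity on toy sequences. It is only an illustration of the concept. Sourmash itself works with hashed, subsampled k-mer sketches so that it scales to very large datasets, and the helper names here (`kmers`, `jaccard`) are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy sequences invented for the example; real genomes use much larger k.
seq_a = "ACGTACGTGG"
seq_b = "ACGTACGTTT"
similarity = jaccard(kmers(seq_a, 4), kmers(seq_b, 4))
print(f"Jaccard similarity (k=4): {similarity:.2f}")
```

No alignment is computed at any point, which is why this family of methods is fast. A sequence that shares many k-mers with an unexpected taxon is a candidate contaminant.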

Tiara

Like Sourmash, Tiara uses an alignment-free method based on k-mers to separate sequences in metagenomic data. Its focus is on eukaryotic genomes. More here: https://academic.oup.com/bioinformatics/article/38/2/344/6375939

Tool             Type of data
blast            Reads or assemblies
Conterminator    Assembled genomes
EukRep           Raw reads
FastQ Screen     Raw reads
Kraken           Raw reads
Sourmash         Assembled genomes
Tiara            Assembled genomes

Table: Alternative tools and type of data that can be used

Do you use any of these tools? Are there others that you have found helpful for similar situations? Let us know in the comments.

© Wellcome Connecting Science
This article is from the free online

Eukaryotic Genome Assembly: How to Use BlobToolKit for Quality Assessment
