Who should install BlobToolKit?

This article explains the requirements for installing BlobToolKit on your own machine to work on de novo genome assemblies.
A quick reminder before you start: you do not need to complete the steps in this activity to finish the course, and it will not be assessed. This part is optional and intended only for those who want to run BTK on their own assembly.

Now that you know how to use the BTK web viewer, you might also want to use BTK on a genome assembly that you have generated, or that is not yet in the public databases.

Who this activity is for: If you are a bioinformatician creating a de novo genome assembly from raw reads, you will almost certainly want to run the BTK pipeline steps, as they will help you detect the presence of cobionts, such as contaminants, parasites, or symbionts.

Who this activity is NOT for: If you are planning to use a publicly available genome assembly in other research, for example, to study the evolution of genes, or to study the population genetics of a species for conservation purposes, then you should use the BTK web viewer where the BTK pipeline has already been run. If you find a public genome assembly without a BTK view, please contact who will run the pipeline on a priority basis for that assembly.

Difference between the BTK viewer and the BTK pipeline

As we mentioned in “Section 2.4 Where did the BTK data come from”, the BTK viewer is a web-based user interface for viewing the contents of a precomputed “blobdir”, i.e. a folder containing the files needed to visualise a genome assembly and its attributes (GC content, coverage, length, taxonomic hits, etc.) in a web browser. This blobdir is created by the BTK pipeline, which is a set of Python scripts and other bioinformatics tools that can be run automatically on a genome assembly and a set of reads.


Using the BTK pipeline to create a blobdir for your own assembly is a fairly computationally intensive process that requires substantial computing resources. The requirements for running the full pipeline are:

Computer/Server requirements:

  • A Linux/Unix-based machine with at least 16 CPUs and at least 60 GB of RAM
  • Enough hard disk space to store the BLAST databases (typically ~350 GB), as well as your assemblies, reads, and read mappings (typically ~70-100 GB for large genomes)

User requirements:

  • Comfortable with the Unix command line
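
If you would like a quick way to see whether a machine meets the computer/server requirements listed above, the following is a minimal sketch in Python. It is not part of BlobToolKit. The function names (`check_requirements`, `detect_resources`) are hypothetical, the thresholds are simply the figures from this article, and the disk figure assumes that the databases and your working data live on the same filesystem (350 GB plus the upper estimate of 100 GB). It also treats 1 GB as 10^9 bytes, which is an assumption.

```python
import os
import shutil

# Thresholds taken from the requirements listed in this article.
MIN_CPUS = 16
MIN_RAM_GB = 60
# Assumption: databases (~350 GB) and working data (up to ~100 GB)
# are stored on the same filesystem.
MIN_DISK_GB = 350 + 100


def check_requirements(cpus, ram_gb, free_disk_gb):
    """Return a list of unmet requirements (empty if all are met)."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"CPUs: found {cpus}, need at least {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: found {ram_gb:.0f} GB, need at least {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(
            f"Disk: found {free_disk_gb:.0f} GB free, need about {MIN_DISK_GB} GB"
        )
    return problems


def detect_resources(path="."):
    """Detect CPU count, total RAM (GB) and free disk space (GB) at `path`.

    Uses os.sysconf, so this works on Linux/Unix-like systems only.
    """
    cpus = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    return cpus, ram_bytes / 1e9, free_bytes / 1e9


if __name__ == "__main__":
    unmet = check_requirements(*detect_resources("."))
    if unmet:
        print("This machine may not meet the requirements:")
        for line in unmet:
            print("  -", line)
    else:
        print("This machine appears to meet the listed requirements.")
```

Run the script from the directory (or pass the path) where you plan to store the databases and data, since free disk space is reported per filesystem. On a shared cluster, the figures that matter are those of the compute node or job allocation, not the login node.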
