Distribution plots for longer sequences

This step explains how to access BTK distribution plots showing reads in different chromosomes and regions.

The Tree of Life (TOL) programme at the Wellcome Sanger Institute uses the BTK pipeline and viewer as part of its genome assembly workflow.

Contaminants and cobionts are identified and removed using BTK before the assembly is submitted to the public databases. (A cobiont is any organism that is present alongside the target organism in a DNA sample.) In Week 3, you will see how you can also use BTK to understand, explore, and clean up a preliminary genome assembly.

You might wonder if there is any point in sharing a BTK plot for a cleaned-up assembly. The answer is yes, because the plot clearly shows that the genome assembly is free of contaminants and other non-target organisms.

Here is an example of a high-quality genome assembly of the moth species Gymnoscelis rufifasciata, submitted to the public databases by TOL:

There is only one GC-coverage blob with a single taxonomic assignment, and the genome is assembled into 30 large chromosomes, each longer than 6 megabase pairs (Mb), along with about a dozen smaller sequences.

In all the previous examples, you saw a whole sequence labelled as having high or low GC content, high or low sequencing coverage, or a match to a non-target organism. In this screencast, you will see how to visualise GC content and coverage along the length of a long chromosome.
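
To make the idea of looking at GC content along a chromosome more concrete, here is a minimal Python sketch that splits a sequence into fixed-size windows and reports the GC fraction of each one. It only illustrates the general windowing idea and is not BlobToolKit's own code. The function name `windowed_gc` and the default window size of 100,000 bases are assumptions chosen for this example.

```python
def windowed_gc(sequence, window_size=100_000):
    """Return (start, end, gc_fraction) for consecutive, non-overlapping windows.

    Hypothetical helper for illustration; the window size is an arbitrary choice.
    """
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq), window_size):
        window = seq[start:start + window_size]
        # Count only unambiguous bases so that runs of N do not skew the value.
        acgt = sum(window.count(base) for base in "ACGT")
        gc = window.count("G") + window.count("C")
        gc_fraction = gc / acgt if acgt else None  # None for an all-N window
        results.append((start, start + len(window), gc_fraction))
    return results


# Toy sequence: an AT-only stretch, a GC-only stretch, then a balanced stretch.
toy = "AT" * 10 + "GC" * 10 + "ATGC" * 5
for start, end, gc in windowed_gc(toy, window_size=20):
    print(start, end, gc)
```

On the toy sequence, the three windows have GC fractions of 0.0, 1.0, and 0.5. This is the kind of within-sequence variation that a single whole-sequence GC value would hide.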

In summary, BlobToolKit is very useful for visualising unusual GC content, sequencing coverage, or taxonomic hits in long, contiguous chromosomal assemblies, just as it was helpful in understanding unusual patterns in older, more fragmented assemblies.

Eukaryotic Genome Assembly: How to Use BlobToolKit for Quality Assessment

