BlobToolKit in the real world: why is there a whale in my bird genome?

A tinamou, a small brown ground-dwelling bird, stepping through some leaves
© Wellcome Connecting Science

In this article we’re going to dive deeper into one of the examples of BlobToolKit that Mark Blaxter showed in his introduction.

Follow this link to see a BTK blob view (or “blobplot”) of the publicly available genome assembly of a bird, Crypturellus cinnamomeus, also known as the thicket tinamou: Crypturellus cinnamomeus BTK blobplot

Here is an annotated screenshot of this plot. Don’t worry about the long web address or the viewer options and filters; we will cover all of those in later steps. For now, just concentrate on the main blob plot.

Figure: annotated screenshot of the Crypturellus cinnamomeus blobplot

Each sequence in the genome assembly is represented as a circle, and the size of the circle reflects the length of that sequence. Right away you can see two clear blobs: a higher blue blob, and a lower blob coloured green and red. The two blobs have different GC content (X axis) and different coverage (Y axis), and the colours tell us the taxonomy of the best hits of those sequences in the public databases. The legend at the top right of the plot tells us that:

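  • The blue sequences are “Apterygiformes”, the order of birds that contains the kiwis. Tinamous themselves belong to the order Tinamiformes, but kiwis are among their close relatives, so these best hits are what we would expect for genuine tinamou sequences.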
  • The green sequences are “Eucoccidiorida”, a taxonomic order of apicomplexan parasites. Apicomplexans are a phylum of tiny eukaryotic parasites, some of which cause disease; the phylum includes Plasmodium falciparum, the parasite that causes malaria.
  • The red sequences are the mammalian order “Cetacea”, which includes whales.
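
To make the four things a blob plot encodes more concrete, here is a minimal Python sketch. It is not BlobToolKit code and does not use the BTK API; the function names, contig names, sequences and coverage values are all invented for illustration. It simply shows how one sequence maps to an X position (GC content), a Y position (coverage), a circle size (length) and a colour (best-hit taxon).

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def blob_point(name, sequence, coverage, best_hit_order):
    """Collect the values a blob plot shows for one sequence (hypothetical helper)."""
    return {
        "name": name,
        "x_gc": gc_content(sequence),    # X axis
        "y_coverage": coverage,          # Y axis
        "size_length": len(sequence),    # circle size
        "colour_taxon": best_hit_order,  # circle colour
    }


# Toy records: sequences, coverage values and names are made up.
records = [
    ("contig_1", "ATGCGCGCTA" * 50, 42.0, "Apterygiformes"),
    ("contig_2", "ATATTAGCAT" * 20, 6.5, "Eucoccidiorida"),
    ("contig_3", "ATTTAAGCAT" * 10, 7.1, "Cetacea"),
]

points = [blob_point(*record) for record in records]
for p in points:
    print(p["name"], round(p["x_gc"], 2), p["y_coverage"],
          p["size_length"], p["colour_taxon"])
```

In this toy data the second and third contigs share a similar GC content and coverage, so they would fall in the same blob even though their taxonomic labels differ.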

In this case BTK was used to visualise the presence of a contaminant parasite in the archived genome assembly of a bird. Why is this important? Although scientists do their best to create high-quality resources, the field of genome assembly is relatively new, so many public genome assemblies contain contaminants like this one. A researcher using this genome might assume that it is a “pure” bird genome and conclude that birds contain genes otherwise seen only in some parasitic species, which would lead to many incorrect findings.

And what about the red sequences labelled “Cetacea”? They are not actually from a cetacean (whale). They sit in the same blob as the green sequences, so they come from the same apicomplexan parasite. The reason for the odd label is that someone had previously submitted “whale” sequences to the public databases that were actually apicomplexan. When we looked for the closest matching sequences, we found that same parasite, but the matching records were labelled “whale”. If the submitters of that whale genome had used BlobToolKit and removed the parasite sequences before submitting, this problem would not have occurred.
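
The reasoning above, that position in the plot is stronger evidence than a database label, can be sketched in a few lines of Python. This is an illustration only, not how BlobToolKit works internally. The GC and coverage values are invented, the function names are hypothetical, and placing coverage on a log10 scale and using a plain Euclidean distance are simplifying assumptions.

```python
import math


def plot_position(gc, coverage):
    """Map a contig to plot space; log10 coverage is an assumption here."""
    return (gc, math.log10(coverage))


def centroid(positions):
    """Mean position of a group of contigs."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def nearest_blob(position, blob_centroids):
    """Return the label of the blob whose centroid is closest."""
    return min(blob_centroids,
               key=lambda label: math.dist(position, blob_centroids[label]))


# Invented (GC, coverage) pairs for contigs with trusted labels.
labelled = {
    "Apterygiformes": [(0.44, 40.0), (0.46, 45.0)],
    "Eucoccidiorida": [(0.52, 5.0), (0.54, 6.0)],
}
centroids = {
    label: centroid([plot_position(gc, cov) for gc, cov in members])
    for label, members in labelled.items()
}

# A contig whose best database hit is labelled "Cetacea" (values invented).
cetacea_contig = plot_position(0.53, 5.5)
print(nearest_blob(cetacea_contig, centroids))  # prints Eucoccidiorida
```

With these toy numbers the “Cetacea” contig lands next to the parasite blob, which mirrors the argument in the text: the label says whale, but the GC content and coverage say parasite.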

BTK is important not only for detecting problems and contaminants in existing genome assemblies; it is also an invaluable tool when we create new genome assemblies for our species of interest. Using BTK we can identify problems, contaminants and symbionts in the resources we create, and filter them out or separate them so that they do not pollute the public databases with incorrect records.
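
As a rough idea of what “filter them out or separate them” means in practice, the sketch below splits sequences into those whose best-hit taxon is expected for the target species and those that need a closer look. It is plain Python with invented contig names and a hypothetical function name, and it stands in for the interactive filters in the BTK viewer rather than reproducing them.

```python
def partition_by_taxon(records, expected_taxa):
    """Split (name, best_hit_taxon) pairs into kept and flagged names."""
    kept, flagged = [], []
    for name, taxon in records:
        if taxon in expected_taxa:
            kept.append(name)
        else:
            flagged.append(name)
    return kept, flagged


# Toy records; names and assignments are made up.
records = [
    ("contig_1", "Apterygiformes"),
    ("contig_2", "Eucoccidiorida"),
    ("contig_3", "Cetacea"),
]

kept, flagged = partition_by_taxon(records, expected_taxa={"Apterygiformes"})
print("kept:", kept)
print("flagged for review:", flagged)
```

Flagged sequences are set aside for review rather than deleted outright, since some of them may be symbionts worth keeping as a separate record.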

In the next step you will see an actual example of what can go wrong if you don’t use BTK to examine a genome assembly before submitting it to the public databases.

© Wellcome Connecting Science
This article is from the free online course Eukaryotic Genome Assembly: How to Use BlobToolKit for Quality Assessment
