First steps in cleaning a genome assembly

© Wellcome Connecting Science

First, Slimane identified the three main blobs on the plot.

The central blob is light blue, and the legend shows that its contigs have BLAST hits to sequences in the fungal phylum Ascomycota. Botrytis cinerea also belongs to the phylum Ascomycota, so this blob contains the target organism and is the main blob of interest.

The green blob at the top is labelled Proteobacteria and has very high coverage. The histogram on the right shows its peak coverage (sequencing depth) at around 7,000 to 8,000, whereas the main light blue Ascomycota blob is centred around 90 to 100. That is a difference of almost two orders of magnitude.
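To check the "almost two orders of magnitude" claim, here is a small worked calculation. It assumes the peak positions are exactly the approximate ranges read off the histogram (7,000 to 8,000 for Proteobacteria, 90 to 100 for Ascomycota); these are eyeballed values, not exact numbers from the dataset.

```python
import math

# Approximate peak coverages read off the histogram (eyeballed, not exact)
proteobacteria_peak = (7000, 8000)  # green blob
ascomycota_peak = (90, 100)         # main light blue blob

# Smallest and largest plausible ratios between the two peaks
low_ratio = proteobacteria_peak[0] / ascomycota_peak[1]   # 7000 / 100
high_ratio = proteobacteria_peak[1] / ascomycota_peak[0]  # 8000 / 90

for ratio in (low_ratio, high_ratio):
    # log10 of a ratio is the difference measured in orders of magnitude
    print(f"{ratio:.1f}x coverage = {math.log10(ratio):.2f} orders of magnitude")
```

Both ratios (about 70x and 89x) fall just short of 100x, i.e. slightly under two orders of magnitude.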

At the bottom right is a dark blue blob (legend: no-hit), meaning these contigs had no BLAST hits to any sequence in the database. They all have very low coverage (below 10) and a higher GC content (above 0.45) than the main Ascomycota blob.
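The GC value on the plot's x-axis is a proportion between 0 and 1, so "above 0.45" means more than 45% of the bases are G or C. The sketch below computes that proportion for a single sequence. It is an illustration only: the function name and toy sequence are hypothetical, and it assumes ambiguous bases such as N are left out of the denominator, which may not match exactly how BlobToolKit handles them.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases in seq."""
    seq = seq.upper()
    # Count only A, C, G and T so that N (and other ambiguity codes) are ignored
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

# Hypothetical toy sequence, not taken from this assembly
example = "ATGCGCGTTA"
print(gc_fraction(example))
```

Here 5 of the 10 bases are G or C, giving 0.5, which would sit to the right of the 0.45 mark on the plot.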

Based on this plot, one simple way to isolate the Botrytis cinerea contigs is to remove all sequences with coverage above 2000 and all sequences with coverage below 12.
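The same coverage rule can be written as a simple range check, which may help if you want to reproduce the filter outside the viewer. This is a minimal sketch: the contig names and coverage values are hypothetical, and it assumes both thresholds are inclusive (keep 12 ≤ coverage ≤ 2000), which follows the wording above but may not match the viewer's exact boundary behaviour.

```python
from typing import Dict

MIN_COV = 12.0    # below this: the low-coverage no-hit blob
MAX_COV = 2000.0  # above this: the very high-coverage Proteobacteria blob

def keep_contig(coverage: float, min_cov: float = MIN_COV, max_cov: float = MAX_COV) -> bool:
    """Return True if coverage lies within [min_cov, max_cov] (inclusive, assumed)."""
    return min_cov <= coverage <= max_cov

# Hypothetical contigs, one typical of each blob (illustration only)
coverages: Dict[str, float] = {
    "contig_1": 95.0,    # like the main Ascomycota blob
    "contig_2": 7500.0,  # like the Proteobacteria blob
    "contig_3": 4.0,     # like the no-hit blob
}

kept = [name for name, cov in coverages.items() if keep_contig(cov)]
print(kept)
```

Only `contig_1` passes, mirroring how the filter keeps the central blob and drops the other two.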

Figure 2 shows how to do this: in the “Filters” tab, set the B_cinera_112_1.BCREADS_cov filter to a maximum of 2000 and a minimum of 12. This removes all contigs with ultra-high coverage and all contigs with low coverage.

Figure 2 view: http://localhost:8080/view/all/dataset/1st_asm/blob?plotShape=circle&ASSEMBLY_NAME.DRR008460_cov--Max=2000&ASSEMBLY_NAME.DRR008460_cov--Min=12.0#Filters

Figure 2
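The filter settings are also visible in the Figure 2 URL as query parameters. The sketch below uses Python's standard `urllib.parse` to read them back and to rebuild the same query string. The `<field>--Min` / `<field>--Max` naming is inferred from this one URL rather than from BlobToolKit documentation, and `filter_query` is a hypothetical helper. The field name in the URL is `ASSEMBLY_NAME.DRR008460_cov`; substitute whichever coverage field your own dataset uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Viewer URL for Figure 2
url = ("http://localhost:8080/view/all/dataset/1st_asm/blob?plotShape=circle"
       "&ASSEMBLY_NAME.DRR008460_cov--Max=2000"
       "&ASSEMBLY_NAME.DRR008460_cov--Min=12.0#Filters")

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values
print(params)
print(parts.fragment)           # the "#Filters" part selects the tab

def filter_query(field: str, min_val: float, max_val: float) -> str:
    """Hypothetical helper: build a query string using the naming seen above."""
    return urlencode({
        "plotShape": "circle",
        f"{field}--Max": max_val,
        f"{field}--Min": min_val,
    })

print(filter_query("ASSEMBLY_NAME.DRR008460_cov", 12.0, 2000))
```

With these arguments the helper reproduces the query string of the Figure 2 URL, so changing `min_val` or `max_val` is one way to share a different filter setting.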

Although this is not perfect, it is far more informative than the original plot, because we have ‘zoomed in’ on the main blob and removed the most obvious contaminants. In the next step, we’ll analyse this main blob in more detail.

