What are sequencing metrics

Article explaining how to interpret sequencing metrics
© COG-Train

Illumina sequencing run metrics

Run performance for Illumina sequencing can be monitored using the Illumina native programmes: Sequencing Analysis Viewer (SAV) and BaseSpace. Next, we will discuss the key metrics.

Yield (Gb)

This is the number of bases generated in the run, reported in gigabases (Gb) (Figure 1). Different sequencers and runs generate different amounts of data over variable amounts of time.

Screenshot from BaseSpace showing metrics values. Detailed description in the main text

Figure 1 – Summary of sequencing metrics from BaseSpace (1. Yield (Gb); 2. % ≥ Q30; 3. % Aligned and % error rate; 4. Cluster Density (K/mm²); 5. %PF).

% ≥ Q30

This is the percentage of bases with a quality score of 30 or higher. It is read from the Q Score distribution chart, which indicates the percentage of bases with a Q Score ≥ 30.

  • A Q Score of 30 corresponds to a 1 in 1000 probability of an incorrect base call (99.9% accuracy)
  • Illumina runs usually generate > 70-80% Q30 data
  • The Q30 reported is an average across the whole read length
  • Base quality, and therefore the % ≥ Q30, decreases towards the end of the read
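
As a rough illustration of how these figures relate, the sketch below uses the standard Phred relationship Q = -10 × log10(P) to convert a Q Score into an error probability, and then computes % ≥ Q30 from FASTQ quality strings. The function names are hypothetical and the Phred+33 quality encoding is an assumption; this is not a description of how SAV or BaseSpace calculate the metric internally.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score Q into the probability of a wrong base call."""
    return 10 ** (-q / 10)


def percent_q30(quality_strings, offset=33, threshold=30):
    """Percentage of bases with a quality score at or above the threshold.

    quality_strings: iterable of FASTQ quality lines.
    offset: ASCII offset of the encoding (Phred+33 is assumed here).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Decode the ASCII character back into a numeric Q Score.
            if ord(ch) - offset >= threshold:
                passing += 1
    return 100.0 * passing / total if total else 0.0
```

With this encoding the character "?" decodes to Q30 and ">" to Q29, so a run made of equal numbers of each would report 50% ≥ Q30.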

Percent Aligned and Error Rate

Illumina recommends that all runs are spiked with their PhiX control; these metrics are only calculated if PhiX is spiked in.

  • The PhiX library provides a quality control for cluster generation, sequencing, and alignment.
  • % Aligned is the percentage of clusters in which the first 25 cycles align to the PhiX reference genome.
  • Error rate is the rate of mismatches between the sequencing data and the PhiX reference genome (usually below 1%). If PhiX was not spiked in, % ≥ Q30 is your best tool to check base quality.
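
The idea behind the error rate can be sketched as a simple mismatch count. The example below assumes that each read has already been aligned to the reference without gaps, so that read and reference segments have the same length; the function name and input format are hypothetical, and the instrument software will handle alignment details that are ignored here.

```python
def mismatch_rate_percent(aligned_pairs):
    """Percentage of aligned bases that differ from the reference.

    aligned_pairs: iterable of (read_sequence, reference_segment) tuples,
    assumed to be ungapped alignments of equal length.
    """
    mismatches = 0
    bases = 0
    for read, reference in aligned_pairs:
        for read_base, ref_base in zip(read, reference):
            bases += 1
            if read_base != ref_base:
                mismatches += 1
    return 100.0 * mismatches / bases if bases else 0.0
```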

Cluster Density (K/mm²)

This shows the density of clusters on the flow cell and is an important metric to evaluate the quality of the data (Figure 1). When looking at these data you should also consider the Percentage Clusters Passing Filter (%PF), which is an internal quality filtering procedure. Together these metrics can help you determine if there are problems with your loading concentration.

Cluster density and %PF are typically inversely related: a high cluster density usually gives a lower %PF, and vice versa.

  • Lower %PF may lead to losing coverage on some samples within the run
  • If the two metrics are not inversely related (for example, cluster density is within range but %PF is low), consider an instrument or reagent issue

High cluster density: overloading of the library may cause merging of clusters which increases the cluster density. This may lead to:

  • Poor template generation, which then causes a decrease in the percentage passing filter (%PF)
  • Low Q30 scores
  • Complete run failure

Low cluster density can be due to:

  • Underloading of the library
  • Library preparation issues (for example, poor denaturation)

Both high and low cluster densities will have a negative impact on data output.
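
The rules of thumb above can be written down as a small decision function. This is only a sketch: the function name is hypothetical, and the acceptable density range and minimum %PF are deliberately left as parameters, because they depend on the instrument and reagent kit and are not stated in this article.

```python
def triage_loading(cluster_density, percent_pf, density_range, min_percent_pf):
    """Suggest what to check first, following the rules of thumb in the text.

    density_range: (low, high) acceptable cluster density in K/mm2, taken from
    the specification for your instrument and kit (no defaults assumed here).
    min_percent_pf: lowest %PF you consider acceptable (also user supplied).
    """
    low, high = density_range
    if cluster_density > high:
        return "high density: possible overloading of the library"
    if cluster_density < low:
        return "low density: possible underloading or library preparation issue"
    if percent_pf < min_percent_pf:
        return "density in range but low %PF: consider an instrument or reagent issue"
    return "density and %PF within the expected ranges"
```

For example, `triage_loading(1000, 60, (800, 1200), 80)` uses made-up thresholds and returns the instrument or reagent message, because the density is in range while %PF is low.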

Oxford Nanopore sequencing metrics

MinKNOW provides real-time information about the sequencing run. It is important to consider flow cell health, pore occupancy and read length. If live base-calling is active, quality score plots and the number of reads per barcode can also be followed. A report is generated at the end of the run.

Flow cell health and pore occupancy

The flow cell health graph provides a summary of the status of all the pores in a flow cell (Figure 2). A colour code is used to indicate different pore statuses (Figure 3).

Illustrative screenshot of MinKNOW monitoring the flow cell health. Detailed information in the main text

Figure 2 – Sequencing overview showing flow cells currently sequencing on MinKNOW software, indicating the flow cell health.

Colour-coded table indicating the flow cell health. Light green: Active pore sequencing; Dark green: Active pore; Dark blue: Pore is recovering and may become active again; Light blue: Inactive pore not available for sequencing; White: Unclassified

Figure 3 – Table showing colour-coding of different flow cell pore occupancy statuses.

Pore occupancy information is provided in real-time by the channel states panel (Figure 4). A high proportion of active sequencing channels is an indicator of good library preparation.

Screenshot of the channel states panel. It is a visual representation of the flow cell. It is composed of two blocks of hundreds of grouped squares. Each square is coloured according to the pore states (Figure 3)

Figure 4 – Channel states panel: real-time pore occupancy status of one sequencing run on MinKNOW software.
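
To make the idea of pore occupancy concrete, the sketch below computes the share of active pores that are currently sequencing from a set of channel state counts. The state names and the exact definition (sequencing pores divided by sequencing plus available pores) are assumptions chosen for illustration; the function is hypothetical and is not part of MinKNOW.

```python
def pore_occupancy_percent(state_counts):
    """Share of active pores that are currently sequencing, as a percentage.

    state_counts: dict mapping a channel state name to a number of channels.
    The keys "sequencing" and "pore_available" are illustrative labels for the
    light green and dark green states described in Figure 3.
    """
    sequencing = state_counts.get("sequencing", 0)
    available = state_counts.get("pore_available", 0)
    active = sequencing + available
    return 100.0 * sequencing / active if active else 0.0
```

Recovering, inactive and unclassified channels are ignored in this definition, so the value reflects how well the library keeps the usable pores busy rather than how many pores remain on the flow cell.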

The pore activity graph summarises the channel state of a run over time (Figure 5). In a good run, the proportion of active pores decreases slowly as the run progresses.

Screenshot of pore activity panel. Bar graph representation of the proportion of pores sequencing, recovering, inactive or unclassified

Figure 5 – Pore activity: Graph summarising pore activity over time on MinKNOW software.

Read length histograms and cumulative plots

Read length histograms summarise the lengths of the reads sequenced so far (Figure 6). This information is presented either as the number of reads or as the estimated number of bases, plotted against read length.

Screenshot of read length histogram. A histogram representation of the sequenced read lengths.

Figure 6 – Read length histogram: Graph showing estimated read length vs estimated bases on MinKNOW software.
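
The difference between the two views of the read length histogram can be shown with a short sketch that bins the same reads twice: once counting reads and once summing bases. The function name and the default bin width are arbitrary choices for illustration and do not reflect how MinKNOW bins its plots.

```python
from collections import Counter


def read_length_histograms(read_lengths, bin_width=1000):
    """Bin read lengths and return (reads_per_bin, bases_per_bin).

    Each bin is labelled by its starting length, so with bin_width=1000 the
    key 0 covers reads of 0-999 bases, 1000 covers 1000-1999 bases, and so on.
    """
    reads_per_bin = Counter()
    bases_per_bin = Counter()
    for length in read_lengths:
        bin_start = (length // bin_width) * bin_width
        reads_per_bin[bin_start] += 1       # view 1: number of reads
        bases_per_bin[bin_start] += length  # view 2: number of bases
    return dict(reads_per_bin), dict(bases_per_bin)
```

A few long reads contribute little to the read count view but can dominate the bases view, which is why the two plots can look quite different for the same run.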

Cumulative output graphs show the total number of reads and whether they have passed or failed the quality filter (Figure 7). If live base-calling is not active, only the total number of reads will be displayed.

Screenshot of cumulative output. A graph representation indicating the number of reads increasing exponentially through the time of the sequencing run.

Figure 7 – Cumulative output: Graph showing the total number of reads and whether they have passed or failed the quality filter on MinKNOW software.
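
A cumulative output curve of this kind could be reconstructed from per-read records roughly as follows. The input format (a start time and a quality score per read) and the function name are hypothetical, and the quality threshold is a parameter you would have to set yourself; no particular MinKNOW threshold is assumed.

```python
def cumulative_output(reads, min_qscore):
    """Running totals of passed and failed reads over time.

    reads: iterable of (start_time, qscore) tuples, one per read.
    min_qscore: user-chosen quality threshold for the pass/fail filter.
    Returns a list of (start_time, passed_so_far, failed_so_far) tuples.
    """
    passed = 0
    failed = 0
    series = []
    # Sort by start time so that the totals grow in chronological order.
    for start_time, qscore in sorted(reads):
        if qscore >= min_qscore:
            passed += 1
        else:
            failed += 1
        series.append((start_time, passed, failed))
    return series
```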

If live base-calling is active, cumulative median or modal quality score plots of passed reads are available (Figure 8).

Screenshot of quality score panel. Graphical representation of the reads quality score.

Figure 8 – Quality score: Graph showing quality score over time on MinKNOW software.

Barcode stats

If multiple samples are being sequenced, the barcode read counts graph displays the number of reads that have been basecalled and have passed the quality filter for each barcode (Figure 9).

Screenshot of barcode hits. Graphic representation of barcode count.

Figure 9 – Barcode hits: Graph showing the read counts for each barcode on MinKNOW software.
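
The counts behind a barcode graph like this can be sketched as a tally of reads that passed the quality filter, grouped by barcode. The input format and function name are hypothetical; in practice the barcode assignment and the pass or fail flag would come from the basecalling output.

```python
from collections import Counter


def passed_reads_per_barcode(reads):
    """Count reads that passed the quality filter for each barcode.

    reads: iterable of (barcode, passed_filter) tuples, where passed_filter
    is True for reads that passed the quality filter.
    """
    counts = Counter()
    for barcode, passed_filter in reads:
        if passed_filter:
            counts[barcode] += 1
    return counts
```

A barcode with far fewer passed reads than the others may point to a sample that needs attention before downstream analysis.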

This article is from the free online course A Practical Guide for SARS-CoV-2 Whole Genome Sequencing.

Created by
FutureLearn - Learning For Life
