Calculating GC content for regions of the genome

In this Step, we will learn that GC content changes along a genome. We will also learn how to display GC content in Artemis and which genome features can be associated with fluctuations in GC content.

As with our calculation of GC content for a whole genome, we can calculate GC content for regions of the genome using “windows” of nucleotide sequence. Here, a window is a stretch of sequence of defined (and often arbitrary) length. For example, a 10 kb genome (10,000 nucleotides) can be divided into 10 windows of 1 kb (1,000 nucleotides) each, or into 100 windows of 100 nucleotides each. We can then calculate the GC content of each window and plot the values on a graph.
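The window calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements it. The function names `gc_content` and `gc_windows` and the toy sequence are made up for this example. It assumes consecutive, non-overlapping windows, as in the 10 kb example; plotting tools often use overlapping (sliding) windows instead.

```python
def gc_content(seq):
    """Return the GC content of seq as a percentage (0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def gc_windows(seq, window):
    """Split seq into consecutive, non-overlapping windows.

    Returns a list of (start, gc_percent) pairs, one per window, where
    start is the 0-based position of the window's first nucleotide.
    If len(seq) is not a multiple of window, the last window is shorter.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq), window)]


# Toy 20-nucleotide sequence, invented for illustration only
toy = "ATGCGCGCATATATATGGCC"

for start, gc in gc_windows(toy, 5):
    end = min(start + 5, len(toy))
    print(f"nucleotides {start + 1}-{end}: {gc:.0f}% GC")
```

With a real genome, `toy` would be replaced by the full sequence string and the window size by something like 100 or 1,000 nucleotides; the resulting list of values is what gets plotted along the genome.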

Figure: a three-panel plot of GC content calculated with three different window sizes. The smallest window size (top) produces the spikiest profile and the largest window size (bottom) produces the smoothest.

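Notice that the GC content differs slightly from window to window, and that the graph becomes “spikier” with smaller windows and smoother with larger ones. This is because a larger window averages over more nucleotides, so short local fluctuations have less effect on its value.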
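The effect of window size can be checked with a small, hedged example. The sequence below is invented so that short GC-rich and AT-rich stretches alternate, and `window_gc` is a hypothetical helper written for this sketch. The "spread" (highest minus lowest window value) is used here as a rough stand-in for how spiky the plot would look.

```python
def window_gc(seq, window):
    """GC percentage of each consecutive, non-overlapping window of seq."""
    values = []
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / len(chunk))
    return values


# Made-up 32-nucleotide sequence with short GC-rich and AT-rich stretches
toy = "GCGCATATGCATATAT" * 2

spreads = {}
for window in (4, 8, 16):
    values = window_gc(toy, window)
    # Spread = difference between the highest and lowest window value
    spreads[window] = max(values) - min(values)
    print(f"window = {window:>2}: spread = {spreads[window]:.1f} percentage points")
```

In this toy case the spread shrinks as the window grows, because each larger window averages several GC-rich and AT-rich stretches together. Real genomes are less regular, but the same averaging is what smooths the plot.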

Artemis can calculate GC content for windows of different sizes, and this feature will help us find interesting regions of a bacterial genome that are characterised by relatively low or high GC content.

