More on GC content and how it is calculated in ‘windows’

In this short video, we will review the concept of GC content and calculation using windows. We will also show a short demonstration in Artemis.
Hello everyone. Before we continue with the contents of the course, we’re going to have a short recap on GC content. You’ve already read a little bit about this. We’re now going to do a graphical representation of how GC content can help us understand more about genomes. So, as you’ve read before, GC content is calculated from the number of guanines and cytosines present in the genome. It is expressed as a percentage. So, in order to calculate it, we will count how many Gs and Cs we have. We’ll divide that by the total number of nucleotides present in the genome. This is going to be a value below 1. In my case, for example, we’re going to put 0.4.
We tend to represent this as a percentage. So in this case, it will be 40%. So, in my example genome, the percentage of GC is 40%. Now, how can we represent this graphically? Let’s imagine a piece of DNA, or a genome. We’re going to draw it here on the x-axis.
And on the y-axis, I am going to represent the GC content.
And following on the previous example, my experimental bug has 40% GC, which I’m going to draw here. Now having only one value for the whole genome doesn’t give me much information, because we know that a lot of the genomes have variation in different parts of the genome, regarding GC content. So, how can we find out about these variations, and perhaps relate them to different functions of the genome? We can split the genome into ‘windows’, or sections. For example, into four sections, which I’m going to number 1 through 4. And now, I can calculate the GC content for each of these sections. Say, for example, for the first one, it’s about 20%. Then, for the second one, it’s about 50%.
And so on and so forth, and then we will see that each window has a different percentage of GC. Now, this is a little bit more informative. But of course, I can see that windows 1 and 3 have a lower GC content than the average. And windows 2 and 4 have a higher GC content. We can also join these values by a plot. And this is the way we interpret variation in that GC content. Now we’re going to turn to Artemis, where we are going to be able to load this graph of GC content, and this is going to help us understand more about different regions of the genome.
We have just learned how to calculate GC content, and the importance of the windows in the genome for the calculation of this factor. I have here a DNA sequence. This is the file that is familiar to you already as the St.dna. And before I proceed to show you the GC content graph, I am going to point out to you an overview window which is very important, because it has a lot of information about the whole genome that we are representing here in this window. For that, we go to View, and Overview.
Or you can use the shortcut CTRL-O. In here we can see the number of bases for the genome, and also, very importantly, the topic of our video: we can see the GC content. This is the average GC content for the whole genome. This is important. We’ll come to this in a bit. The value is 52.09. I am now going to open the graph of the GC content. That is in the Graph tab. GC content. The graph opens and it’s located in the top panel of the Artemis window. One of the things we see immediately is this horizontal line which has a value of 52.09. This is the same value as the overall GC content.
So this line represents the average GC content for the whole genome. This spiky graph represents the GC content for each window. In this case, the Window Size is 120. And we can see here in the top left, we can change that Window Size if we want to, make it a little bit larger. For example, by right clicking on the Window Size, and then selecting Set the Window Size. We can change that value to, say, for example, 500. And we see that the graph gets much smoother. And this is because a larger window smooths out the variation in the GC content.
Another way of changing the Window Size is by using the scroll bar on the right-hand side of the screen, positioned here. I can move it up, and then the graph becomes more spiky. And that correlates with the fact that I am making the Window Size smaller. So small that perhaps the windows are not very useful, as I can only see variation, and I can’t see any blocks. Variations in the GC content can imply certain characteristics that come with these bacterial genomes. For example, in this area here, I’m starting to see a massive drop in the GC content. So, I’m going to investigate this a little bit further.
I’m going to double click in this region to put it in the centre of my screen. And I can see that indeed, this particular region has a very low GC content compared to the rest. It is typical that invading DNA, for example, that of bacteriophages, would have a different GC content from the rest of the genome. So, this could potentially be an invading DNA sequence from another organism. Another one is seen here, very low. And we also have areas where the GC content is higher than the average. And that is typically associated with coding regions.
In this video, we learned how to calculate the GC content, why calculating it using ‘windows’ is important, and also how to display these graphs in Artemis. My name is Dr. Anna Protasio. I hope you enjoyed this video, and please leave your comments, suggestions, and questions in the comments area.

We can calculate the percentage of Gs and Cs present in a genome. This will be calculated as the number of Gs + number of Cs divided by the total number of bases.
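As a rough illustration of this calculation, here is a minimal Python sketch. The function name gc_content and the ten-base example sequence are made up for this illustration; they are not part of Artemis or of the course files. The example sequence was chosen so that it gives the 40% used in the video.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0).

    Illustrative helper only: it counts every character in the string as a
    base, so ambiguous symbols such as N would lower the result.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq)


# Made-up example: 10 bases, of which 2 are G and 2 are C.
example = "ATGCGATATC"
fraction = gc_content(example)

print(f"GC fraction:   {fraction}")      # a value below 1
print(f"GC percentage: {fraction:.0%}")  # the same value as a percentage
```

Multiplying the fraction by 100 gives the percentage that is usually reported.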

When the GC content is calculated for a whole genome, the result is a single figure that is characteristic of that genome. We say that some genomes are GC rich (that means high GC content) while others are GC poor (with low GC content).

However, the GC content is not the same along the whole genome. In fact, we observe that there are some regions where there is a higher concentration of Gs and Cs. We can systematically calculate the GC content in blocks of the genome that we call windows (as shown in the first part of the current video). In this process, a whole genome is divided into pieces and the GC content is calculated for each piece. We can then plot the changes in the GC content as the values for each of the blocks.
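The block-by-block calculation described above can be sketched as follows. This is a minimal sketch that assumes simple non-overlapping windows; it is not necessarily how Artemis computes its graph. The function name windowed_gc and the 40-base toy_genome are invented for the example. The toy sequence was built to give 20% and 50% in the first two windows, as in the video; the values for windows 3 and 4 (30% and 60%) are made up so that the overall figure is 40%.

```python
def windowed_gc(sequence: str, window_size: int) -> list[tuple[int, float]]:
    """Split a sequence into non-overlapping windows and return
    (start position, GC percentage) for each window.

    The last window may be shorter than window_size if the sequence
    length is not an exact multiple of it.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq), window_size):
        window = seq[start:start + window_size]
        gc_count = window.count("G") + window.count("C")
        results.append((start, 100 * gc_count / len(window)))
    return results


# Made-up 40-base "genome": four windows of 10 bases each.
toy_genome = (
    "ATATGATATC"  # window 1: 2 of 10 bases are G or C
    "GCATGCATGA"  # window 2: 5 of 10
    "ATGATCATGA"  # window 3: 3 of 10
    "GCGCATGCAT"  # window 4: 6 of 10
)

whole = 100 * (toy_genome.count("G") + toy_genome.count("C")) / len(toy_genome)
print(f"whole genome: {whole:.0f}% GC")

for number, (start, percent) in enumerate(windowed_gc(toy_genome, 10), start=1):
    label = "below" if percent < whole else "above"
    print(f"window {number} (starts at base {start}): {percent:.0f}% GC, {label} average")
```

Plotting the percentage against the start position of each window gives the kind of graph drawn in the video.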

We can change the length of the blocks to have more or less resolution in the GC content.
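To see the effect of the block length, the sketch below compares two window sizes on the same sequence. Everything in it is an assumption made for illustration: the sequence is randomly generated rather than a real genome, the windows are non-overlapping, and the sizes 100 and 500 were chosen so that each large window is made of exactly five small ones (the video uses 120 and 500 in Artemis). The names windowed_gc and gc_spread are hypothetical helpers.

```python
import random


def windowed_gc(sequence: str, window_size: int) -> list[float]:
    """Return the GC percentage of each non-overlapping window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    seq = sequence.upper()
    values = []
    for start in range(0, len(seq), window_size):
        window = seq[start:start + window_size]
        gc_count = window.count("G") + window.count("C")
        values.append(100 * gc_count / len(window))
    return values


def gc_spread(sequence: str, window_size: int) -> float:
    """Difference between the highest and lowest window GC percentage."""
    values = windowed_gc(sequence, window_size)
    return max(values) - min(values)


# Randomly generated 20,000-base sequence (fixed seed so runs are repeatable).
rng = random.Random(0)
random_genome = "".join(rng.choice("ACGT") for _ in range(20_000))

small_spread = gc_spread(random_genome, 100)  # many short windows: spiky
large_spread = gc_spread(random_genome, 500)  # fewer long windows: smoother

print(f"window size 100: {len(windowed_gc(random_genome, 100))} windows, spread {small_spread:.1f}")
print(f"window size 500: {len(windowed_gc(random_genome, 500))} windows, spread {large_spread:.1f}")
```

Because each 500-base window here is the average of five 100-base windows, its value can never fall outside the range of those five, so the larger window size cannot give a wider spread. The trade-off is that short regions with unusual GC content are averaged away.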

This article is from the free online

Bacterial Genomes II: Accessing and Analysing Microbial Genome Data Using Artemis

Created by
FutureLearn - Learning For Life