CQPweb: Calculating keywords

Watch Andrew Hardie continue to explain how to compare different parts of a corpus, focusing on calculating keywords.
So having created our subcorpora we can now go on to do the keywords analysis. And for that we pick the keywords option here on this menu and we get this set of controls. Now, the ones that we really need to look at are these ones up here. These here we can basically ignore. They are the more complex options. The defaults are almost always fine. So what we do first is here where it says Select frequency list 1, we choose one of the subcorpora that we have created. So I’m going to choose just tragedies to begin with, we’ll do the tragedies.
And the default over here for what we’re going to compare it to, that is frequency list 2, is to compare whatever we’ve selected here to the rest of this corpus. So we’ll stick with that default. We go down to the Calculate keywords button here. Press it. And before very long has passed, we have the keywords for the tragedies as compared to the rest of the corpus, which is the comedies and the histories. In this display, positive keywords are shown in blue cells, like Brutus, Antony, Caesar, Troilus. And negative keywords, that is, words that are less prominent in the tragedies than they are in the rest of the corpus, are shown on a grey background.
So John is the most noteworthy negative keyword. The pluses and minuses in this column here show us the same thing. So we have our positive and our negative keywords. Lots of them, as you can see, are proper nouns, the names of characters. This is unsurprising. When we do keyword analysis on works of literature, character names do tend to come up. But one thing that you will see is that English, France, England (somewhere, yes, England) and a few others are negative keywords, i.e. less common in tragedies. Whereas words to do with Rome and Roman are more common in tragedies, giving us a hint of setting: the Roman setting, the classical world, is more common in the tragedies.
We can also view this as a word cloud by using this dropdown and Switch to graphical word cloud. And when we’re in graphical word cloud mode we see just the positive keywords, not the negative ones, laid out in this style, which you may have seen elsewhere, where the size that the word is rendered in represents how important it is as a keyword. There’s not much range in size here, so most of these are comparable in importance. These are click-through-able. So if I click on emperor, for instance, it will take me through and we will see that the examples come from several plays. OK. If I go back, New keyword calculation. Let’s now have a look at Just King Lear.
So just one play compared to the corpus. And what I’ll illustrate here is that in this Display as control here we can opt to go to the word cloud first. So let’s try that. So here are some strong positive keywords for Lear. Father, daughters, sisters, some characters, some places, storm. If you know the plot of King Lear you will not be surprised by this. But we can also switch to the New keyword calculation here and, if we want, we can again see the King Lear keywords as a table. OK. Let’s go New keyword calculation and let’s do our very last one, which is the women.
And we’re going to compare women and men, so for frequency list 2, we don’t leave it as the default. We change that to The men. And let’s go down here, calculate keywords. So here we are. Again, character names, we’re always going to get those. It’s the nature of a literary corpus. But you can see that there are some pronouns that are quite prominent in the women characters’ speech: you, my, I. But we and our are more common outside of the female speech, in the male speech. Now this is going to be pragmatically interesting, and we might want to click through. The numbers here allow us to click through for the subcorpus of interest. So that is the women.
So here’s We by The women. But then we can also go back. If we want We by The men, then that’s here in this set of numbers for the comparison data, and there is We for the men. Both are very common, of course. OK. I hope that this has illustrated the basics of using the keyword tool and the kinds of things that you can expect to see when you see keywords and when you click through to the concordances from keywords. Thank you very much.

We strongly advise you to listen to Andrew Hardie’s talk in one window of your computer, and open up his program, CQPweb, in another, so that you can practise what he is demonstrating as he goes along. You will need to pause his talk periodically.

In this talk, Andrew Hardie continues to explain how you can use CQPweb to compare different parts of a corpus (e.g. male characters versus female characters). Having created “subcorpora” containing the parts of the corpus you are interested in, we are now in a position to do the actual comparison (i.e. a keywords analysis). You will discover how to do this, and also how to display the results (one possibility is as a word cloud).
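
If you are curious about what a keywords comparison involves, the Python sketch below shows the general idea. It is an illustration only and not CQPweb's own code. It assumes log-likelihood as the keyness statistic, which is a common choice in corpus linguistics; the talk does not say which statistic or cut-off CQPweb applies. The function names, the cut-off of 6.63 and the word counts are all invented for this example.

```python
import math
from collections import Counter


def log_likelihood(a, b, n1, n2):
    """Log-likelihood keyness for a word seen a times in list 1
    (n1 tokens in total) and b times in list 2 (n2 tokens in total)."""
    expected1 = n1 * (a + b) / (n1 + n2)
    expected2 = n2 * (a + b) / (n1 + n2)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected1)
    if b > 0:
        total += b * math.log(b / expected2)
    return 2 * total


def keywords(freq1, freq2, cutoff=6.63):
    """Return (word, sign, score) rows, strongest first.
    '+' marks a positive keyword (relatively more frequent in list 1),
    '-' a negative keyword. The cutoff is an assumed value."""
    n1 = sum(freq1.values())
    n2 = sum(freq2.values())
    rows = []
    for word in set(freq1) | set(freq2):
        a, b = freq1.get(word, 0), freq2.get(word, 0)
        score = log_likelihood(a, b, n1, n2)
        if score < cutoff:
            continue  # difference too small to count as key
        sign = "+" if a / n1 > b / n2 else "-"
        rows.append((word, sign, score))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented toy counts: a whole corpus and one subcorpus of it.
whole_corpus = Counter({"rome": 45, "john": 62, "the": 3000})
tragedies = Counter({"rome": 40, "john": 2, "the": 1000})

# Frequency list 2 defaults to "the rest of the corpus".
rest_of_corpus = whole_corpus - tragedies

results = keywords(tragedies, rest_of_corpus)
for word, sign, score in results:
    print(f"{word:10} {sign} {score:.2f}")
```

With these toy counts, "rome" comes out as a positive keyword and "john" as a negative one, while "the" is filtered out because its relative frequency is almost the same in both lists. Comparing two subcorpora directly (for example, the women against the men) only changes what is passed as the second frequency list.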

As usual, put any issues or concerns or simply interesting observations in the comments.

This article is from the free online course Shakespeare's Language: Revealing Meanings and Exploring Myths

Created by
FutureLearn - Learning For Life
