CQPweb: Searching for words (part 2)

Watch Andrew Hardie elaborate yet further on how to conduct increasingly sophisticated word searches.
We’re still on the topic of the kinds of additional queries that we can do using this query box. But now we’re going to start looking at how we can search for different spellings of a particular word. Now in the First Folio Plus corpus, all of the spelling is very nicely curated. The old-fashioned 17th-century or late 16th-century spellings are still there, behind the scenes, but we’re able to search it using standard modern spellings. It’s all very nicely curated. But the Shakespeare project included the creation of a corpus where the spelling is a little less well behaved.
And this is what we call the EEBO TCP segment, which is a selection of books from EEBO, which is a very large online collection. And this is a very, very large corpus. It’s 100 times as big as the Shakespeare corpus and it’s books and other written published documents from around the period of Shakespeare. The example here that I’m going to take is the word lock. Let’s search for the word lock.
Big corpus queries take a little longer. We’ve got 1,000, nearly 2,000 examples from 300 million words, which is the size of this corpus. It’s 300 times as big as the Shakespeare corpus. That’s nice, isn’t it? But hang on. We know that in the 1600s, spelling was not yet standardised. So there are various ways in which the word could vary. Let’s take a look at another possibility. Locke with E on the end. We know that typesetters would often add or leave off E’s at the ends of words in the early modern period basically to get the line to the right length.
If we’re in luck, then all of the examples of locke with an E will have been linked to the standard spelling lock without an E. So let’s see. No. We’ve got 32 examples where the regularisation, the linking of non-standard spellings to standard spellings, just hasn’t worked. The reason it hasn’t worked is simply because it was done by computer, whereas the ones in the Shakespeare corpus were done by hand and that’s why they’re better. So we’ve got a problem then. What about all the things that might end up on the end of lock? Well, to do this, we start using what’s called a wildcard search.
And a wildcard search uses a special symbol to indicate something that can vary in the search term. We’re going to use the star. The star means anything. So if we search for lock star, we will find the string L-O-C-K. We’ll find that word, but we’ll also find L-O-C-K-E and then anything else that might be added onto the end of the word lock. So let’s take a look. Here we are. We indeed have plenty of examples of lock. Do we have examples of locke with an E? Yes. There’s one. Locke with an E there. But we’ve also got lots of other things as well, as of course you can see. We’ve got lots of locks, sometimes with an extra E.
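As a rough illustration of what the star does, here is a minimal Python sketch that applies the same kind of pattern to a small word list. It does not call CQPweb; the word list is made up for the example, `wildcard_matches` is a hypothetical helper name, and Python's `fnmatchcase` stands in for the corpus query engine.

```python
from fnmatch import fnmatchcase

# Made-up word list for illustration only; this is not real corpus output.
tokens = ["lock", "locke", "locks", "lockes", "locked", "clock", "look"]


def wildcard_matches(pattern, words):
    """Return the words that match a star-style wildcard pattern.

    The pattern has to match the whole word, so "lock*" accepts
    "lock" followed by anything (including nothing).
    """
    return [w for w in words if fnmatchcase(w, pattern)]


matches = wildcard_matches("lock*", tokens)
print(matches)
```

The result shows the same over-matching described in the talk: the pattern picks up lock and locke, but also locks, lockes and locked. Note that `fnmatchcase` also gives special meaning to `?` and square brackets, which need not agree with CQPweb's own query syntax.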
We’ve got locked spelled the right way, the standard way or the modern way. We’ve got locks spelled not the modern way. Normally, we would want to have all of the different spellings of the word lock, but none of these different things, if we were trying to analyse the word lock. Just that word, not the head word. Just the word itself. We would want to get lock and locke with an E and then other variant spellings, but not the other forms locks and locked. So let’s have another go at the query. How can we do this? Well, what we can do is we can use an “or” query.
If you put something inside square brackets in a CQPweb query, then it interprets it as find me this or this. And the two things need to be separated by a comma. What do we want to have at the end of our L-O-C-K? Well, we want to have either nothing or an E. So that says find me L-O-C-K with either nothing or an E after it. And here we go. And there we are. It’s getting the two different variant spellings. Just what we wanted.
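The "or" query described here, L-O-C-K followed by square brackets containing nothing, a comma, and an E, can be mimicked outside CQPweb with a regular expression. The sketch below is only an approximation under stated assumptions: `alternatives_to_regex` is a hypothetical helper written for this example, it handles nothing beyond literal text and bracketed comma-separated alternatives, and the word list is invented rather than taken from the corpus.

```python
import re


def alternatives_to_regex(query):
    """Turn a query such as 'lock[,e]' into a compiled regular expression.

    Illustrative helper only, not CQPweb's actual query parser: it
    understands literal text and [a,b,...] groups and nothing else.
    """
    parts = []
    # Split into literal chunks and bracketed groups, keeping the groups.
    for chunk in re.split(r"(\[[^\]]*\])", query):
        if chunk.startswith("[") and chunk.endswith("]"):
            # "[,e]" gives the options "" (nothing) and "e".
            options = chunk[1:-1].split(",")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
        else:
            parts.append(re.escape(chunk))
    return re.compile("".join(parts))


# Made-up word list for illustration only; this is not real corpus output.
tokens = ["lock", "locke", "locks", "lockes", "locked", "clock", "look"]

pattern = alternatives_to_regex("lock[,e]")
# fullmatch requires the whole word to fit the pattern.
variants = [w for w in tokens if pattern.fullmatch(w)]
print(variants)
```

Unlike the star, this keeps only the two spellings of the base form and leaves out locks, lockes and locked, which is the narrowing the talk is after.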
And you can see here that by using a non-fixed search, it allows us to start to grapple with some of the spelling issues that would otherwise get in the way of us finding all the examples that we’re interested in. That’s it for now. Thanks very much.

This talk is a continuation of the previous one. Again, Andrew Hardie hones your skills in word searching, but this time he is dealing with what is probably the greatest impediment to searching for a word (if, for example, one wants to try to antedate it), namely, spelling variation.

As usual, put any issues or concerns or simply interesting observations in the comments.

We strongly advise you to listen to Andrew Hardie’s talk in one window of your computer, and open up his program, CQPweb, in another, so that you can practice what he is saying as he goes along. Obviously, you will need to pause his talk periodically.

This article is from the free online

Shakespeare's Language: Revealing Meanings and Exploring Myths

Created by
FutureLearn - Learning For Life
