CQPweb: Finding non-English words

Watch Andrew Hardie explain how you can search for "foreign" words in Shakespearean corpora.
Welcome to CQPweb again. In this video, I'm going to show you how we can use the tools of CQPweb to search for foreign words in the corpus of Shakespeare. Let's take an example, and we're going to take it from French. Some of you may know, if you have done any French, that the French for "has" is a: "he has" is il a, "she has" is elle a. If we try to search for this, we have a bit of a problem, because it is spelled the same as an English word: a, the indefinite article. If we just searched for a, we would get every instance of both.
And since this is obviously an English corpus, the French examples would be drowned out. So we need to reach into the annotation and use it to find just the foreign words. Notice that we're not using the squiggly brackets, the braces, here. We're using a part-of-speech label without braces, because this is not a major word class. It's a very specific word class: FW, for foreign words. So what we're asking with the query a_FW is: find me all of the words like this one that are tagged FW. Let's go. And there we are. And yes, these are indeed all French examples of a.
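To make the logic of the a_FW query concrete, here is a minimal Python sketch, assuming a corpus can be modelled as a list of (word, tag) tokens. The toy sentences, the `find` helper, and every tag other than FW (AT1 for the article, VHZ for "has", and so on, in the style of the CLAWS tagset) are hypothetical choices for illustration; this is not how CQPweb itself is implemented.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    word: str
    pos: str  # part-of-speech tag


# Hypothetical toy corpus: the sentences and most tags are invented for
# illustration and are not taken from the Shakespeare corpus.
TOY_CORPUS: List[Token] = [
    Token("he", "PPHS1"), Token("has", "VHZ"), Token("a", "AT1"), Token("sword", "NN1"),
    Token("il", "FW"), Token("a", "FW"), Token("raison", "FW"),
    Token("a", "AT1"), Token("horse", "NN1"),
]


def find(corpus: List[Token], word: Optional[str] = None,
         pos: Optional[str] = None) -> List[int]:
    """Return the positions of tokens matching a word form and/or a tag.

    find(corpus, "a") behaves like searching for plain a (both languages);
    find(corpus, "a", "FW") behaves like searching for a_FW (French only).
    """
    hits = []
    for i, tok in enumerate(corpus):
        if word is not None and tok.word.lower() != word.lower():
            continue  # word form does not match
        if pos is not None and tok.pos != pos:
            continue  # tag does not match
        hits.append(i)
    return hits


print(find(TOY_CORPUS, "a"))        # every a, English and French: [2, 5, 7]
print(find(TOY_CORPUS, "a", "FW"))  # only the a tagged as a foreign word: [5]
```

Adding the tag as a second condition is what removes the English article from the results: the word form alone cannot tell the two apart, but the annotation can.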
Because the spelling here doesn't include the accents of French, we've also got instances of the preposition à, which means "at" or "to", and which would normally have a grave accent. But I'm getting too far into the spelling of French, I'm sorry. The point to take away is that by putting FW as our specification for the category, we made sure that we get just the foreign version and not the English version. What if we wanted to find all the foreign words? Well, it's not actually necessary to specify a word at all.
If we just give the grammatical label FW after the underscore (the underscore indicates part-of-speech tags, or in this case a more fine-grained grammatical tag than the major word classes we've worked with so far), the query _FW will simply give us a concordance of every word in the Folio corpus that is foreign, or that has been deemed to be foreign. And here we are: almost 2,200 of them. That's 44 pages of concordance, far too much to look through. Let's take a look at the frequency breakdown to see what we've actually got. And here we are. The most frequent foreign word is exeunt, which is Latin for "they leave", and which is obviously very common in stage directions.
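A tag-only query followed by a frequency breakdown can be sketched in a few lines of Python, assuming the corpus is a list of (word, tag) pairs. The function names are hypothetical, and although the words echo those mentioned in the video, the toy counts are invented and are not real frequencies from the Folio corpus.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical (word, tag) pairs. The words echo those mentioned in the video,
# but these counts are invented and are NOT real frequencies from the corpus.
TOY_TOKENS: List[Tuple[str, str]] = [
    ("Exeunt", "FW"), ("Enter", "VV0"), ("the", "AT"), ("King", "NN1"),
    ("exeunt", "FW"), ("Manet", "FW"), ("FINIS", "FW"), ("Exeunt", "FW"),
    ("il", "FW"), ("a", "FW"),
]


def tag_only_query(tokens: List[Tuple[str, str]], tag: str = "FW") -> List[str]:
    """Mimic a query like _FW: keep every word with the given tag, whatever its form."""
    return [word for word, pos in tokens if pos == tag]


def frequency_breakdown(words: List[str]) -> List[Tuple[str, int]]:
    """Count word types, most frequent first.

    Case is folded here (Exeunt and exeunt count together); this is a
    choice made for the sketch, not necessarily what CQPweb does by default.
    """
    return Counter(word.lower() for word in words).most_common()


hits = tag_only_query(TOY_TOKENS)
print(len(hits))                      # number of concordance lines
print(frequency_breakdown(hits)[:3])  # top of the breakdown
```

The breakdown turns thousands of concordance lines into a short ranked list of word types, which is why it is the quicker way to see what the FW tag has actually picked up.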
Then we've got manet, which is a stage direction for someone to stay, and FINIS, which occurs at the end. None of that is terribly interesting. But then we get into words that actually occur in the drama itself. We've got French words here, but also more Latin words. Can I spot any Italian or Spanish? Some of these might be Spanish or Italian, I'm not sure. Again, there are multiple pages of this, which I'm not going to go through, because you get the point. I'm going to end with one slightly more complex query: now that we can find foreign words, how can we search for just the foreign words in a particular language?
Well, there is inline annotation in the corpus that specifies foreign languages. Some of you will know what XML is; this is a pseudo-XML style of markup that applies properties to stretches of text. So if we use a plus (a plus on its own stands for any single word) and wrap it so that it must come immediately after the start of a region marked foreign_lang=Spanish, what will we get? Let's start our query. There we go. So here we are: Spanish words. These are not all the Spanish words, because where a Spanish phrase is longer than one word, we only get its first word. But this lets us find foreign words in any particular language.
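The reason the Spanish results are incomplete is the difference between "the first word after a region starts" and "every word inside the region". Here is a minimal Python sketch of both, assuming the pseudo-XML annotation can be modelled as (start, end, language) spans over a token list. The `Region` type, the function names and the toy text are hypothetical; only the foreign_lang attribute name comes from the video, and this is not CQPweb's internal representation.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int  # position of the first token in the region
    end: int    # position just after the last token (exclusive)
    lang: str   # value of the foreign_lang attribute, e.g. "Spanish"


# Hypothetical toy text and regions, invented for illustration only.
TOKENS: List[str] = ["Enter", "a", "page", "muy", "bien", "says", "he", "bon", "jour"]
REGIONS: List[Region] = [Region(3, 5, "Spanish"), Region(7, 9, "French")]


def first_word_in_language(tokens: List[str], regions: List[Region], lang: str) -> List[str]:
    """Mimic '+ immediately after the start of a foreign_lang=<lang> region':
    only the first word of each matching region is returned."""
    return [tokens[r.start] for r in regions if r.lang == lang]


def all_words_in_language(tokens: List[str], regions: List[Region], lang: str) -> List[str]:
    """For comparison: every word that falls anywhere inside a matching region."""
    return [tokens[i] for r in regions if r.lang == lang for i in range(r.start, r.end)]


print(first_word_in_language(TOKENS, REGIONS, "Spanish"))
print(all_words_in_language(TOKENS, REGIONS, "Spanish"))
```

Anchoring the plus to the start of the region is why a two-word phrase such as the toy "muy bien" contributes only "muy"; changing the language value from "Spanish" to "French" is the same one-word swap made in the video.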
If we go back and replace Spanish with French, just to illustrate, we get rather more. So, hopefully you've got the idea of searching for foreign words now. There's a lot more that can be done with things like those angle-bracket conventions, but that's far more than I have time to cover in these videos. If you want to find out more, there are extensive tutorials built into the CQPweb interface. Thank you very much.

Since you have just heard the video talk on Latin words in Shakespeare, now seems a good moment to show you how to retrieve "foreign" words in Shakespeare using CQPweb.

We strongly advise you to play Andrew Hardie's talk in one window of your computer and open his program, CQPweb, in another, so that you can try out each search as he demonstrates it. You will need to pause his talk periodically.

If you have any concerns, or if you discover some interesting things, put them in the comments.

