What is sampling and representativeness?

In this video, Professor Dan McIntyre talks about some of the core principles behind corpora and corpus methods.
Imagine you’re an environmental scientist and you want to test the water quality in this pond. How would you go about doing that? Obviously, you can’t test all of it. It’s not practical to drain the pond entirely in order to do that. So what you’d do instead is take a small sample.
So here we’ve got a sample of the pond water, and I’ve been careful to take it from an area of the pond where there’s no algae on the surface film of the water. And I’ve taken care not to stir up any sediment from the bottom of the pond. This means that I can be reasonably confident that the water in this beaker is a good representation of the water that you’d find elsewhere in the pond. And what that means is that if I were to analyse this sample, I could be reasonably sure that whatever I find in it, I’d find in the pond as a whole.
The beaker of water is a sample from a wider population. Population is a term from statistics that just refers to the entire group of things that you want to draw conclusions about. Often, as in the case of the pond, it’s impossible to study an entire population. So what we do instead is take a sample of it.
And this is true when we’re studying language. Imagine, for example, that you want to study the stylistic characteristics of British English newspapers. You can’t study all of them. So what you do instead is you take a small representative sample. That means understanding what that wider population of British English newspapers looks like. And that means asking questions like “How many different newspapers are there?” “When were they published?” “Who reads them?” And so on. And by asking questions like these, it’s possible to define the population that we’re interested in, and from that, to select a representative sample, what we call a corpus. Building a linguistic corpus is a bit like dipping the beaker into the pond. Except what we’re collecting is not water, but words.
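
For readers who like to see an idea in code, here is a minimal sketch of one way a representative sample might be drawn once the population has been defined. It assumes a made-up population of 100 newspaper articles labelled by type (60 tabloid, 40 broadsheet) and takes the same proportion from each group, so that the sample keeps the balance of the population. The function name `stratified_sample`, the article records and the 10% fraction are all hypothetical and chosen purely for illustration; this is not the method used in the video.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, fraction, seed=0):
    """Draw the same fraction of items from every group (stratum) in the population."""
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Take at least one item per group so that no group disappears entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 100 articles, 60 tabloid and 40 broadsheet.
population = (
    [{"id": i, "type": "tabloid"} for i in range(60)]
    + [{"id": i, "type": "broadsheet"} for i in range(60, 100)]
)

corpus = stratified_sample(population, key=lambda a: a["type"], fraction=0.1)

# Count how many articles of each type ended up in the sample.
counts = defaultdict(int)
for article in corpus:
    counts[article["type"]] += 1
print(dict(counts))  # {'broadsheet': 4, 'tabloid': 6}
```

The sample of ten articles keeps the 60/40 split of the population, which is the sense in which it can be called representative of newspaper type. A real corpus would usually need to balance several such characteristics at once, such as publication date and readership.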

In this final week of the course, we will be exploring how the use of methods from corpus linguistics has transformed stylistics. If you’re not familiar with corpus linguistics, don’t worry. We’ll begin by explaining what a corpus is and then we’ll talk through some of the common analytical methods used in corpus linguistics. In particular, we’ll look at how these methods can be used to supplement stylistic analysis.

The reason that corpus methods are useful in stylistics is that they solve a problem that Leech and Short (2007) summarise as follows:

…the sheer bulk of prose writing is intimidating; […] In prose, the problem of how to select – what sample passages, what features to study – is more acute, and the incompleteness of even the most detailed analysis more apparent. (Leech and Short 2007: 2)

It’s impractical to analyse lengthy texts qualitatively in their entirety, but corpus linguistic methods offer another way of examining them. And it’s not just prose fiction that we can analyse, of course. We can use corpus methods to explore texts of all types and lengths, as you’ll find out this week.

To begin with, watch the video in which Dan McIntyre talks about some of the core principles behind corpora.


