Skip to 0 minutes and 9 seconds In this session, we will be talking about how to build corpora. Lancaster University has a very long tradition of corpus design and corpus analysis, and we will be sharing some of our experience with you. Building corpora is an essential part of corpus linguistics, which is the science of collecting and analysing corpora. The starting point is always thinking about what a corpus is in relation to what it represents. The corpus is a sample. Here, the language, with its different variations and varieties, is represented by the different shades of colour in the background. The corpus, which sits somewhere in the middle, is a sample.

Skip to 0 minutes and 55 seconds And ideally, when gathering different bits and pieces of language, different examples of language, if you like, we should be able to cover different aspects of language use. In an ideal case, the corpus would then reflect well what’s outside of the corpus: the language as it is used in the wild, if you like. The corpus doesn’t have to be large. It’s like wine tasting. You can have a tiny sip. You don’t have to drink the whole bottle to be able to appreciate the quality of the wine. The questions that we ask about a corpus are whether it is representative of the language that it is designed to represent, and whether it is balanced and unbiased.

Skip to 1 minute and 42 seconds In this session, we will be taking three corpora designed at Lancaster University as examples of good practice in corpus building. The first corpus is the British National Corpus 2014. The second corpus is the Trinity Lancaster Corpus of spoken L2 English. And finally, the third corpus is the Guangwai Lancaster Corpus of L2 Chinese, both spoken and written. We’ll be talking about corpus design, corpus development, and corpus annotation, demonstrating each with real data from real corpora, and hopefully drawing some conclusions about how to build your own corpus.

Vaclav Brezina introduces the process of corpus design and corpus building and provides an overview of this lecture.

