
Part 1: getting started

An introduction to a range of issues to consider when building your own corpus.
In this session, we will be talking about how to build corpora. Lancaster University has a very long tradition of corpus design and corpus analysis, and we will be sharing some of our experience with you. Building corpora is an essential part of corpus linguistics, the science of collecting and analysing corpora. The starting point is always to think about what a corpus is in relation to what it represents. A corpus is a sample. Here, language, with all its variations and varieties, is represented by different shades of colour in the background. The corpus, which sits somewhere in the middle, is a sample of that language.
Ideally, when collecting different bits and pieces of language (different examples of language, if you like), we should be able to cover different aspects of language use. In the ideal case, the corpus would then reflect well what lies outside it: language as it is used in the wild, so to speak. The corpus doesn't have to be large. It's like wine tasting: you take only a small sip, and you don't have to drink the whole bottle to appreciate the quality of the wine. The questions we ask about a corpus are whether it is representative of the language it is designed to represent, and whether it is balanced and unbiased.
In this session, we will take three corpora designed at Lancaster University as examples of good practice in corpus building. The first is the British National Corpus 2014. The second is the Trinity Lancaster Corpus of spoken L2 English. The third is the Guangwai-Lancaster Corpus of L2 Chinese, which contains both spoken and written data. We will talk about corpus design, corpus development and corpus annotation, demonstrating each with real data from real corpora, and we hope to draw some conclusions about how to build your own corpus.

Vaclav Brezina introduces the process of corpus design and corpus building and provides an overview of this lecture.

This article is from the free online course Corpus Linguistics: Method, Analysis, Interpretation.

