The role of corpora in dictionary making

Learn more about the role of corpora in today’s lexicographic practice.

Compared to paper slips, a corpus gives lexicographers access to a much wider context of use for a word.

Because corpora are electronic, lexicographers can find every occurrence of a particular word or structure and examine examples that show the nuances of its meaning. They can also see whether the word is used differently depending on the register of the text (formal, informal, slang, etc.), the geographical location, the author, the subject matter and so on.
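Finding every occurrence of a word together with its surrounding context is often presented as a concordance, with one line per occurrence. The following is a minimal sketch in Python, not the tooling of any particular dictionary publisher: the function name `concordance`, the `width` parameter and the three-sentence sample text are all invented here for illustration.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword` in `text`."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical miniature "corpus", invented for this example.
sample_corpus = (
    "She sat on the bank of the river. "
    "The bank raised its interest rates. "
    "He went to the Bank to open an account."
)

for line in concordance(sample_corpus, "bank"):
    print(line)
```

Reading down the aligned column makes it easier to compare the contexts in which the word appears. A real corpus would also carry information such as register, region, author and subject for each text, which this sketch leaves out.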

We can think of two main ways in which corpora help the dictionary-making process. On the one hand, the so-called ‘corpus-based’ approach sees the corpus as a source of examples. In this approach, we can imagine the lexicographer relying on their intuition of what a word means and how it is used, and then resorting to the corpus to find examples of each meaning and usage. On the other hand, the so-called ‘corpus-driven’ approach sees the corpus as the starting point of the process, in a bottom-up way. In this approach, we can imagine the lexicographer starting from the collection of occurrences of the word in the corpus, analysing them and grouping them into categories (for example, one for each sense), and then drafting the dictionary entry based on this analysis.
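The bottom-up, corpus-driven workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sense labels are imagined to have been assigned by a lexicographer after reading each corpus line, the sample sentences are invented, and ordering senses by how often they occur is a simplification chosen for this sketch, not a rule stated in this article.

```python
from collections import defaultdict

# Hypothetical corpus lines for "bank", each paired with the sense label
# a lexicographer assigned after reading it (invented for illustration).
annotated = [
    ("The bank raised its interest rates.", "financial institution"),
    ("She sat on the bank of the river.", "side of a river"),
    ("He went to the bank to open an account.", "financial institution"),
]

def group_by_sense(occurrences):
    """Group corpus lines into one category per sense."""
    groups = defaultdict(list)
    for sentence, sense in occurrences:
        groups[sense].append(sentence)
    return dict(groups)

def draft_entry(headword, groups):
    """Draft a plain-text entry, listing the most frequent sense first (an assumption)."""
    ordered = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    lines = [headword]
    for number, (sense, examples) in enumerate(ordered, start=1):
        lines.append(f'{number}. {sense} ({len(examples)} occurrences), e.g. "{examples[0]}"')
    return "\n".join(lines)

print(draft_entry("bank", group_by_sense(annotated)))
```

In a corpus-based workflow the order would be reversed: the list of senses would come first, from the lexicographer's intuition, and the corpus would be searched only for an example of each one.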

Corpus-driven approaches have become increasingly popular in lexicography following recent technological advances, which have made it possible for computer programs to process vast amounts of data very rapidly. It’s probably safe to say that combining corpus-based and corpus-driven approaches is very common practice today. As we will see in more detail at the end of this course, applied research in corpus linguistics and computational linguistics is pushing the boundaries and has made it possible for dictionary publishers to mine very large corpora in search of evidence of words’ behaviour. The typical size of corpora used today by dictionary publishers is a few billion words. These corpora are usually drawn from the web and a variety of other sources, including news, academic content and social media. One example is the Oxford English Corpus, which contains 21bn words and is used by lexicographers working on the OED and other Oxford dictionaries.

This content is taken from the Coventry University online course Understanding English Dictionaries.
