Humans and corpora

You learned about the role of corpora in dictionary-making in Week 3, and in Step 4.7 you saw the effect of corpus choice on the Oxford Junior Dictionary.

In Week 6 we will discuss the role of corpus-based tools in the automatic identification of new words. However, in Week 3 you also learned about the use of human informants in dictionary development. In addition to corpus evidence, some major publishers still use the findings from teams of human readers to discover new words and meanings which are worthy of inclusion in their dictionaries.

The role of human readers at Merriam-Webster is explained on the Merriam-Webster ‘Help’ page:

“Each day most Merriam-Webster editors devote an hour or two to reading a cross section of published material, including books, newspapers, magazines, and electronic publications … The editors scour the texts in search of new words, new usages of existing words, variant spellings, and inflected forms – in short, anything that might help in deciding if a word belongs in the dictionary, understanding what it means, and determining typical usage.”

Similarly, members of the Oxford Reading Programme run by Oxford University Press methodically read a wide range of texts, looking out for new words and meanings. The information they gather is stored in a database where it can be used by all the lexicographers at Oxford University Press. As we saw in Week 3, knowing when an item was first used is particularly valuable for dictionaries compiled on historical principles, like the Oxford English Dictionary. Oxford Reading Programme data also informs the Oxford Dictionary of English and the Online Oxford Dictionary, but the publicity material for these dictionaries focuses more heavily on their use of the Oxford English Corpus.

Merriam-Webster states that, to be included, ‘a word must be used in a substantial number of citations that come from a wide range of publications over a considerable period of time’. This specification is intentionally vague, because the required number and range of citations vary according to the word. The Merriam-Webster ‘Help’ page gives the example of the word ‘AIDS’, which appeared suddenly in the 1980s and became firmly established in a relatively short time. ‘AIDS’ was quickly accepted into Merriam-Webster dictionaries, whereas other, less prevalent words were not included until they had been in use for many years.

The same point is made on the OED ‘Frequently Asked Questions’ (FAQ) page, in the answer to the question ‘How does a word qualify for inclusion in the Oxford English Dictionary (OED)?’:

“The OED requires several independent examples of the word being used, and also evidence that the word has been in use for a reasonable amount of time. The exact time-span and number of examples may vary: for instance, one word may be included on the evidence of only a few examples, spread out over a long period of time, while another may gather momentum very quickly, resulting in a wide range of evidence in a shorter space of time.”

The OED FAQ page also explains that a new word has to ‘reach a level of general currency where it is unselfconsciously used with the expectation of being understood’. If a word always has to be explained when it is used, this is a sign that it is not sufficiently established to be included in the OED.
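
The kinds of evidence described above (number of citations, range of publications, length of time in use, and whether the word still needs explaining) can be pictured as a simple summary of a set of citations. The following Python sketch is only an illustration and does not represent any publisher's actual procedure: the `Citation` record, the `summarise_evidence` function and the sample data are all invented for this example. Because neither Merriam-Webster nor the OED publishes fixed thresholds, the sketch reports the evidence but does not decide whether a word qualifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One recorded use of a word (hypothetical record format)."""
    word: str
    source: str      # name of the publication
    year: int        # year of publication
    explained: bool  # True if the writer explains the word to the reader


def summarise_evidence(citations):
    """Summarise the kinds of evidence an editor might weigh up."""
    if not citations:
        return {"citations": 0, "sources": 0,
                "span_years": 0, "unexplained_share": 0.0}
    years = [c.year for c in citations]
    unexplained = sum(1 for c in citations if not c.explained)
    return {
        "citations": len(citations),                     # how many examples
        "sources": len({c.source for c in citations}),   # range of publications
        "span_years": max(years) - min(years),           # period of use
        "unexplained_share": unexplained / len(citations),
    }


# Invented sample data, for illustration only.
sample = [
    Citation("exampleword", "Newspaper A", 2001, True),
    Citation("exampleword", "Magazine B", 2004, False),
    Citation("exampleword", "Newspaper A", 2007, False),
    Citation("exampleword", "Novel C", 2009, False),
]

summary = summarise_evidence(sample)
print(summary)
```

A human editor would still need to interpret a summary like this: a few examples spread over a long period and many examples gathered in a short period can both count as sufficient evidence.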

Your task

What kind of information about new words do you think a corpus can provide, but a team of human readers cannot provide?

What kind of information about new words do you think a team of human readers can provide, but a corpus cannot provide?

Further reading

Select the following link for more information about the way Oxford Dictionaries are created.

This article is from the free online course:

Understanding English Dictionaries

Coventry University