In Step 4.4 we discussed some reasons why different dictionaries make different decisions about which words to include and which to leave out.
First of all, there must be clear evidence that a new word is in regular use (or in other words, it has to be attested) before it is included in any dictionary. In crowdsourced and collaborative dictionaries, the dictionary users create the content, and they have to prove in some way that the words they contribute are actually used. Lexicographers look out for neologisms in different kinds of spoken and written sources. This work may be done manually, by human readers, or automatically, by scanning corpus data. You will learn more about technologies for automatically identifying new words in Week 6.
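As a rough illustration of the automatic approach, the sketch below scans a few texts for words that are missing from an existing word list and counts how often each one occurs. It is a minimal sketch, not a description of any real lexicographic tool: the function name, the tiny word list, the sample sentences and the `min_count` cut-off are all invented for illustration.

```python
import re
from collections import Counter


def candidate_new_words(texts, known_words, min_count=2):
    """Return words absent from known_words that occur at least min_count times."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic tokens only (a deliberately crude tokenizer).
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in known_words:
                counts[token] += 1
    # Keep only words seen often enough to count as attested in this toy corpus.
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical word list standing in for an existing dictionary's headwords.
known = {"i", "will", "it", "later", "did", "you", "the", "answer", "just"}

# Invented sample sentences standing in for corpus data.
corpus = [
    "I will google it later",
    "Did you google the answer",
    "Just blorf it",
]

candidates = candidate_new_words(corpus, known)
print(candidates)
```

Here 'google' is reported because it appears twice, while the made-up word 'blorf' appears only once and is filtered out. A human reader would still need to check each candidate, since a simple scan like this cannot tell a genuine new word from a typing error or a name.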
Frequency is a key consideration when judging whether a new word is important enough to be included in a particular dictionary. The verb ‘google’, for example, was not included in the first edition of the Macmillan English Dictionary (2002), because very few people had even heard of it at that time. Since the beginning of the new millennium there has been a massive change in the way people use the internet to find information, book holidays, buy things and keep in touch with their friends. ‘Google’ has therefore become a common verb in a wide range of contexts – it was added to the second edition of the Macmillan English Dictionary (2007) with the definition ‘to search for something on the Internet using the Google search engine’.
New words may be common in some contexts and not in others, so lexicographers and contributors need to check the sources that are most important to the users of their dictionaries. New words relating to current affairs and business are more likely to occur in serious radio and television news programmes, for example, and in newspapers and news magazines. Words relating to pop culture, on the other hand, are more likely to be found in celebrity magazines and TV chat shows. Lexicographers and contributors also have to think about which parts of the English-speaking world they are going to cover, because many new words are restricted to certain geographical regions.
For this reason, if a dictionary is compiled with the aid of a corpus, the compilers must make some subjective judgements about the sort of texts the corpus will include. The 2015 edition of the Oxford Junior Dictionary offers a good example of the problems that might arise if the texts in the corpus are not thought to cover a suitable range of topics. The Oxford Junior Dictionary contains about 10,000 entries and is aimed at seven-year-olds; the selection of words for the 2015 edition was informed by a corpus made up of texts written by children who had contributed to the annual BBC short story competition. Based on the evidence from this corpus, some words connected with nature and the countryside which had been listed in earlier editions, such as ‘cauliflower’, ‘chestnut’ and ‘clover’, were removed from this edition to make way for words connected to modern technology such as ‘broadband’ and ‘cut and paste’. Several well-known authors were very unhappy about these changes and wrote to Oxford University Press to complain that the dictionary was not encouraging young children to connect with the natural world.
A spokesperson for Oxford University Press said:
“All our dictionaries are designed to reflect language as it is used, rather than seeking to prescribe certain words or word usages. We employ extremely rigorous editorial guidelines in determining which words [can] be included in each dictionary, based on several criteria: acknowledging the current frequency of words in daily language of children of that age; corpus analysis; acknowledging commonly misspelled or misused words; and taking curriculum requirements into account.”
The question of what kinds of texts a corpus should contain is still open to discussion. Fortunately, online dictionaries do not have the same space constraints as print editions, so it is no longer necessary to remove some entries simply to make way for others. The size of the dictionary can still affect inclusion criteria, however. Smaller dictionaries tend to restrict their coverage to the most commonly used words, so a word often needs more citations to be accepted into a small dictionary than into a large unabridged dictionary.
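The effect of dictionary size on inclusion can be pictured as a simple threshold check. The sketch below is only an illustration: the threshold values, the citation counts and the names `MIN_CITATIONS` and `meets_threshold` are assumptions made up for this example, and real dictionaries weigh several criteria rather than a single number.

```python
# Hypothetical minimum citation counts; real dictionaries set their own criteria.
MIN_CITATIONS = {"small": 50, "unabridged": 5}


def meets_threshold(citations, dictionary_size):
    """Return True if a word has enough citations for a dictionary of this size."""
    return citations >= MIN_CITATIONS[dictionary_size]


# Invented citation counts for two candidate words.
citation_counts = {"google": 120, "blorf": 8}

# Work out, for each word, which dictionary sizes would accept it.
decisions = {
    word: {size: meets_threshold(n, size) for size in MIN_CITATIONS}
    for word, n in citation_counts.items()
}

for word, result in decisions.items():
    print(word, result)
```

With these invented figures, a frequently cited word passes both thresholds, while a rarely cited one is accepted only by the larger dictionary.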
What kinds of texts do you think a corpus for a children’s dictionary should contain?
Should decisions about which words to include in a dictionary depend solely on their frequency and distribution across different types of text (as discussed in Week 3), or should other factors be taken into consideration too?
For further useful information, follow the link to find out more about how the Macmillan English Dictionary was created.
© Coventry University. CC BY-NC 4.0