The selection process: a lexicographer’s view
In this question and answer session, Michael Rundell answers questions about the way dictionary editors choose new words for inclusion in their dictionaries.
You can read more about Michael Rundell in the ‘Course team welcome’, Step 1.2.
Q. In the case of print dictionaries, how do dictionary editors decide which words to include in the dictionary, and which to leave out?
A. There are two important factors here: one is what we call the ‘user profile’ (that is, the kind of user the dictionary is designed for); and the other is the size of the dictionary.
The user profile is really the starting point. At the earliest stage of planning any dictionary, you have to have a clear idea of who the dictionary is aimed at: for example, are the users young children, high-school students, or adults; are they fluent speakers or language learners, and so on? Understanding the reference needs and language proficiency of your intended user informs every aspect of the dictionary’s design, and that includes your policies on which words the dictionary will describe.
Then there are the limitations of space. Dictionaries in book form, even the very largest ones, have a finite amount of space: you can’t include everything (even a huge dictionary like the Oxford English Dictionary doesn’t), and this means we have to be selective, and have quite strict criteria for deciding which words go in and which don’t. It’s what we call a zero-sum game: with only limited space, if you decide to add one word, you may have to exclude or remove another.
We can then use corpus data to refine our criteria. Corpus software can tell us not only how frequent a word is, but also how well distributed it is across different text types. Let’s say your dictionary is designed for adult fluent speakers, and will have a headword list of 50,000 words. A good way to approach this is to generate a list of, perhaps, the 60,000 most frequent and most widely used words: that will give you a good candidate list, which can then be refined and reduced according to what you think the users of this particular dictionary are likely to encounter and will need to know about.
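As a rough illustration of the corpus step described above, the short Python sketch below ranks words by frequency and keeps only those that occur in more than one text type. It is a minimal sketch, not the software any dictionary publisher actually uses: the function name candidate_headwords, the min_text_types threshold, and the tiny three-text corpus are all assumptions made up for the example.

```python
from collections import Counter


def candidate_headwords(corpus, list_size, min_text_types=2):
    """Return up to `list_size` frequent, well-distributed words.

    `corpus` maps a text-type name (hypothetical labels such as "news")
    to a list of word tokens. `min_text_types` is an assumed threshold
    for what counts as "well distributed".
    """
    freq = Counter()    # total occurrences of each word
    spread = Counter()  # number of text types each word appears in
    for tokens in corpus.values():
        lowered = [t.lower() for t in tokens]
        freq.update(lowered)
        spread.update(set(lowered))

    # Drop words confined to too few text types, however frequent.
    well_distributed = [w for w in freq if spread[w] >= min_text_types]

    # Most frequent first; ties broken by wider spread, then alphabetically.
    ranked = sorted(well_distributed, key=lambda w: (-freq[w], -spread[w], w))
    return ranked[:list_size]


# Toy corpus, invented for illustration only.
toy_corpus = {
    "news": "the minister said the economy will grow".split(),
    "fiction": "the dragon said the night will end".split(),
    "academic": "the enzyme will bind the substrate".split(),
}

candidates = candidate_headwords(toy_corpus, list_size=3)
print(candidates)
```

In a real project the list size would be set somewhat above the target headword count (60,000 candidates for a 50,000-word dictionary, in the example above), and the editors would then prune the list by hand with the user profile in mind.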
Q. Can you give some examples of words that you decided not to include in a print dictionary, and why?
A. A nice example is the names of chemical elements. There are about 120 of these, but some are extremely rare and most of our users would probably never come across them. In the original edition of the Macmillan English Dictionary (2002), which only existed in book form, we didn’t include the full set of elements. We selected just the 40 or 50 that were most commonly referred to – elements like ‘carbon’, ‘potassium’, and ‘hydrogen’, but not ‘flerovium’, ‘meitnerium’, or ‘niobium’. If we had included them all, it would have taken up space that could be used for more useful words. But once the dictionary went online, where space is unlimited, there was no good reason not to include the full set, and this is what we have done. So we had different policies for the print and online editions – but I think both are justified.
Q. In the age of digital dictionaries, how do you decide what words to accept, and what to leave out?
A. Moving from print to digital is a game-changer. The obvious reason is that we no longer have any limitations of space, so we can include far more words if we want to. There is also a less obvious reason, which is that when your dictionary is online, it is much harder to know who your users are, so creating a user profile is less straightforward. Dictionaries like the Oxford Advanced Learner’s Dictionary or the Macmillan English Dictionary – in book form – are very clearly dictionaries aimed at people whose first language is not English: typically senior high school students and above. Dictionaries like these would be grouped together in a particular area of the bookshop, and people would often buy them on the recommendation of their teacher. So we, as dictionary publishers, could be pretty confident that we knew the kind of person who would be using our dictionaries, and could tailor the content accordingly. But when dictionaries are online, we have less control over who uses them. People searching for a word using Google, for example, don’t always specify a particular dictionary – so the person who ends up on a site like https://www.macmillandictionary.com/ is not necessarily a language learner.
Q. So, it sounds as if you need to establish a whole new set of inclusion criteria for the digital age? How do you approach this?
A. I don’t think dictionary publishers have really worked this out yet. From a historical perspective, the move from print to online is very recent, and it takes time to change long-established practices. Given that space is now unlimited, the simple solution would be to say, ‘Let’s just include everything’, but there are good reasons for not going down that route. I think a useful way to approach this is to think in terms of exclusion criteria – in other words, to be clear about why some words or phrases should not be included in the dictionary. As before, we need to see evidence that a word or phrase is widely used. For mainstream dictionaries, this rules out the kind of terminology you find in ‘expert-to-expert’ discourse – in a highly specialised scientific journal, for example. We also need to be careful about ‘exploitations’, the creative, one-off uses of language which are common in novels, newspapers, and spoken discourse – but which usually arise and disappear almost immediately. Neologisms can be a problem, because we need to make a judgement as to whether the word is likely to settle into the language, or whether it will be forgotten about within a couple of months. I think the principle of replacing inclusion criteria with exclusion criteria is a good one, but as you can probably tell, this is still work in progress.
© Coventry University. CC BY-NC 4.0