Learn about recent research on finding new word senses automatically.
Many words change meaning over time, and some acquire new senses.
For example, the word ‘tweet’ originally referred to a sound produced by birds, and it has recently acquired a new sense related to a message sent via the social media platform Twitter. It is an essential part of lexicographers’ work to find new senses and define them in dictionaries. But this is a very labour-intensive task. Can we make computers help humans find new senses automatically?
Computational linguistics researchers have been looking at solutions to this challenge for a decade, and this is still an active area of investigation. We would like to share with you a couple of examples of such research, as they are very relevant to our understanding of what is happening in computational lexicography and what we might expect next.
Paul Cook and colleagues published an article in 2014 in which they describe a system for identifying new word senses automatically – you can find the details at the end of this article. These researchers compared the 1995 edition of the Concise Oxford English Dictionary with the 2005 edition, and collected a series of examples of novel senses by hand. They then analysed random usages of each lemma in two English corpora, the British National Corpus (containing British English content from the late 20th century) and the ukWaC corpus (containing webpages with a .uk domain in 2007). The British National Corpus was used as a source of established senses, while the ukWaC corpus was used as a source of new senses. For example, the novel sense of the word ‘domain’ related to the internet was found in ukWaC, but not in the British National Corpus. If we put the two corpora together, how can we teach a computer to distinguish the usages containing the original sense of ‘domain’ from those containing the new one?
The idea is to use the context of a word to learn about its meaning, and this approach is very popular in the field of ‘distributional semantics’. This makes intuitive sense because, if we don’t know the meaning of a word in a sentence, we can use the words around it to help us. For example, imagine that the word ‘embue’ existed and imagine that you found this sentence:
‘Many embues live in forests and eat nuts.’
Even if you don’t know the meaning of the word ‘embue’, you can use this context to guess that it refers to an animate being, probably an animal. Context (in the form of concordances) has been used by lexicographers for a long time, as we saw in Weeks 3 and 5, so it seems sensible that computers can use it to find out about words’ meaning.
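To make this intuition concrete, here is a minimal sketch in Python of how a program might collect the words that appear near a target word. The function name `context_counts`, the window size of three words and the second example sentence are our own assumptions for illustration; real systems use much larger corpora and more careful text processing.

```python
from collections import Counter


def context_counts(sentences, target, window=3):
    """Count the words occurring within `window` tokens of `target`.

    Hypothetical helper for illustration only: tokenisation is a naive
    lowercase split with full stops removed.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().replace(".", "").split()
        for i, token in enumerate(tokens):
            if token == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


# The first sentence comes from the text above; the second is invented.
sentences = [
    "Many embues live in forests and eat nuts.",
    "Some embues eat nuts in winter.",
]
profile = context_counts(sentences, "embues")
print(profile.most_common())
```

The resulting counts form a simple 'profile' of the word: words such as 'live', 'forests' and 'eat' hint that an 'embue' is probably an animal, even though the program has no definition for it.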
Cook and colleagues used precisely this intuition to build a computer program (based on a so-called ‘topic model’) that assigns each usage of a word in the two corpora to its most likely sense. A novel sense is, in their definition, a sense that is common in the recent corpus (ukWaC), but not in the reference corpus (British National Corpus). They also used the fact that in the two corpora they worked with, many novel senses are related to the domain of computing and the internet, which reflects a cultural innovation that occurred in the time period considered. Their system managed to correctly find novel senses in a good percentage of cases, and it certainly did better than random guessing. We will not give the details of their work here, but you are welcome to read the original article if you want to find out more.
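The definition of a novel sense can be illustrated with a small sketch. We assume here that every usage has already been assigned a sense label, and we compare how common each sense is in the two corpora. The function `novelty_scores`, the smoothing value, the sense labels and the counts are all invented for illustration; this is not the scoring used in the original article.

```python
from collections import Counter


def novelty_scores(reference_senses, focus_senses, smoothing=0.01):
    """Ratio of each sense's share in the focus corpus to its share in
    the reference corpus. Hypothetical scoring for illustration only;
    `smoothing` avoids division by zero for senses that are absent
    from the reference corpus. Both lists must be non-empty.
    """
    ref = Counter(reference_senses)
    foc = Counter(focus_senses)
    scores = {}
    for sense in set(ref) | set(foc):
        p_ref = ref[sense] / len(reference_senses)
        p_foc = foc[sense] / len(focus_senses)
        scores[sense] = (p_foc + smoothing) / (p_ref + smoothing)
    return scores


# Invented sense labels for usages of 'domain' in each corpus.
reference = ["territory"] * 9 + ["field_of_knowledge"] * 1
focus = ["territory"] * 4 + ["field_of_knowledge"] * 1 + ["internet"] * 5

scores = novelty_scores(reference, focus)
most_novel = max(scores, key=scores.get)
print(most_novel, scores)
```

A sense that is frequent in the recent corpus but rare or absent in the reference corpus receives a high score, which makes it a candidate novel sense for a lexicographer to inspect.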
The other piece of research we would like to talk about was done in 2018 by Pierpaolo Basile and Barbara McGillivray. The authors analysed a very large corpus, the UK Web Archive JISC dataset 1996-2013, which collects resources from the Internet Archive that were hosted on domains ending in ‘.uk’. They used a method that represents words as geometrical objects in space. Following the distributional hypothesis introduced above, they compared the geometrical profile of each word across the years from 1996 to 2013. When this profile is found to have changed sufficiently from one year to the next, that year is flagged as the time when the word might have acquired a new meaning. For example, their system flagged ‘blackberry’ as one such candidate. The original meaning of ‘blackberry’ refers to the fruit; however, a more recent meaning emerged (in 1999, according to the Oxford English Dictionary), which is the proprietary name of a smartphone. By looking at the words appearing in the contexts of ‘blackberry’ in 1999 in this corpus we see that the majority of them are words related to the fruit sense, for example ‘pie’ or ‘strawberry’. On the other hand, the words appearing in the contexts of ‘blackberry’ in 2003 include some words from the domain of mobile phones, such as ‘phones’ and ‘cellphones’.
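The idea of comparing a word's profile from one year to the next can be sketched as follows. We assume a very simple representation in which each year's profile is a dictionary of context-word counts, and we measure similarity with the cosine between two profiles. The counts, the years chosen and the threshold of 0.5 are invented for illustration, and this toy procedure is not the method used by the authors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_change_years(profiles, threshold=0.5):
    """Return the years whose profile differs sharply from the previous
    year's profile. The threshold is an arbitrary, assumed value."""
    years = sorted(profiles)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        if cosine(profiles[prev], profiles[curr]) < threshold:
            flagged.append(curr)
    return flagged


# Invented context counts for 'blackberry' in three consecutive years.
profiles = {
    2001: {"pie": 5, "strawberry": 4, "jam": 3},
    2002: {"pie": 4, "strawberry": 5, "jam": 2},
    2003: {"phones": 6, "cellphones": 4, "pie": 1},
}
flagged = flag_change_years(profiles)
print(flagged)
```

In this toy data the first two years share almost the same context words, so their similarity is high, whereas the last year is dominated by phone-related words and is therefore flagged.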
Basile and McGillivray devised a way to check how accurate their system’s candidates were by comparing them with the Oxford English Dictionary, and found that around 84% of the words that changed their meaning in English between 1996 and 2013, according to that dictionary, were in fact highlighted by their system. These are, of course, preliminary results, as research in automatic detection of new meanings is still active, and new systems will be created and refined in the future. However, we hope to have given you a snapshot of this exciting research field.
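The kind of figure reported above, the share of dictionary-attested changes that a system also flags, is usually called 'recall'. A minimal sketch of the calculation follows; the function name and both word lists are invented for illustration and do not reproduce the authors' data.

```python
def recall(gold, flagged):
    """Share of the gold-standard words that the system also flagged."""
    gold = set(gold)
    if not gold:
        raise ValueError("the gold standard must not be empty")
    return len(gold & set(flagged)) / len(gold)


# Invented example: four words with a new sense according to a
# dictionary, and four words flagged by a hypothetical system.
gold_changes = {"blackberry", "tweet", "domain", "cloud"}
system_flags = {"blackberry", "tweet", "cloud", "forest"}

score = recall(gold_changes, system_flags)
print(f"{score:.0%}")
```

Note that recall says nothing about words the system flagged wrongly ('forest' in this toy example); a full evaluation would also look at how many of the flagged words are genuine changes.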
Having read the examples above, can you think of any other words that have changed their meaning over time?
Share your responses in the comments area and read the other learners’ responses. Did anyone mention a word that you would like to research in more detail?
References
Basile, P. and McGillivray, B. (2018) ‘Exploiting the Web for Semantic Change Detection’. in Proceedings of the 21st International Conference on Discovery Science (DS 2018), Cyprus. Springer-Verlag
Cook, P., Lau, J. H., McCarthy, D., and Baldwin, T. (2014) ‘Novel Word-Sense Identification’. in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 1624-1635. available from http://www.aclweb.org/anthology/C14-1154
JISC and the Internet Archive (2013) ‘JISC UK Web Domain Dataset (1996-2013)’. The British Library. available from https://doi.org/10.5259/ukwa.ds.2/1
Oxford Dictionaries (1995) The Concise Oxford Dictionary of Current English. 9th edn. ed. by Thompson, D. Oxford: Oxford University Press
Oxford Dictionaries (2008) The Concise Oxford English Dictionary. 11th edn. ed. by Soanes, C. and Stevenson, A. Oxford: Oxford University Press
© Barbara McGillivray. CC BY-NC 4.0