Hello! Today I’m with Emmanuel Cartier. Hi, Emmanuel, can you say a few words about yourself? Yes, thank you. I am Emmanuel Cartier, an Assistant Professor at the University of Paris 13 and a researcher at the National Centre for Scientific Research in France, and I am a linguist and computational linguist. Can you tell us a little about your research project and its challenges? Since 2015 we have been working on a research project called Neoveille, funded by the ANR, the French National Research Agency. The aim of this project is to automatically detect new words and meanings in seven languages: French, Polish, Czech, Greek, Brazilian Portuguese, Russian and Mandarin Chinese.
We work with research partners in eight different countries, from France to China. This project has many challenges. One challenge is how to find neologisms automatically. There are three main ways in which neologisms enter a language. One is the creation of a new word, for example by derivation (like ‘Twittery’) or composition (‘tweet clash’, ‘fact tweet’). To detect this type of neologism, the main idea is to check whether a word appears in a reference dictionary. But electronic dictionaries are not available for every language, they do not cover the whole vocabulary of a language, and texts also contain spelling mistakes, so we combine the dictionary check with several other techniques, such as machine learning.
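The dictionary-lookup idea can be sketched in a few lines of Python. This is a toy illustration, not the Neoveille pipeline: the reference lexicon below is an invented mini-dictionary, and a real system would add lemmatisation, spell-checking and the machine-learned filters mentioned above.

```python
import re

# Toy reference lexicon standing in for an electronic dictionary.
reference_lexicon = {"the", "tweet", "was", "a", "clash", "about", "facts"}

def neologism_candidates(text, lexicon):
    """Return tokens that do not appear in the reference lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in lexicon]

print(neologism_candidates("The tweetclash was about facts", reference_lexicon))
# → ['tweetclash']
```

Every word absent from the lexicon becomes a candidate; the later filtering stages then separate genuine neologisms from dictionary gaps and typos.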
Another way of adding new words to a language is through borrowings from other languages. To find new borrowings we can also use dictionaries, but in the multilingual, connected era we live in, many texts contain foreign words that are not borrowings, so we need to filter those out. A further way of innovating in a language is through new usages of existing words, also called semantic neologisms. For example, the meaning of ‘mouse’ as a computer pointing device was created from the original meaning referring to the animal, and was then translated into other languages. How can we find such words automatically?
We have to rely on more sophisticated techniques, such as statistics and recent advances in semantic analysis, especially distributional semantics. These methods track how a word is used and detect changes in its contextual profile over time. Finally, another challenge in this project is how to follow and track the emergence and spread of neologisms. Frequency is a good first indication, but most neologisms occur only once and then disappear. That’s really interesting. Can you tell us about the findings and results of your project so far? Yes. Thanks to this project we can answer many interesting
questions, such as: What is the importance of neologisms in the history of languages? Are there specific communities from which neologisms mainly arise? And can we explain why many neologisms appear only once while a few are adopted by everyone? During the project we have achieved several results. One main achievement is the web platform available at Neoveille.org. The general public can consult the results on the website, and linguists can use the platform to add new web sources, approve the automatically detected neologisms and describe them. Another achievement is an exhaustive analysis of neologisms in French: we have described more than 20,000 neologisms from 250 web sources that have appeared over the past three years.
All of these results, together with our publications, are available on the website. That’s really exciting. What are your future plans for the project? In the future we plan to carry out exhaustive studies of languages other than French. So far we have researched Italian, Greek, Portuguese, Czech and Polish, collecting more than 10,000 neologisms for each language. We are describing and analysing them, and we intend to publish the results next year. Semantic neologism detection is still at an early stage, and the time span of our corpora is not yet sufficient to draw any conclusions about models of spread.
We are also in contact with dictionary editors to track new words and new usages in order to update existing dictionaries, and we are setting up a European-level research network on lexical innovation. Thank you.
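The filtering step for borrowings mentioned in the interview — keeping a foreign word as a borrowing candidate only when it appears inside otherwise target-language text — can be sketched as follows. The lexicons, the example sentence and the 0.6 threshold are illustrative assumptions, not the project’s actual resources or settings.

```python
# Toy lexicons standing in for French and English dictionaries.
french_lexicon = {"ce", "est", "tres", "populaire", "en", "france"}
english_lexicon = {"hashtag", "this", "is", "very", "popular"}

def borrowing_candidates(tokens, target_lex, foreign_lex, threshold=0.6):
    """Keep foreign-lexicon words only when the surrounding sentence is
    mostly in the target language; otherwise the whole passage is probably
    simply written in the foreign language, not borrowing from it."""
    in_target = sum(t in target_lex for t in tokens) / len(tokens)
    if in_target < threshold:
        return []  # likely a foreign-language passage, not borrowings
    return [t for t in tokens if t not in target_lex and t in foreign_lex]

print(borrowing_candidates(["ce", "hashtag", "est", "tres", "populaire"],
                           french_lexicon, english_lexicon))
# → ['hashtag']
```

An all-English sentence would fall below the threshold and yield no candidates, which is exactly the filtering behaviour described above.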
How can we find new words automatically?
Barbara McGillivray interviews the computational linguist Emmanuel Cartier about Neoveille, his project for tracking neologisms.
Emmanuel Cartier is an Assistant Professor at the University of Paris 13.
The video is primarily about the Neoveille project, which explores how new words and meanings can be detected automatically in a range of languages.
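The distributional-semantics idea described in the video — detecting semantic neologisms by tracking changes in a word’s contextual profile over time — can be sketched with simple count vectors and cosine similarity. The two mini-corpora and the two-word context window are invented for illustration; the project itself relies on more sophisticated semantic models.

```python
from collections import Counter
import math

def context_profile(sentences, target, window=2):
    """Count the words co-occurring with `target` within +/- `window` tokens."""
    profile = Counter()
    for s in sentences:
        toks = s.lower().split()
        for i, t in enumerate(toks):
            if t == target:
                for c in toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]:
                    profile[c] += 1
    return profile

def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[k] * q[k] for k in set(p) | set(q))
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

# Invented mini-corpora for two time periods.
period1 = ["the grey mouse ate cheese", "a mouse ran under the floor"]
period2 = ["click the mouse button twice", "move the mouse to the icon"]
p1 = context_profile(period1, "mouse")
p2 = context_profile(period2, "mouse")
print(f"similarity: {cosine(p1, p2):.2f}")
```

A low similarity between the two periods’ profiles flags ‘mouse’ as a candidate semantic neologism, and counting its occurrences per period gives the frequency signal the interview mentions as a first indicator of spread.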
Cartier, E. (2017) ‘Neoveille, A Web Platform for Neologism Tracking’. In: Proceedings of the EACL 2017 Software Demonstrations, Valencia, Spain, 3-7 April 2017, 95-98. Available from http://www.aclweb.org/anthology/E17-3024
Cartier, E., Sablayrolles, J.-F., Boutmgharine, N., Humbley, J., Bertocci, M., Jacquet-Pfau, C., Kübler, N. and Tallarico, G. (2018) ‘Détection Automatique, Description Linguistique et Suivi des Néologismes en Corpus: Point d’étape sur les Tendances du Français Contemporain’. In: Actes du Congrès Mondial de Linguistique Française, Mons, Belgium, 9-13 July 2018. Available from https://www.shs-conferences.org/articles/shsconf/pdf/2018/07/shsconf_cmlf2018_08002.pdf
Cartier, E. (2019) ‘Neoveille, Plateforme de Détection, de Repérage et de Suivi des Néologismes en Onze Langues’. Neologica. Available from https://lipn.univ-paris13.fr/neoveille/docs/Neoveille_neologica2019.pdf
© Barbara McGillivray. CC BY-NC 4.0