Skip to 0 minutes and 12 seconds In this video, I will introduce GraphColl, a new tool for building collocation networks on the fly, developed at the ESRC Centre for Corpus Approaches to Social Science, Lancaster University. GraphColl is a free tool, so you can download it from the internet and use it in your own research for free. I will start by talking about collocations and collocation networks. And then I will give you a quick demo of the tool itself. So let’s start with the idea of collocations, one of the cornerstones of corpus linguistics.
Skip to 0 minutes and 55 seconds Collocations, as we all know, are combinations of words that co-occur frequently in text corpora, words that appear in each other’s company very frequently. Usually when we look at collocations in other tools, we have this table view that tells us which collocates, in this case the collocates of the word love, are statistically important for that word. Here we have collocates such as affair, fall, falling, fallen, and so on, as in love letters. We also have a statistical measure, in this case the MI score (mutual information), that can be used to highlight the most important collocates. However, this is the view that we usually use for collocates.
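As a rough illustration of the MI score mentioned here, the sketch below uses the basic textbook form, the base-2 logarithm of observed over expected co-occurrence frequency. The function name and the counts are invented for illustration, and real tools may additionally correct the expected frequency for the size of the collocation window.

```python
import math

def mi_score(cooc_freq, node_freq, collocate_freq, corpus_size):
    """Basic mutual information: log2(observed / expected).

    expected = node_freq * collocate_freq / corpus_size
    (no window-size correction; this is an assumption of the sketch).
    """
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(cooc_freq / expected)

# Invented counts: the node and the collocate each occur 100 times in a
# 10,000-token corpus and co-occur 4 times.
score = mi_score(cooc_freq=4, node_freq=100, collocate_freq=100, corpus_size=10_000)
print(score)
```

A higher score means the two words occur together more often than their individual frequencies would lead us to expect.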
Skip to 1 minute and 56 seconds GraphColl presents another insight into this collocation relationship by showing collocates graphically. This is an output from GraphColl. Actually, both are: GraphColl allows you to switch easily between the table view and the graphical view and compare them. In the graphical view, on the right-hand side, we can see what we call the node, the word that we are interested in, love, in the middle, and different collocates around it at different distances. The closer the collocate is to the node, the stronger the relationship between the word love and that collocate.
Skip to 2 minutes and 48 seconds So you can see, for instance, that affair and fall or fell are fairly close to the centre of the graph, the node love. When we think about collocation theoretically, there are several criteria that we use to identify collocations, different aspects of the collocation relationship if you like. Traditionally, people have looked at three criteria: distance, frequency, and exclusivity. By distance, we mean the span to the left and to the right of the node in which we look for collocates, e.g., five words to the left and five to the right. Frequency, of course, is the frequency of the co-occurrence of the words that we are interested in. And exclusivity is another important aspect of the collocation relationship.
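The distance and frequency criteria can be sketched in a few lines of Python. This is a minimal illustration, not GraphColl’s implementation: the function name `window_collocates` and the toy sentence are invented, and tokenisation is reduced to splitting on whitespace.

```python
from collections import Counter

def window_collocates(tokens, node, left=5, right=5):
    """Count every word found within `left`/`right` tokens of each node occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        end = min(len(tokens), i + right + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts

# Toy example with a narrow span of two words on each side.
tokens = "they fall in love and love letters arrive".split()
counts = window_collocates(tokens, "love", left=2, right=2)
print(counts.most_common())
```

Note that another occurrence of the node inside the window is counted like any other word in this sketch; a tool may handle that case differently.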
Skip to 3 minutes and 46 seconds This is the fact that the words appear more frequently in each other’s company than with other words. It is usually captured by different statistical measures such as mutual information, MI3, log likelihood, log Dice, and so on. However, there are also other criteria, or other aspects of the collocation relationship, such as dispersion: how equally or unequally the collocates are distributed throughout different texts in the corpus. Do they appear only in one text, or are they a feature of multiple texts? In that case, perhaps they will be more important. Then there is directionality, and Delta P is a particular statistic that takes directionality into account.
Skip to 4 minutes and 43 seconds Here we look at two words and the direction in which they predict each other’s co-occurrence, a red herring being a very good example of this directionality feature of collocates. When we have the word red, any word can follow. However, when we have the word herring as a node, it predicts very well that the word red will appear before it in most cases, at least in English. Then there is type-token distribution: again, in the span of collocates, or what we call the collocation window, there are different words that can compete for the top places of the collocate list.
Skip to 5 minutes and 33 seconds And again, it depends whether there are many different types competing for the position to become collocates of the word or whether there are just a few very frequent types. Finally, and most importantly for our purposes, there is connectivity. I will explore connectivity in the next slide.
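To make the directionality criterion from the red herring example concrete, here is a small sketch of Delta P computed from a two-by-two contingency table. The function name is hypothetical and the counts are invented for illustration; they are not taken from any corpus.

```python
def delta_p(a, b, c, d):
    """Delta P in both directions from a 2x2 contingency table.

    a: word1 followed by word2        b: word1 followed by another word
    c: another word followed by word2 d: neither word1 nor word2
    Returns (Delta P of word2 given word1, Delta P of word1 given word2).
    """
    forward = a / (a + b) - c / (c + d)   # how well word1 predicts word2
    backward = a / (a + c) - b / (b + d)  # how well word2 predicts word1
    return forward, backward

# Invented counts: "red" is common, "herring" is rare and nearly always follows "red".
red_to_herring, herring_to_red = delta_p(a=90, b=9_910, c=10, d=989_990)
print(round(red_to_herring, 4), round(herring_to_red, 4))
```

With these made-up numbers, herring predicts red far more strongly than red predicts herring, which is the asymmetry described above.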
Skip to 6 minutes and 1 second Let’s start with our initial node, the word we are interested in. In the previous slides, it was the word love. We can first search for the word, then search for the words that occur in its vicinity and identify the strongest collocates around this node. This is what we traditionally do with collocates, displaying them in tabular form. However, and this is where the idea of collocation networks comes in, we can also take any of those collocates, consider it a new node, and build the collocation network around it. And then we can move further down the chain and again identify collocates, and collocates of those collocates, and so on and so forth.
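The procedure described here, treating each collocate as a new node and repeating the search, can be sketched as a breadth-first expansion. This is only an illustration under simplifying assumptions: the names `collocates` and `build_network` are invented, collocates are selected by a raw co-occurrence threshold rather than a statistic such as MI, and nothing here reproduces GraphColl’s own code.

```python
from collections import Counter

def collocates(tokens, node, span=2, min_freq=2):
    """Words co-occurring with `node` at least `min_freq` times within `span` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return {w for w, c in counts.items() if c >= min_freq and w != node}

def build_network(tokens, seed, depth=2, span=2, min_freq=2):
    """Expand from `seed`: each new collocate becomes a node in the next round."""
    edges = set()
    frontier, seen = {seed}, {seed}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for coll in collocates(tokens, node, span, min_freq):
                edges.add((node, coll))  # directed edge: node -> collocate
                if coll not in seen:
                    seen.add(coll)
                    next_frontier.add(coll)
        frontier = next_frontier
    return edges

# Toy corpus: "scandal" is linked to "love" only through "affair".
tokens = "love affair love affair affair scandal affair scandal".split()
edges = build_network(tokens, "love", depth=2, span=1, min_freq=2)
print(sorted(edges))
```

In the toy corpus, scandal never occurs next to love, yet it enters the network as a collocate of affair, which is the kind of cross-association a table for a single node would not show.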
Skip to 6 minutes and 58 seconds In this way, we can explore corpora and discourses very efficiently and see different links and cross-associations, the complex discourse meanings of these words. Let’s move on to GraphColl. Before I give you the quick demo, if you are interested in more details about the idea of collocations and collocation networks, there is a reference to a recent paper devoted to collocation networks, which is also available as an open-access copy so anyone can read it.
GraphColl - How to use the tool part 1
A video showing how to use the new GraphColl tool, which has been developed at the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University. This tool allows you to explore a range of collocation statistics, visualize collocations, and see how collocations interconnect.
You can find a handbook with exercises in the downloads section below.
© Lancaster University