GraphColl – How to use the tool part 1

Using the GraphColl tool developed at Lancaster University
In this video, I will introduce GraphColl, a new tool for building collocation networks on the fly, developed at the ESRC Centre for Corpus Approaches to Social Science, Lancaster University. GraphColl is a free tool, so you can download it from the internet and use it in your own research for free. I will start by talking about collocations and collocation networks. And then I will give you a quick demo of the tool itself. So let’s start with the idea of collocations, one of the cornerstones of corpus linguistics.
Collocations, as we all know, are combinations of words that co-occur frequently in text corpora: words that appear in each other’s company very often. Usually, when we look at collocations in other tools, we have a table view that tells us which collocates (in this case, the collocates of the word love) are statistically important for the node word. Here we have collocates such as affair, fall, falling, fallen, and so on; love, love letters, and so on. We also have a statistical measure, in this case the MI score (mutual information), which can be used to highlight the most important collocates. This, however, is the view that we usually use for collocates.
GraphColl presents another insight into this collocation relationship by showing collocates in a graphical way. This is an output from GraphColl. Actually, both views are: GraphColl allows you to switch easily between the table view and the graphical view and to compare them. In the graphical view, on the right-hand side, we can see what we call the node, the word that we are interested in (love, in the middle), and different collocates around the word love at different distances. The closer the collocate is to the node, the stronger the relationship between the word love and that particular collocate.
So you can see, for instance, that affair and fall or fell are fairly close to the centre, the node word love. When we think about collocation theoretically, there are several criteria that we use to identify collocations, different aspects of the collocation relationship if you like. Traditionally, people have looked at three criteria: distance, frequency, and exclusivity. By distance, we mean the span to the left and to the right of the node in which we look for collocates, e.g., five words left and five words right. Frequency, of course, is the frequency of the co-occurrence of the words that we are interested in. And exclusivity is another important aspect of the collocation relationship.
This is the fact that the words appear more frequently in each other’s company than with other words. It is usually captured by different statistical measures such as mutual information, MI3, log likelihood, log Dice, and so on. However, there are also other criteria or other aspects of the collocation relationship, such as dispersion: how equally or unequally the collocates are distributed throughout different texts in the corpus. Do they appear only in one text, or are they a feature of multiple texts? In that case, perhaps they will be more important. Then there is directionality; Delta P is a particular statistic that takes directionality into account.
When we look at two words and the direction in which they predict each other’s co-occurrence, a red herring is a very good example of this directionality feature of collocates. When we have the word red, any word can follow. However, when we have the word herring as a node, it predicts very well that the word red will appear before it in most cases, at least in English. Then there is type-token distribution: again, in the span of collocates, or what we call the collocation window, there are different words that can compete for the top places of the collocate list.
And again, it depends on whether there are many different types competing for the position to become collocates of the word or whether there are just a few very frequent types. Finally, and most importantly for our purposes, there is connectivity. I will explore connectivity in the next slide.
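The criteria of distance, frequency, exclusivity and directionality discussed above can be illustrated with a small Python sketch. This is not GraphColl's own code: the function names, the toy corpus and the simplified formulas (for example, MI without any correction for window size) are assumptions made purely for illustration.

```python
import math
from collections import Counter

def span_counts(tokens, node, left=5, right=5):
    """Distance + frequency: count words within the span around each hit of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - left):i])    # left context
            counts.update(tokens[i + 1:i + 1 + right])   # right context
    return counts

def mi(o11, f_node, f_coll, n):
    """Exclusivity: mutual information, log2(observed / expected).
    Simplified: expected = f_node * f_coll / n, no window-size correction."""
    return math.log2(o11 * n / (f_node * f_coll))

def log_dice(o11, f_node, f_coll):
    """Exclusivity: log Dice, 14 + log2(2 * O11 / (f_node + f_coll))."""
    return 14 + math.log2(2 * o11 / (f_node + f_coll))

def delta_p(o11, f_cue, f_outcome, n):
    """Directionality: P(outcome | cue) - P(outcome | not cue)."""
    return o11 / f_cue - (f_outcome - o11) / (n - f_cue)

# Hypothetical toy corpus: "red" is followed by many words,
# but "herring" is always preceded by "red".
tokens = "red car red door red herring red wine red herring blue sky".split()
freq = Counter(tokens)
n = len(tokens)

# Words immediately to the left of "herring" (span: 1 left, 0 right).
o11 = span_counts(tokens, "herring", left=1, right=0)["red"]

mi_score = mi(o11, freq["herring"], freq["red"], n)
ld_score = log_dice(o11, freq["herring"], freq["red"])
dp_red_predicts_herring = delta_p(o11, freq["red"], freq["herring"], n)
dp_herring_predicts_red = delta_p(o11, freq["herring"], freq["red"], n)

print(f"MI={mi_score:.2f} logDice={ld_score:.2f}")
print(f"Delta P(herring|red)={dp_red_predicts_herring:.2f}")
print(f"Delta P(red|herring)={dp_herring_predicts_red:.2f}")
```

On this toy data the second Delta P value is the larger one, which mirrors the red herring point: herring predicts red better than red predicts herring.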
Let’s start with our initial node, the word we are interested in. In the previous slides, it was the word love. We can first search for the word, then search for the words that occur in its vicinity, and identify the strongest collocates around this node. This is what we traditionally do with collocates, and we display the result in tabular form. However, and this is where the idea of collocation networks comes in, we can also take any of those collocates, consider it a new node, and build the collocation network around it. And then we can move further down the chain and again identify collocates, and collocates of the collocates, and so on and so forth.
In this way, we can very efficiently explore corpora and discourses and see different links, cross-associations, and the complex discourse meanings of these words. Let’s move on to GraphColl. Before I give you the quick demo: if you are interested in more details about the idea of collocations and collocation networks, there is a reference to a recent paper devoted to collocation networks, which is also available as an open access copy, so anyone can access and read it.
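The network-building idea described above (find the collocates of the initial node, then treat each collocate as a new node and repeat) can be sketched as a breadth-first expansion. Again, this is only an illustration under stated assumptions, not how GraphColl itself is implemented: the function names, the thresholds, the simplified MI formula and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, deque

def collocates(tokens, node, span=5, min_freq=2, min_mi=1.0):
    """Return {collocate: MI} for words near `node` that pass both thresholds.
    MI is simplified here (no window-size correction)."""
    freq = Counter(tokens)
    n = len(tokens)
    near = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            near.update(tokens[max(0, i - span):i])
            near.update(tokens[i + 1:i + 1 + span])
    result = {}
    for word, o11 in near.items():
        if word == node or o11 < min_freq:
            continue  # skip the node itself and rare co-occurrences
        score = math.log2(o11 * n / (freq[node] * freq[word]))
        if score >= min_mi:
            result[word] = score
    return result

def build_network(tokens, start, depth=2, **thresholds):
    """Expand collocates of collocates, breadth-first, up to `depth` levels."""
    edges = {}                      # (node, collocate) -> MI score
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if level == depth:
            continue                # do not expand nodes beyond the depth limit
        for coll, score in collocates(tokens, node, **thresholds).items():
            edges[(node, coll)] = score
            if coll not in seen:
                seen.add(coll)
                queue.append((coll, level + 1))
    return edges

# Hypothetical toy corpus; a span of 1 keeps the example easy to follow.
tokens = ("love affair love affair affair scandal affair scandal "
          "the a of in to and the a of in to and").split()
edges = build_network(tokens, "love", depth=2, span=1)
for (node, coll), score in sorted(edges.items()):
    print(f"{node} -> {coll}  MI={score:.2f}")
```

Here love links to affair, and affair in turn links on to scandal, a word that is not itself a direct collocate of love. That second-order link is the kind of cross-association the network view is meant to expose.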

A video showing how to use the new GraphColl tool, which has been developed at the ESRC CASS centre at Lancaster University. This tool allows you to explore a range of collocation statistics, visualize collocations and see how collocations interconnect.

Download GraphColl – free download

Watch a video lecture explaining the concept of collocation

Read an article Collocations in context: A new perspective on collocation networks

You can find a handbook with exercises in the downloads section below.

This article is from the free online course Corpus Linguistics: Method, Analysis, Interpretation
