GraphColl – How to use the tool part 2

A further look at using the GraphColl tool developed at Lancaster University
In this part I will give you a practical demo of the GraphColl tool.
GraphColl is part of a larger package called LancsBox (Lancaster University Desktop Toolbox), which is available for free from the internet. You can just download the tool and start using it, as I will show you in a moment. LancsBox works with any operating system: Windows, Linux, or Mac. The same file can be run on any operating system, so it is extremely versatile. When you download LancsBox you get one zip file. Before you start using LancsBox, you need to unzip the file by right-clicking on it and choosing Extract All.
This procedure will differ slightly on different operating systems. Once you’ve extracted LancsBox, you can simply double-click on the LancsBox jar file to get the tool running. The first step when the tool is running is to upload your corpus or corpora. To do this you need to click on the Browse button and navigate to the location where your corpora are stored. LancsBox comes with two corpora, Brown and LOB: one million words of American English and one million words of British English. But you can navigate to any corpus files you like. I’m going to upload both Brown and LOB.
I’ll double-click on Brown, press Control-A on Windows to select all files, click Open, and then select the name of the corpus (this is Brown, for me to remember). I hit Import, and in a short while the corpus is ready to be used in the tool. In a similar way, I go to the LOB corpus, select all files, click Open, again select the name for the corpus, LOB in this case, and click Import.
The panel below displays the data about your individual corpora, in this case Brown and LOB. Each has 15 files and one million words; that’s the token count. The type count is also provided, and the same information is given for each of the individual files inside the corpora. Now we can go to the GraphColl module by clicking on GraphColl, and a new tab opens. What you can see is a simple search interface: a search box and different options for selecting the corpus. Brown is here by default, but I can change to LOB and click Apply, because I want to search for collocations in LOB. Then I need to select my statistical measure. There are multiple statistical measures that can be used.
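To make the difference between the token count and the type count concrete, here is a minimal sketch in Python. It is not how LancsBox counts internally; the tokenisation rule (runs of letters, digits, or apostrophes, lowercased) and the function name `count_tokens_and_types` are assumptions made for illustration only.

```python
import re

def count_tokens_and_types(text):
    """Return (token_count, type_count) for a piece of text."""
    # Assumption: a token is a run of letters, digits, or apostrophes,
    # and case is ignored. Real tools may tokenise differently.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Tokens are all running words; types are the distinct word forms.
    return len(tokens), len(set(tokens))

sample = "The cat sat on the mat. The mat was flat."
token_count, type_count = count_tokens_and_types(sample)
print(token_count, type_count)  # prints: 10 7
```

The sample sentence has ten running words but only seven distinct word forms, because "the" and "mat" are repeated.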
I’m going to use the MI score, just for illustration, and click Apply. Then I go to the threshold and decide on the cutoff values: a cutoff value for the statistic and a cutoff value for the collocation frequency, say 5 and 5, although I can keep the default. The higher the values, the fewer collocates appear in the collocation network. I click Apply, and then I can start searching for words such as ‘time.’ I hit the Search button, and in a short while a collocation network appears with the first order collocates. In addition to the graph, there is a table that displays the collocates in the graph. If I click on any of the collocates, it will be highlighted in the table. And vice versa: if I click on any of the collocates in the table, it will be highlighted in the graph. This allows very easy navigation and connection between the graph and the table.
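As a rough illustration of what the two cutoff values do, the sketch below computes a simple mutual information (MI) score for the words around a node and keeps only those that pass both thresholds. This is a simplified textbook version, MI = log2(observed / expected), with expected = f(node) × f(collocate) / N. It is an assumption that this matches the spirit of the tool, and it is not LancsBox's actual implementation. The function name `mi_collocates` and its parameters are hypothetical.

```python
import math
from collections import Counter

def mi_collocates(tokens, node, span=5, min_mi=5.0, min_freq=5):
    """Return {collocate: (co-occurrence frequency, MI)} for one node word.

    Simplified sketch: no correction for window size, no lemmatisation.
    """
    n = len(tokens)
    freq = Counter(tokens)   # corpus frequency of every word
    cooc = Counter()         # co-occurrences with the node inside the span
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[tokens[j]] += 1

    results = {}
    for word, observed in cooc.items():
        if observed < min_freq:   # cutoff for the collocation frequency
            continue
        expected = freq[node] * freq[word] / n
        mi = math.log2(observed / expected)
        if mi >= min_mi:          # cutoff for the statistic
            results[word] = (observed, round(mi, 2))
    return results
```

Raising either `min_mi` or `min_freq` can only remove entries from the result, which is why higher thresholds give a sparser network.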
What I can also do is expand the collocation network to see beyond the first order collocates. In this case I’m going to focus on the word spend. I have highlighted it both in the table and in the graph, and I can just double-click on it either in the table or in the graph. What happens is that the word will expand and show its own collocates, what we call the second order collocates. I can see that time and spend are connected because they are collocates of each other. But I can also see the second order collocates around spend. I double-click on money and then I have this very interesting collocation network based on written British English.
What I can see here are the connections and interconnections between time and money as two big concepts in our discourse that are connected via shared collocates such as spend and different forms of the verb spend, but also waste, lot (lot of time, lot of money), and so on. So in this way I can search the corpus and explore different connections in language and discourse. One of the very useful features of the GraphColl tool is that you can actually see the context in which words co-occur.
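The idea of expanding a node and finding shared collocates can be sketched with plain sets. The collocate lists in `toy` below are invented for illustration and are not output from LOB. The helper names `build_network` and `shared_collocates` are hypothetical.

```python
def build_network(collocates_of, start, expand):
    """Return the set of undirected edges in a small collocation network.

    collocates_of: dict mapping a word to the set of its collocates.
    start: the first node searched for.
    expand: collocates that were then expanded (double-clicked).
    """
    edges = set()
    for node in [start, *expand]:
        for coll in collocates_of.get(node, set()):
            # frozenset makes time-spend and spend-time the same edge
            edges.add(frozenset((node, coll)))
    return edges

def shared_collocates(collocates_of, a, b):
    """Return the words that are collocates of both a and b."""
    return collocates_of.get(a, set()) & collocates_of.get(b, set())

# Invented toy data, loosely echoing the words mentioned in the demo.
toy = {
    "time": {"spend", "waste", "lot", "long"},
    "spend": {"time", "money"},
    "money": {"spend", "waste", "lot", "save"},
}

network = build_network(toy, "time", ["spend", "money"])
print(len(network))                                   # prints: 8
print(sorted(shared_collocates(toy, "time", "money")))
# prints: ['lot', 'spend', 'waste']
```

The shared collocates are simply the intersection of the two collocate sets, which is what links the two concepts in the graph.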
If you are asking the question of why a particular word is a collocate of another word, you can see this very easily by right-clicking on the collocate to display the combinations of the word and its node. If I, for instance, right-click on save, I can see all the contexts in which save and money co-occur in the specified window: ‘save money by doing something’, ‘save money if we don’t pay employees’ liability’, and so on. I close this window.
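What the context display does can be approximated as follows: for every occurrence of the node, check whether the collocate falls within the window, and if so show that stretch of text. This is only a sketch of the general technique. The function name `cooccurrence_contexts`, the whitespace tokenisation, and the sample sentence are assumptions, not part of the tool.

```python
def cooccurrence_contexts(tokens, node, collocate, span=5):
    """Return the windows around `node` that also contain `collocate`."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        # Words to the left and right of the node, excluding the node itself
        neighbours = tokens[lo:i] + tokens[i + 1:hi]
        if collocate in neighbours:
            contexts.append(" ".join(tokens[lo:hi]))
    return contexts

# Invented sample sentence, split on whitespace for simplicity.
sample_tokens = (
    "you can save money by doing this and they save time but not money here"
).split()

print(cooccurrence_contexts(sample_tokens, "money", "save", span=2))
# prints: ['can save money by doing']
```

The second occurrence of money is not returned, because save is not within two words of it.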
If I click on any of the shared collocates like spend, because spend is shared by money and time, what I get are two pop-up windows that will show me the co-occurrences of time and spend in the window, and of money and spend in the window. We can see that there are nine co-occurrences of time and spend, and 16 co-occurrences of money and spend. If I want to know more, I can just click on these three arrows pointing up, close this window, and see the full context in the large window. In the top panel I can see the co-occurrences of money and spend as my selected collocate.
And in the bottom panel I can see all the other competing collocates in the same window.
GraphColl also allows us to explore language in a split window. Just click on this bar where the arrow is pointing up, and the window will split into two. In this case, I can search for another word. I just click inside the panel to highlight it, and then I can search for the word love, for instance, in the Brown corpus here. I can change the settings if I like. At the moment I’m dealing with frequency, so I change to the MI score, five and above, just to have fewer collocates in the window, and search again for love.
And again a collocation network appears here, which I can explore in the same way as I explored the previous one. If you are interested in reading more about this tool and the technique of collocation networks, you can find more information in an article that is freely available from the International Journal of Corpus Linguistics.

This tool allows you to explore a range of collocation statistics, visualize collocation and see how collocations inter-connect.

This article is from the free online

Corpus Linguistics: Method, Analysis, Interpretation

Created by
FutureLearn - Learning For Life
