This content is taken from Lancaster University's online course, Corpus Linguistics: Method, Analysis, Interpretation. Join the course to learn more.

Skip to 0 minutes and 1 second In this part I will give you a practical demo of the GraphColl tool.

Skip to 0 minutes and 8 seconds GraphColl is part of a larger package called LancsBox (Lancaster University Desktop Toolbox) that is available for free from the internet. You can just download the tool and start using it, as I will show you in a moment. LancsBox works with any operating system: Windows, Linux, or Mac. The same file can be run on any operating system, so it is extremely versatile. When you download LancsBox you get one zip file. Before you start using LancsBox, you need to unzip the file by right-clicking on it and selecting Extract All.

Skip to 0 minutes and 59 seconds This procedure will differ slightly on different operating systems. Once you’ve extracted LancsBox, you can simply double-click on the LancsBox jar file and get the tool running. The first step when the tool is running is to upload your corpus or corpora. To do this you need to click on the Browse button and navigate to the location where your corpora are stored. LancsBox comes with two corpora, Brown and LOB: one million words of American and one million words of British English. But you can navigate to any corpus files you like. I’m going to upload both Brown and LOB.

Skip to 1 minute and 46 seconds I’ll double-click on Brown, press Control-A on Windows to select all files, click Open, and then choose the name of the corpus. This is Brown, for me to remember. I hit Import, and in a short while the corpus is ready to be used in the tool. In a similar way, I go to the LOB corpus, select all files, click Open, again choose the name for the corpus, LOB in this case, and click Import.

Skip to 2 minutes and 27 seconds The panel below displays the data about your individual corpora, in this case Brown and LOB. Each has 15 files and one million words; that’s the token count. The type count is also provided, and the same information is given for each of the individual files inside the corpora. Now we can go to the GraphColl module by clicking on GraphColl, and a new tab opens. What you can see is a simple search interface: a search box and different options for selecting the corpus. Brown is here by default, but I can change to LOB and click Apply, because I want to search for the collocations in LOB. Then I need to select my statistical measure. There are multiple statistical measures that can be used.

Skip to 3 minutes and 23 seconds I’m going to, just for illustration, use the MI score and say Apply. Then I go to the threshold and decide on the cutoff values: a cutoff value for the statistic and a cutoff value for the collocation,

Skip to 3 minutes and 39 seconds say 5 and 5, although I can keep the default. The higher the values, the fewer collocates appear in the collocation network. I say Apply, and then I can start searching for words such as ‘time.’ I hit the Search button, and in a short while a collocation network appears with the first-order collocates. In addition to the graph, there is a table that displays the collocates in the graph. If I click on any of the collocates in the graph, it will be highlighted in the table. And vice versa: if I click on any of the collocates in the table, it will be highlighted in the graph. This allows very easy navigation and connection between the graph and the table.
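To make the idea of the two cutoff values concrete, here is a minimal Python sketch of MI-based collocate filtering. It is not LancsBox's own code: the function name `mi_collocates`, the symmetric window (`span`), and the window-size correction used for the expected frequency are assumptions made for illustration. MI is computed as log2(observed / expected).

```python
import math
from collections import Counter

def mi_collocates(tokens, node, span=5, min_mi=5.0, min_freq=5):
    """Return {collocate: (co-occurrence frequency, MI)} for one node word.

    Hypothetical helper, not the GraphColl implementation.
    """
    n = len(tokens)
    freq = Counter(tokens)   # frequency of every word in the corpus
    cooc = Counter()         # co-occurrences with the node inside the window
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[tokens[j]] += 1

    results = {}
    for word, observed in cooc.items():
        if observed < min_freq:          # cutoff on collocation frequency
            continue
        # Assumed expected frequency, corrected for a window of 2 * span words
        expected = freq[node] * freq[word] * 2 * span / n
        mi = math.log2(observed / expected)
        if mi >= min_mi:                 # cutoff on the statistic
            results[word] = (observed, mi)
    return results
```

Calling the function with higher `min_mi` or `min_freq` values returns fewer collocates, which mirrors the effect of raising the thresholds in the tool.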

Skip to 4 minutes and 41 seconds What I can also do is expand the collocation network to see beyond the first-order collocates. In this case I’m going to focus on the word spend. I have highlighted it both in the table and in the graph, and I can just double-click on it either in the table or in the graph. What happens is that the word will expand and show its own collocates, what we call the second-order collocates. I can see that time and spend are connected because they are collocates of each other. But I can also see the second-order collocates around spend. I double-click on money, and then I have this very interesting collocation network based on written British English.

Skip to 5 minutes and 39 seconds What I can see here are the connections and interconnections between time and money as two big concepts in our discourse that are connected via shared collocates such as spend and different versions of the verb spend. But also waste, lot, lot of time, lot of money, and so on. So in this way I can search the corpus and explore different connections in language and discourse. One of the very useful features of the GraphColl tool is that you can actually see the context in which words co-occur.
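The notion of shared collocates can be sketched in a few lines of Python. The collocate sets below are illustrative values loosely based on the words mentioned in the demo, not actual output from the LOB corpus, and the function name `shared_collocates` is hypothetical.

```python
def shared_collocates(network, node_a, node_b):
    """Return the collocates that two nodes have in common, sorted."""
    return sorted(network[node_a] & network[node_b])

# Illustrative data only: each node maps to a set of its collocates.
network = {
    "time": {"spend", "waste", "lot", "long"},
    "money": {"spend", "waste", "lot", "save"},
}

# Collocates that link the two nodes in the network
links = shared_collocates(network, "time", "money")
print(links)
```

Here the set intersection plays the role of the edges that join the two nodes in the graph; collocates outside the intersection belong to only one node.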

Skip to 6 minutes and 26 seconds If you are asking the question, why is a particular word a collocate of another word, you can see this very easily by right-clicking on the collocate to display the combinations of the word and its node. If I, for instance, right-click on save, I can see all the contexts in which save and money co-occur in the specified window. Save money by doing something and so on. Save money if we don’t pay employees’ liability and all these contexts. I close this window.
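As a rough sketch of what such a context display involves, the following Python function collects the text around every place where a collocate falls within the window of a node. The function name `cooccurrence_contexts` and the default window of five words on each side are assumptions for illustration, not the tool's actual behaviour.

```python
def cooccurrence_contexts(tokens, node, collocate, span=5):
    """Return the window text for each node occurrence that contains the collocate."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        # Words in the window, excluding the node position itself
        window = tokens[lo:i] + tokens[i + 1:hi]
        if collocate in window:
            contexts.append(" ".join(tokens[lo:hi]))
    return contexts

# Toy example, not corpus data
sample = "we can save money by doing something".split()
print(cooccurrence_contexts(sample, "money", "save"))
```

The number of contexts returned is counted per node occurrence, so it corresponds to how many times the node has the collocate somewhere in its window.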

Skip to 7 minutes and 10 seconds If I click on any of the shared collocates like spend, because spend is shared by money and time, what I get are two pop-up windows that will show me the co-occurrences of time and spend in the window, and of money and spend in the window. We can see that there are nine co-occurrences of time and spend, and 16 co-occurrences of money and spend. If I want to know more, I can just click on these three arrows pointing up, close this window, and I can see the full context in the large window. In the top panel I can see the co-occurrences of money and spend as my selected collocate.

Skip to 8 minutes and 3 seconds And in the bottom panel I can see all the other competing collocates in the same window.

Skip to 8 minutes and 11 seconds GraphColl also allows us to explore language in a split window. Just click on this bar where the arrow is pointing up, and the window will split into two. In this case, I can search for another word. I just click inside the panel to highlight it, and then I can search for the word love, for instance, in the Brown corpus here. I can change the settings if I like. Yes, I’m dealing with frequency. Now I change to MI score, five and above, just to have fewer collocates in the window, and search again for love.

Skip to 9 minutes and 3 seconds And again a collocation network appears here, which I can explore in the same way as I explored the previous one. If you are interested in reading more about this tool and the technique of collocation networks, you can find more information in an article that is freely available from the International Journal of Corpus Linguistics.

GraphColl - How to use the tool part 2

A further look at using the GraphColl tool developed at Lancaster University. This tool allows you to explore a range of collocation statistics, visualize collocations, and see how collocations interconnect.
