Skip to 0 minutes and 13 seconds
GraphColl is a small but very flexible piece of software that you do not need to install anywhere. It is portable software, so it will run from a memory stick or from any location you like. It is also a multi-platform tool, so the same file will run on Windows, Linux, and Mac, as long as Java is installed on the computer, which is the usual setup. However, if this is not the case, you can very easily download Java from the internet. So here I have my GraphColl folder. And I can simply double-click on the GraphColl file, the jar file here.

Skip to 1 minute and 3 seconds
And GraphColl appears on my screen. I can maximise the window to be able to operate GraphColl efficiently. I want to start with the Import tab. GraphColl is organised according to different tabs, like a browser, a web browser. And you can switch very easily between different tabs. That allows you flexibility of analysis on multiple levels. I'll start with the Import tab and start by uploading a corpus. I click on the Browse button and go to my Corpora folder. And in this case, I will choose the LOB corpus, Lancaster-Oslo-Bergen corpus. Select all files. And just click OK. Select the name for my corpus, which is LOB.

Skip to 2 minutes and 8 seconds
I keep the import options as selected by default and I ignore case, which is also a default, and just hit Import. You can see that GraphColl imports this corpus fairly quickly. Currently, the corpora that GraphColl can handle are around 1 or 2 million words, depending on the setup and the memory available on your computer. Once the corpus is loaded, I can go to the Corpora tab and see that all the files and the whole corpus are loaded. GraphColl also provides the token and type counts, both for the whole corpus, where I can see that I have over 1 million tokens, and per individual file. I can also import multiple corpora.
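The token and type counts shown on the Corpora tab can be illustrated with a minimal Python sketch. It assumes a simple word-character tokenizer and a hypothetical sample sentence; GraphColl's own tokenization rules are not described in the video and may differ.

```python
import re

def token_type_counts(text, ignore_case=True):
    """Return (token_count, type_count) for a text.

    Tokens are running words; types are distinct word forms.
    The tokenizer (runs of word characters) is an assumption.
    """
    if ignore_case:
        text = text.lower()
    tokens = re.findall(r"\w+", text)
    return len(tokens), len(set(tokens))

# Hypothetical sample text, not taken from LOB or Brown.
sample = "Time is money. Spend time wisely, and waste no time."
print(token_type_counts(sample))                     # case ignored
print(token_type_counts(sample, ignore_case=False))  # case kept
```

With case ignored, "Time" and "time" count as one type, which is why the ignore-case import option affects the type count but not the token count.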

Skip to 3 minutes and 15 seconds
So in this case, I'll go back and go to the Brown corpus.

Skip to 3 minutes and 23 seconds
Select, again, all files. This is the American version of the LOB corpus, or rather, the LOB corpus is the British version of the Brown corpus. I name it Brown here.

Skip to 3 minutes and 39 seconds
And go Import. Again, GraphColl imports these files fairly swiftly. And finally, the corpus will appear among my corpora on the Corpora tab. Now I have both the Brown and the LOB corpora loaded and ready to be used. I will skip the Stats button for now. The Stats button is just an overview of the statistical measures that you can use with the GraphColl tool. As you can see, GraphColl implements a large number of statistical measures. You can also look into the equations that are used for the statistical measures. All major statistical measures for collocation identification are implemented in GraphColl.

Skip to 4 minutes and 33 seconds
You can see Dice, log Dice, z-score, t-score, minimum sensitivity, the directional Delta P, and some experimental statistics, such as Cohen's d, as well. What you can also do is write your own equation. Type in your own equation. Save it under a different name. And GraphColl will use that particular statistic of your choice to extract the collocation networks. So in order to get to the collocation networks, you need to go to the New Graph tab. And here you can select which corpus you want to be working with. Here, I'll go for the LOB corpus. I also select the span, the collocation window. I keep the default five left and five right.
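Several of the association measures named above can be sketched in Python. These are common textbook formulations, assumed here for illustration; the exact equations GraphColl displays may differ (for instance, by correcting the expected frequency for the window size). All function names and the example frequencies are hypothetical.

```python
import math

def expected(f_node, f_coll, n):
    """Expected co-occurrence frequency under independence (no window correction)."""
    return f_node * f_coll / n

def mi(o, f_node, f_coll, n):
    """Mutual information: log2 of observed over expected frequency."""
    return math.log2(o / expected(f_node, f_coll, n))

def t_score(o, f_node, f_coll, n):
    return (o - expected(f_node, f_coll, n)) / math.sqrt(o)

def z_score(o, f_node, f_coll, n):
    e = expected(f_node, f_coll, n)
    return (o - e) / math.sqrt(e)

def dice(o, f_node, f_coll):
    return 2 * o / (f_node + f_coll)

def log_dice(o, f_node, f_coll):
    return 14 + math.log2(dice(o, f_node, f_coll))

def mi3(o, f_node, f_coll, n):
    """A user-defined variant (cubed observed frequency), standing in for 'your own equation'."""
    return math.log2(o ** 3 / expected(f_node, f_coll, n))

# Hypothetical frequencies: node 1,600, collocate 500, together 40, corpus 1,000,000 tokens.
for name, value in [
    ("MI", mi(40, 1600, 500, 1_000_000)),
    ("t-score", t_score(40, 1600, 500, 1_000_000)),
    ("z-score", z_score(40, 1600, 500, 1_000_000)),
    ("Dice", dice(40, 1600, 500)),
    ("log Dice", log_dice(40, 1600, 500)),
    ("MI3", mi3(40, 1600, 500, 1_000_000)),
]:
    print(f"{name}: {value:.3f}")
```

Because every measure takes the same kind of frequency input, swapping one statistic for another, or for a custom one, changes only which function is called.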

Skip to 5 minutes and 29 seconds
However, I will choose, as my statistic, the MI score here. And here, there are some cutoff values. The statistic cutoff value is an MI score of three and above, and there is also the frequency of co-occurrence of the node and the collocates, where I'm looking at five and above as the default. I call this something like My Graph, and just hit New Graph. A new tab with the name of my graph appears here, along with a search window that allows me to search the same corpus multiple times with the same statistical setup. So let's say I'm interested in the word time and how it is used in the LOB corpus. I type time here and go Search.
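The settings chosen above (a window of five left and five right, an MI score of three and above, and a co-occurrence frequency of five and above) can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not GraphColl's implementation: the MI formula is the plain textbook one without a window-size correction, and the toy corpus is invented.

```python
import math
from collections import Counter

def collocates(tokens, node, left=5, right=5, min_stat=3.0, min_freq=5):
    """Return (word, MI, co-occurrence frequency) for collocates of `node`,
    sorted by MI, keeping only those that pass both cutoffs."""
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Collect every token inside the collocation window around this hit.
            cooc.update(tokens[max(0, i - left):i])
            cooc.update(tokens[i + 1:i + 1 + right])
    results = []
    for word, observed in cooc.items():
        if word == node or observed < min_freq:
            continue
        expected = freq[node] * freq[word] / n
        score = math.log2(observed / expected)
        if score >= min_stat:
            results.append((word, score, observed))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Invented toy corpus: a few sentences about time plus unrelated filler.
toy = ("we spend time well . " * 6 + "the cat sat on the mat . " * 40).split()
for word, score, observed in collocates(toy, "time"):
    print(f"{word}\tMI={score:.2f}\tfreq={observed}")
```

Raising `min_stat` from 3.0 to 5.0 is the code equivalent of tightening the MI cutoff to get a smaller, more manageable set of collocates.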

Skip to 6 minutes and 32 seconds
GraphColl starts searching the corpus for the word time and, in a moment, the collocation network appears. First, all the collocates around the word time will appear.

Skip to 6 minutes and 46 seconds
As you can see, this graph is actually very, very populated and there are many, many collocates satisfying these criteria. When we look at the table view that is connected to the graph, you can see that we have over 300 collocates in this view. And we can see the statistic and the frequencies, both as collocates and in the whole corpus. We can sort by frequency, or we can sort by statistic, which is the default. When I click on any of these words in the table, it will be highlighted in the graph, and vice versa. If I click here, the word will be highlighted in the table.

Skip to 7 minutes and 49 seconds
OK. However, this view gives us too many collocates because we set the threshold levels very low. In order to usefully explore the associations with the word time in English, we need to go back to the New Graph option. Say something like My Graph Two. And in this case, I will be looking again at the MI score. But this time, I want the MI score to be five and above, with a cutoff frequency of the collocation of five and above. And again, I say, New Graph. In this case, a new tab appears. And I can keep my old analysis and start a new one. Again, I will type time here and wait for GraphColl to calculate and create the collocation network.

Skip to 8 minutes and 56 seconds
And as you can see, this time the graph is much more manageable. I have 36 collocates around the word time. As you can see, the closer the collocates are to the node, the word time I'm interested in, the stronger they are according to the MI score. If I'm interested in the exact statistics, I can see the MI scores here down the list again. I can click on any of those collocates in the graph. And I'll see the exact statistic here. Again, here minutes has a lower score, and so on. The great thing about GraphColl is, however, that the exploration of the corpus doesn't stop here.

Skip to 10 minutes and 0 seconds
We can look beyond the first-order collocates and explore the whole network of associations and cross-associations. What you can see here is that one of the collocates of the word time is the verb to spend. So let me just double-click on spend and see what happens. OK. As you can see, GraphColl now creates another ring of collocates. This time we call them second-order collocates of the word time that are associated with the word spend: spend more, spend a year, and spend money. I can go to the word waste here and double-click again. And see what happens now.

Skip to 11 minutes and 2 seconds
We can sort of pull these graphs apart a bit. It will try to keep the same distance, or the same proportion, so that the distance from the node to the collocate expresses the association strength, as expressed by the statistical measure.

Skip to 11 minutes and 30 seconds
Put this in the centre. What you can see here now is that we have the word time connected to both verbs, to spend and to waste. But also, both verbs are connected to another node, which is money. And this goes back to the idea that Lakoff and Johnson, in their early book on conceptual metaphors, highlighted as one of the crucial metaphors, conceptual metaphors in the English language. Time is money. We both spend and waste time. And we use monetary metaphors to talk about time. We can buy time, for instance, which might be one of the other collocates in here.
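The way double-clicking a collocate grows the network, and how cross-associations such as spend and waste sharing the collocate money emerge, can be sketched in Python. This is an illustrative model only: the `expand` and `shared_collocates` helpers are hypothetical, and the small lookup table merely mirrors the words mentioned in the video rather than reproducing actual LOB results.

```python
from collections import defaultdict

def expand(graph, word, find_collocates):
    """Add the collocates of `word` to the network as undirected links."""
    for collocate in find_collocates(word):
        graph[word].add(collocate)
        graph[collocate].add(word)
    return graph

def shared_collocates(graph, a, b):
    """Words linked to both `a` and `b` (cross-associations)."""
    return graph[a] & graph[b]

# Illustrative lookup standing in for a real collocate search.
toy_lookup = {
    "time": ["spend", "waste"],
    "spend": ["time", "money", "more", "year"],
    "waste": ["time", "money"],
}

graph = defaultdict(set)
expand(graph, "time", lambda w: toy_lookup.get(w, []))   # first-order collocates
expand(graph, "spend", lambda w: toy_lookup.get(w, []))  # double-click on spend
expand(graph, "waste", lambda w: toy_lookup.get(w, []))  # double-click on waste

print(sorted(shared_collocates(graph, "spend", "waste")))
```

In a real analysis the lookup function would run the collocate search with the same statistic and cutoffs for every expanded node, so that all links in the network are comparable.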

Skip to 12 minutes and 20 seconds
I can, again, double-click on money to expand the collocation network further, and just see what sort of concepts are connected with money. So apart from time and money, the obvious connection through waste and spend, there will be other concepts and other words that are associated with money. And, through waste and spend, they are also cross-associated with time.

Skip to 12 minutes and 58 seconds
We can save money. We can pay money. Value, amount, pocket, prize, and so on, lot of money, and lot of time. So again, amount of money and amount of time share the collocates. So in this way, we can actually explore the collocation network. One thing that we can also do with GraphColl is to look at the individual contexts in which these collocations appear. So let's say I'm interested in time as my primary concept. I go to KWIC, which will give me the concordance of the word time with the collocates on the left and the collocates on the right.

Skip to 13 minutes and 52 seconds
So I go to time here, and I can see the node in the middle and also the collocates here. I can filter this and search for the collocates of interest, such as waste. And I can see waste time and in which context this appears. I can search right. I can search left. I can search for any of the positions as well. What I can also do is sort the collocates alphabetically. So in this case, I'm sorting on the first position to the left. And I can see that here I have several examples of waste time. I can actually see the chunks of language as used in the corpus. Again, this appears on a new tab.
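The keyword-in-context view, with filtering by a collocate and sorting on the first position to the left, can be sketched in Python as follows. The function names and the sample sentence are hypothetical and only illustrate the idea; they are not GraphColl's own interface.

```python
def kwic(tokens, node, left=5, right=5):
    """Return concordance lines as (left context, node, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            lines.append((tokens[max(0, i - left):i], tok, tokens[i + 1:i + 1 + right]))
    return lines

def filter_by_collocate(lines, collocate):
    """Keep only lines where the collocate occurs somewhere in the window."""
    return [ln for ln in lines if collocate in ln[0] or collocate in ln[2]]

def sort_by_first_left(lines):
    """Sort alphabetically on the word immediately to the left of the node."""
    return sorted(lines, key=lambda ln: ln[0][-1] if ln[0] else "")

# Invented sample text.
sample = "do not waste time on this and spend time on that".split()
for left_ctx, node, right_ctx in sort_by_first_left(kwic(sample, "time")):
    print(f"{' '.join(left_ctx):>30} | {node} | {' '.join(right_ctx)}")
```

Sorting on another position would only change the key function, for example `ln[2][0]` for the first word to the right.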

Skip to 14 minutes and 53 seconds
So I can go back to my original graph and go to money. Say, OK. Show me the keyword in context. And in this case, money appears. Again, I can sort according to any of these contexts, any of these collocates. Let's say I sort by first left: about money, accept money, and money, banks, and so on. If I'm interested in the word spend, GraphColl will produce the relevant concordance lines with that particular collocate within the defined collocation window, which, in this case, is five left and five right.

Skip to 15 minutes and 42 seconds
So these are the main features of the GraphColl tool that allow you to explore corpora in multiple different ways, seeing the cross-associations and building collocation networks, such as the one demonstrated in this graph on the fly. So it's now up to you. If you like this type of research, try GraphColl with your own data, and see what sort of interesting information you can get from your corpus.

GraphColl - How to use the tool part 2

A further look at using the GraphColl tool developed at Lancaster University. This tool allows you to explore a range of collocation statistics, visualize collocation and see how collocations inter-connect.

This video is from the free online course:

Corpus Linguistics: Method, Analysis, Interpretation

Lancaster University
