Vaclav Brezina

Professor in Corpus linguistics at Lancaster University, lead developer of #LancsBox.

Location Lancaster University


  • Hi Chloe - yes indeed - you can take any of the individual modules for credit. More info:

  • Hi both, this issue has now been resolved.

  • Hi Jennifer - currently, the Using corpora in Language teaching module is offered in every cycle as are all of the other modules. We can confirm closer to your intended application date what you can expect if you apply.

  • A very warm welcome to everyone! We are really excited that you have joined the Corpus MOOC this year!

  • Hi Andres - the following link will provide more information about the individual 3-month courses that can be taken for credit

  • @JorgeDisseldorp Hi Jorge - a great observation! It is always good to critically evaluate the methods in the field. If you are interested in the equation used for computing log ratio in #LancsBox or any other equation implemented in the tool, you can simply view/edit this when you click on the statistics button at the very bottom of the window.


  • Hi Kristin - great points/questions! From my perspective, human production in natural contexts as sampled in traditional corpora will always be a benchmark for any AI production. However, because AI is being increasingly incorporated in different media through which we communicate (think of various auto complete options in email clients, text...

  • I'm really glad you like this functionality, Susana - we are very proud of it because it makes corpus research easy ;)

  • I'm glad you found this useful.

  • Yes, you can do that. In fact, you can load as many corpora and wordlists as you like and keep them for later as well.

  • That's great - do let us know how #LancsBox is handling Swahili - I am very curious ;)

  • That's wonderful - do let us know if we can help with any of the details.

  • Glad to hear that!

  • That's great, Carmen - let us know your thoughts about the software.

  • Wonderful - I hope you will find the tool useful for your research.

  • Yes, these are two different terms used for the same thing.

  • Hi Robert, it seems you have installed #LancsBox into a folder where #LancsBox doesn't have full read and write privileges (e.g. Program Files). You might like to re-install #LancsBox into a different folder (Users\yourName or Desktop) or grant the current folder full privileges.

  • Yes, we will let FutureLearn know about this issue, which seems to be related to internet connectivity in your region. In the meantime, you should be able to download the video from the link above and play it locally on your computer. I hope this helps.

  • Hi Jennifer - thanks for your question - if you simply wish to search your data there is no need for pre-processing because #LancsBox will be able to deal with this automatically. If you need to treat any parts of the transcript differently then you need to decide if these need to be separated in some way...

  • Hi Yan Li - #LancsBox offers a split-screen view that allows you to compare and contrast concordances for parallel corpora. However, in the current version #LancsBox does not include an alignment feature that would allow extracting translations automatically. I hope this helps.

  • A very warm welcome everyone - I very much hope that you will enjoy this course and learn a variety of methods, which you can apply in your own research contexts! We have a team of mentors who are ready to guide you through this process - so let us know how you are getting on!

  • Hi Evelyne, the teacher resources are password protected because they include correct answers, which should not be available to students prior to doing the tasks for themselves. So the reason is entirely pedagogical. If you are an educator, simply email us to obtain the password.

  • Many thanks for your kind words. Absolutely, you can upload your own corpus and Wizard will do the rest of the analysis for you ;)

  • Hi all - some clarification of this point: The data the lecture is based on includes the original British National Corpus 1994. This dataset, although focusing on the variety of English spoken in the UK, includes Irish speakers from both sides of the border (i.e. Northern Ireland and the Republic of Ireland). It just shows that geopolitical boundaries...

  • Thanks for your kind words, Aleksandra and welcome!

  • Welcome Adriana - the social dimension of language use is very important, as you say. We also have dedicated sessions in the course for using corpora in the classroom context.

  • Welcome to this course, Martha! I hope you'll find it helpful for your research.

  • The broken link has been fixed.

  • Yes, the corpus is intended to be freely available for research purposes and comparable with the 1994 version. Currently, a balanced subset (BNC2014 Baby+) is accessible via #LancsBox

  • @MaryEllenKerans Hi Mary - very interesting questions:
    1) BNC2014 Baby+ (5M) is a mirror corpus to the original BNC Baby (4M) with the addition of 1M words of e-language. All major written and spoken genres/registers are represented (newspapers, fiction, academic writing, informal speech and e-language) - more info:...

  • These are already available thanks to Dr. Dana Gablasova

  • This issue has now been fixed and a fully working v. 5.1.2 is available for Mac.

  • @LawrenceLam Hi Laurence, there seems to be an issue with the tagger file in version 5.1.2 on Mac, which we are trying to sort out asap. V. 5.1.1, which is available from the website, should be fine.

  • You need to change the Unit to lemma, search again and then switch the view option in the top right corner.

  • Absolutely - You can load files in any format (txt, docx, pdf etc)

  • :)

  • Hi Sean - Which operating system are you using? Please make sure that you are installing #LancsBox in a location where you have read and write privileges such as the users folder on Windows.

  • You are welcome, Elisabeth. I hope these will be useful in your research.

  • A warm welcome to all who are joining us at this stage - with this course it is never too late to join. You can also invite your friends who might be interested.

    As you'll see in the discussions, on the corpus MOOC it is really true that the more the merrier!

  • @AmirHosseinMojiriForoushani Thank you very much and welcome to the course!

  • Thanks for your kind words, Gail. I'm glad you found the lecture useful.

  • Hi Antonio - that's great. Here's a link to Lancaster Stats Tools online, where you can explore the topic further: