Vaclav Brezina

Professor in Corpus linguistics at Lancaster University, lead developer of #LancsBox.

Location Lancaster University

Activity

  • Hi Chloe - yes indeed - you can take any of the individual modules for credit. More info: https://www.lancaster.ac.uk/linguistics/masters-level/short-courses-for-credit/

  • Hi both, this issue has now been resolved.

  • Hi Jennifer - currently, the Using corpora in Language teaching module is offered in every cycle as are all of the other modules. We can confirm closer to your intended application date what you can expect if you apply.

  • A very warm welcome to everyone! We are really excited that you have joined the Corpus MOOC this year!

  • Hi Andres - the following link will provide more information about the individual 3-month courses that can be taken for credit https://www.lancaster.ac.uk/linguistics/masters-level/short-courses-for-credit/

  • @JorgeDisseldorp Hi Jorge - a great observation! It is always good to critically evaluate the methods in the field. If you are interested in the equation used for computing log ratio in #LancsBox or any other equation implemented in the tool, you can simply view/edit this when you click on the statistics button at the very bottom of the window.

    Regarding...

  • Hi Kristin - great points/questions! From my perspective, human production in natural contexts as sampled in traditional corpora will always be a benchmark for any AI production. However, because AI is increasingly being incorporated into the different media through which we communicate (think of various autocomplete options in email clients, text...

  • I'm really glad you like this functionality, Susana - we are very proud of it because it makes corpus research easy ;)

  • I'm glad you found this useful.

  • Yes, you can do that. In fact, you can load as many corpora and wordlists as you like and keep them for later as well.

  • That's great - do let us know how #LancsBox is handling Swahili - I am very curious ;)

  • That's wonderful - do let us know if we can help with any of the details.

  • Glad to hear that!

  • That's great, Carmen - let us know your thoughts about the software.

  • Wonderful - I hope you will find the tool useful for your research.

  • Yes, these are two different terms used for the same thing.

  • Hi Robert, it seems you have installed #LancsBox into a folder where #LancsBox doesn't have full read and write privileges (e.g. Program Files). You might like to re-install #LancsBox into a different folder (Users\yourName or Desktop) or grant the current folder full privileges.

  • Yes, we will let FutureLearn know about this issue, which seems to be related to the internet connectivity in your region. In the meantime, you should be able to download the video from the link above and play it locally on your computer. I hope this helps.

  • Hi Jennifer - thanks for your question - if you simply wish to search your data there is no need for pre-processing because #LancsBox will be able to deal with this automatically. If you need to treat any parts of the transcript differently then you need to decide if these need to be separated in some way...

  • Hi Yan Li - #LancsBox offers a split-screen view that allows you to compare and contrast concordances for parallel corpora. However, in the current version #LancsBox does not include an alignment feature that would allow extracting translations automatically. I hope this helps.

  • A very warm welcome everyone - I very much hope that you will enjoy this course and learn a variety of methods, which you can apply in your own research contexts! We have a team of mentors who are ready to guide you through this process - so let us know how you are getting on!

  • Hi Evelyne, the teacher resources are password protected because they include correct answers, which should not be available to students prior to doing the tasks for themselves. So the reason is entirely pedagogical. If you are an educator, simply email us to obtain the password.

  • Many thanks for your kind words. Absolutely, you can upload your own corpus and Wizard will do the rest of the analysis for you ;)

  • Hi all - some clarification of this point: The data the lecture is based on includes the original British National Corpus 1994. This dataset, although focusing on the variety of English spoken in the UK, includes Irish speakers from both sides of the border (i.e. Northern Ireland and the Republic of Ireland). It just shows that geopolitical boundaries...

  • Thanks for your kind words, Aleksandra and welcome!

  • Welcome Adriana - the social dimension of language use is very important, as you say. We also have dedicated sessions in the course for using corpora in the classroom context.

  • Welcome to this course, Martha! I hope you'll find it helpful for your research.

  • The broken link has been fixed.

  • Yes, the corpus is intended to be freely available for research purposes and comparable with the 1994 version. Currently, a balanced subset (BNC2014 Baby+) is accessible via #LancsBox

  • @MaryEllenKerans Hi Mary - very interesting questions:
    1) BNC2014 Baby+ (5M) is a mirror corpus to the original BNC Baby (4M) with the addition of 1M words of e-language. All major written and spoken genres/registers are represented (newspapers, fiction, academic writing, informal speech and e-language) - more info:...

  • These are already available thanks to Dr. Dana Gablasova http://wp.lancs.ac.uk/corpusforschools/esl-teaching-materials/

  • This issue has now been fixed and a fully working v. 5.1.2 is available for Mac.

  • @LawrenceLam Hi Laurence, there seems to be an issue with the tagger file in version 5.1.2 on Mac, which we are trying to sort out asap. V. 5.1.1, which is available from the website, should be fine.

  • You need to change the Unit to lemma, search again and then switch the view option in the top right corner.

  • Absolutely - You can load files in any format (txt, docx, pdf etc)

  • :)

  • Hi Sean - Which operating system are you using? Please make sure that you are installing #LancsBox in a location where you have read and write privileges such as the users folder on Windows.

  • You are welcome, Elisabeth. I hope these will be useful in your research.

  • A warm welcome to all who are joining us at this stage - with this course it is never too late to join. You can also invite your friends who might be interested.

    As you'll see in the discussions, on the corpus MOOC it is really true that the more the merrier!

  • @AmirHosseinMojiriForoushani Thank you very much and welcome to the course!

  • Thanks for your kind words, Gail. I'm glad you found the lecture useful.

  • Hi Antonio - that's great. Here's a link to Lancaster Stats Tools online, where you can explore the topic further: http://corpora.lancs.ac.uk/stats/index.php

  • New version 5.1.2 (just released) fixes the issue with CQL e.g. [word="visuali[sz]e"], which now works
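
A minimal sketch of what the bracketed character class in this query does. It uses Python's `re` module rather than #LancsBox itself, assumes the attribute value has to match the whole word, and runs over a made-up word list:

```python
import re

# Regular expression taken from the CQL example: [sz] matches either "s" or "z".
pattern = re.compile(r"visuali[sz]e")

# Hypothetical word list, for illustration only.
words = ["visualise", "visualize", "visualised", "visual"]

# fullmatch requires the whole word to match, not just part of it.
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

Both spelling variants are caught by a single query, while longer forms such as "visualised" would need an extended pattern.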

  • Hi Saman, as for all university programmes in the UK, there is a language proficiency requirement for this programme. This is to ensure that the students can benefit from the modules and are able to write the dissertation successfully. There is still plenty of time to take one of the tests (IELTS academic, Trinity ISE etc.).

  • Hi Abbas - yes we do support right-to-left languages - please see page 5 of FAQ for more details http://corpora.lancs.ac.uk/lancsbox/docs/pdf/FAQ.pdf

    Also, we offer the users full flexibility to localise #LancsBox for their own language http://corpora.lancs.ac.uk/lancsbox/localise.php

    I hope this helps.

  • This session on 31 October might be of interest to anyone considering the programmes: https://www.lancaster.ac.uk/events/corpus-linguistics-at-lancaster-a-new-ma-programme-taster-session

  • In this situation, the use of the chi-squared test is not entirely appropriate (due to a violation of one of the basic assumptions of the test). I know that the chi-squared test has been used for collocations but I would recommend trying a different association measure such as log Dice.
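
For readers who want to see how log Dice is computed, here is a minimal Python sketch based on the commonly published definition, logDice = 14 + log2(2 * f_xy / (f_x + f_y)). The function name and the counts are hypothetical, and this is not necessarily how #LancsBox implements the measure internally:

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Return 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of node and collocate
    f_x:  frequency of the node
    f_y:  frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: node 1,000 times, collocate 500 times, together 100 times.
score = log_dice(f_xy=100, f_x=1000, f_y=500)
print(round(score, 2))
```

The measure depends only on these three frequencies, so it does not rely on expected frequencies in the way the chi-squared test does; its theoretical maximum is 14.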

  • Hi Fiona, this depends on whether your security settings allow you to install a new app. You don't need to downgrade them or switch the firewall off. An alternative would be to install #LancsBox on a virtual machine. I hope this helps.

  • A great point, Alison! Indeed, many corpora consist of text samples (parts of texts) rather than whole texts. These usually tend to be balanced for the beginnings, middles and ends of texts and, as you say, this has both theoretical and practical implications.

  • Welcome, Ana - Indeed, we can learn a lot from each other.

  • Many thanks for your kind words, Andrew: welcome to the course!

  • Thanks for joining the course, Adriana - I hope you will enjoy it!

  • Many thanks for your kind words, Halyna! Indeed, being exposed to a variety of practical examples of different data sets and the statistical techniques to analyse them is the best way to gain experience and confidence in this area.

  • Hi Halyna - You can use the application form for PgCert and indicate that you want to take only a specific module. pgadmissions@lancaster.ac.uk will be happy to assist you in this process.

  • Hi Tassos - you can indeed take the individual modules separately for credit.

  • @MonikaSau Hi Monika, I think the problem is connected with the incorrect tagger file being activated. To fix this, go to the LancsBox folder resources/tagger/bin, delete the tree-tagger file and then remove the .lin suffix from tree-tagger.lin

  • :)

  • Hi Beatriz - Yes you can - you need to define these using specific words that define thesis statements.

  • Hi Andrew - this is a great example of applying CL in the classroom. The challenge is always to come up with tasks that capture students' imagination and you have come up with a very creative way to show how adjectives are ordered. Well done!

  • Thanks, Brian! We are here to help.

  • Hi Brian, you can access past searches by pressing down arrow on your keyboard. I hope this helps.

  • Hi Steve, you need to use the correct character (pipe) for the search to work, i.e. /research|study/
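
A small sketch of how the pipe works as an "or" operator, using Python's `re` module. It assumes the surrounding slashes only mark the expression as a regular expression in #LancsBox, so they are left out of the pattern here; the token list is invented:

```python
import re

# The pipe separates alternatives: match "research" or "study".
pattern = re.compile(r"research|study")

# Hypothetical tokens, for illustration only.
tokens = ["This", "study", "builds", "on", "earlier", "research", "studies"]

# Whole-token matching: "studies" is not matched by either alternative.
hits = [t for t in tokens if pattern.fullmatch(t)]
print(hits)
```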

  • Hi Mary, you can compare collocation graphs in different corpora by splitting the window, but that will show a graph based on corpus 1 in the top panel and a graph based on corpus 2 in the bottom panel. If you want to amplify the evidence and base a single graph on multiple corpora, you simply need to load them as one combined corpus. I hope this helps.

  • A great reply - a few more details.
    1. JJ.* also includes comparatives (e.g. 'better') and superlatives (e.g. 'best')
    2. [word="visuali[sz]e"] is currently broken because the #LancsBox autocorrect function kicks in. We'll aim to fix this in the next release.
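
Point 1 can be illustrated with a short Python sketch. The tag list is a hypothetical sample in the Penn-style convention (JJ, JJR, JJS), and Python's `re` stands in for the search engine:

```python
import re

# "JJ" followed by any number of further characters.
pattern = re.compile(r"JJ.*")

# Hypothetical sample of part-of-speech tags.
tags = ["JJ", "JJR", "JJS", "NN", "RB"]

adjective_tags = [t for t in tags if pattern.fullmatch(t)]
print(adjective_tags)
```

To restrict a search to base-form adjectives only, the pattern would be the plain tag JJ without the trailing `.*`.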

  • Also, the installation instructions (pdf above) show in detail how to adjust the security settings on Mac.

  • Welcome to the course, Adi!

  • A very warm welcome, Elisabeth!

  • Hi Alberta - welcome and let us know in the discussions if #LancsBox works for the purposes of your research.

  • Hi Dragica - welcome to the course. I hope you'll find it useful for your research. Do let us know in the discussions ;)

  • In the new version 5.0 and above, all corpora are displayed in the same window. Restricted-access corpora have a small icon of a padlock next to them. For these corpora (e.g. BNC2014-Baby) the text feature is not available due to copyright.

  • @RobinGill It would be best to check residential requirements with pgadmissions@lancaster.ac.uk

  • Great example of the application of 95% CIs - looking forward to reading your study.

  • Hi Robin - the programme has been designed as a distance form of study to give students the maximum amount of flexibility.

    The fees for 2021/22 are:
    UK: Annual Fee £4,690
    Overseas: Annual Fee £9,680

  • [pos="V.*"] [word=".*ing" & pos="V.*"] should work if you have expressions such as 'is going' in mind.
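
To make the logic of this two-token query explicit, here is a rough Python sketch. It is not how #LancsBox evaluates CQL; the tagged sentence, the tag names and the helper functions are all hypothetical:

```python
import re

# Hypothetical tagged sentence as (word, pos) pairs.
tagged = [
    ("She", "PP"), ("is", "VBZ"), ("going", "VVG"),
    ("to", "TO"), ("sing", "VV"), ("something", "NN"),
]


def is_verb(pos: str) -> bool:
    """First token condition: pos="V.*"."""
    return re.fullmatch(r"V.*", pos) is not None


def is_ing_verb(word: str, pos: str) -> bool:
    """Second token condition: word=".*ing" & pos="V.*" (both must hold)."""
    return re.fullmatch(r".*ing", word) is not None and is_verb(pos)


# Slide over adjacent token pairs and keep those meeting both conditions.
hits = [
    (w1, w2)
    for (w1, p1), (w2, p2) in zip(tagged, tagged[1:])
    if is_verb(p1) and is_ing_verb(w2, p2)
]
print(hits)
```

The pair "sing something" is rejected because "something" ends in -ing but is not tagged as a verb, which is exactly what the `&` in the second token guards against.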

  • @TestT Great answer!

  • Hi Tatyana, I don't think there is a direct compatibility between #LancsBox and CQPweb so downloading the subcorpus from CQPweb might be a bit tricky.

  • Hi Fernando - this sounds like a tagger issue. Please go to the LancsBox folder and under \resources\tagger\bin identify the correct tagger version for linux (tree-tagger.lin), delete all the other files in the bin folder and rename the tree-tagger.lin file to simply tree-tagger

    I hope this helps.
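
The manual steps above could be scripted roughly as follows. This is only a sketch: the function name is made up, the install path in the usage comment is a placeholder, and it is worth backing up the bin folder before deleting anything:

```python
from pathlib import Path


def keep_linux_tagger(bin_dir: Path) -> Path:
    """Delete every file in bin_dir except tree-tagger.lin, then rename it to tree-tagger."""
    linux_tagger = bin_dir / "tree-tagger.lin"
    if not linux_tagger.is_file():
        raise FileNotFoundError(f"{linux_tagger} not found")

    # Remove all other files (including any existing tree-tagger file).
    for item in list(bin_dir.iterdir()):
        if item.is_file() and item != linux_tagger:
            item.unlink()

    # Drop the .lin suffix so the tagger is picked up under the plain name.
    target = bin_dir / "tree-tagger"
    linux_tagger.rename(target)
    return target


# Usage (placeholder path; adjust to wherever LancsBox is installed):
# keep_linux_tagger(Path("LancsBox") / "resources" / "tagger" / "bin")
```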

  • The full stop is preceded by a backslash to indicate that it is meant literally as a full stop and not as a metacharacter meaning any character - please see my answer below. But please don't worry about these details too much. #LancsBox is designed in such a way as not to expect its users to know these details - hence the smart searches and other...

  • Hi Shona - many thanks for your questions. First of all, I would say you don't need to worry about these details too much, unless you want to. These are features designed for very advanced users and we don't expect the MOOC participants to master them ;) But since you've asked:

    . (dot) in regular expressions means any character
    .* (dot asterisk) means any...
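
A quick way to try these metacharacters out is a few lines of Python. The sketch below uses Python's `re` module, whose behaviour for these basic symbols is standard; the example strings are invented:

```python
import re

# "." stands for exactly one character (any character except a newline).
one_char = [w for w in ["cat", "cut", "cart"] if re.fullmatch(r"c.t", w)]

# ".*" stands for zero or more characters.
any_run = [w for w in ["ct", "cat", "consonant", "cats"] if re.fullmatch(r"c.*t", w)]

# "\." is a literal full stop; an unescaped "." would match any character.
literal = [s for s in ["e.g.", "eXgY"] if re.fullmatch(r"e\.g\.", s)]
unescaped = [s for s in ["e.g.", "eXgY"] if re.fullmatch(r"e.g.", s)]

print(one_char, any_run, literal, unescaped)
```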

  • @NataliaBurbano You need to install the new version, which says #LancsBox 5.1.1 on the top bar.

  • Hi Nino - Could you please fill in the bug reporting form and add a screenshot so that we can identify the problem? https://docs.google.com/forms/d/e/1FAIpQLSdTFmihomWpmyPmAPPyXlYn-kBnQAPpkWXu8x2hcM29ISeT5g/viewform

  • It seems that you installed #LancsBox to a location where it doesn't have the write permission to the folder (e.g. Program Files on Win). If this is the problem, simply install #LancsBox with the default settings to your user folder or choose desktop as a location, if you prefer.

  • Hi Helen, if you have problems downloading the data either your internet connection is very slow or you have installed #LancsBox to a location (e.g. Program Files on Win) where it doesn't have permission to write in the folder.

  • @NastassjaA Hi Nastasja, the MA/PGCert and PhD are two separate application processes. 1+3 refers to some competitive scholarships that offer MA and PhD as a package, but there are very few of those. Please note that the Distance MA programme in Corpus linguistics is a two-year programme. Since you already have your MA qualification, you can consider the...

  • @MaryEllenKerans Hi Mary - the latter (it is really 'smart' ;). The smart search for the PASSIVE is defined as the verb to BE (in any form) followed by up to three optional adverbial elements followed by the past participle, in short /VB. (R.* ){0,3}V.N/. This search uses the part-of-speech annotation that #LancsBox provides automatically for...
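
The idea behind this pattern can be sketched in Python by running a regular expression over a sequence of part-of-speech tags. This is an adaptation, not the #LancsBox implementation: `\S` replaces `.` so that each piece stays within a single tag once the tags are joined by spaces, and the TreeTagger-style tags in the examples are assumptions:

```python
import re

# Token-wise adaptation of /VB. (R.* ){0,3}V.N/:
# a form of BE, up to three adverbial tags, then a past participle.
PASSIVE = re.compile(r"(?<!\S)VB\S (?:R\S* ){0,3}V\SN(?!\S)")


def has_passive(tags: list[str]) -> bool:
    """Return True if the tag sequence contains the passive pattern."""
    return PASSIVE.search(" ".join(tags)) is not None


# Hypothetical tag sequences.
was_quickly_written = ["NN", "VBD", "RB", "VVN"]   # e.g. "letter was quickly written"
was_writing = ["NN", "VBD", "VVG"]                 # e.g. "author was writing"

print(has_passive(was_quickly_written), has_passive(was_writing))
```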

  • @NataliaBurbano amigo de * should now work, if you update to #LancsBox v 5.1.1

  • @KatiaAdimora We have just released a new version (5.1.1) which fixes this issue.

  • Hi Kelly, you might need to check your system settings on Mac to enable the right click function.

  • Hi Oriol - that's great to hear. You can press Ctrl + to make the font larger.

  • Welcome, Helen!

  • Welcome to the course, Oriol - great to hear you've already started playing with #LancsBox. I hope you'll find the tool and this course useful for your research.

  • Thank you for your kind words, Emanoel - we are here to help: enjoy the course!

  • A very warm welcome, Ana.

  • A very warm welcome, Sergio - I hope you will find the course useful for your research.

  • @AnnaVogel Great to hear that! Many thanks for getting back.

  • @KatiaAdimora We've identified the bug and will be releasing a version which fixes this.

  • Hi Nastassja - many thanks for your Qs. Linguistic analysis of literary works using the corpus method is a great area of research!

    Re scholarships: It is best to check what the specific conditions are of a particular PhD scholarship - these differ considerably. Occasionally, there is also 1+3 funding available.

    With a BA degree from the UK (or other...

  • @MachtelddeVos I'm glad it helped ;)

  • Hi Xiaowen - You can use the filter function in #LancsBox instead of a stop list. If you have a list of words, you can enter them in the following format /word1|word2|word3|word4/. From the scientific point of view, stop lists are rather problematic because they modify the results and remove the researcher from the reality of the data.
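
If the word list is long, the filter expression in the format above can be generated rather than typed by hand. A minimal Python sketch, in which the function name and the sample words are hypothetical:

```python
import re


def build_filter(words: list[str]) -> str:
    """Join words into the /word1|word2|.../ format.

    re.escape protects any characters that have a special meaning
    in regular expressions (plain words are left unchanged).
    """
    return "/" + "|".join(re.escape(w) for w in words) + "/"


# Hypothetical list of words to filter.
stop_words = ["the", "of", "and", "a"]
filter_expr = build_filter(stop_words)
print(filter_expr)
```

The resulting string can then be pasted into the filter box.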