Tony McEnery

Has been working for over 20 years to help pioneer new ways to use computers to analyse very large collections of language data.

Location: Lancaster University, UK

Activity

  • Glad to have been of help!

  • Many thanks Martha!

  • @RobertFong Glad you came back Robert!

  • Many thanks for the kind comments - they are much appreciated!

  • Thank you Mariah!

  • Thanks Rosalia, always nice to get feedback like this.

  • Yes it may - and with the BNC 2014 we have a more fine-grained classification of age that would help.

  • Thanks for the comment Susana!

  • Yes, many thanks for the examples!

  • @CarolineGodfrey Interesting that this was the word you heard males use - as you will see in the lecture!

  • They are great readings, aren't they?

  • A pleasure Jacky!

  • Hi Liliya, for personal use, you are pretty much in the clear. For the purpose of sharing, you need to be clear about copyright restrictions and ethical restrictions. Neither is necessarily straightforward, and either may make you stop what you are doing.

  • Spot on Clara!

  • Absolutely Clara - look at my comment to Yaroslava above also.

  • Thanks for sharing Yaroslava. It is often worth asking, when presented with learner language that is in some way problematic, is this the output of a poor learner, or the output of a good learner let down by poor materials?

  • Happy to shine a light on things Vinicius!

  • A nice tale which shows how far we have travelled!

  • A pleasure!

  • Happy you have found these helpful!

  • There is so much that has yet to be fully explored and explained Yaroslava - why not be the first to look at this and share your thoughts?

  • Interesting observations here - thanks for sharing, it is always interesting to hear a wide range of views relating to a number of languages!

  • Yes, you are right Howard. The vagaries of speech!

  • It is good to have the publishing houses on board, that always helps!

  • No, Trace, the students taking the exam had consented to allow their exam to be recorded and used for research purposes.

  • Hi Robert, it is, of course, possible that your needs may lead you to digitize data, either by rekeying it (typing it in) or by using OCR. The first is time consuming and both are error prone. However, there are usable corpora that have been built in just those ways.

  • You may be interested in this corpus: https://www.lancaster.ac.uk/fass/projects/corpus/LCMC/

  • Glad you enjoyed it! I found the conversation fascinating.

  • Good point about atlas.ti - the comments do apply to tools in general, not just corpus tools.

  • Nice image Elif!

  • Thanks for sharing - it is lovely to get feedback like this.

  • @YahyaAbdallaAbaker Many thanks for your kind comments.

  • Wonderful - glad to be of help Davide!

  • From memory, it was a cluster which, when we looked at its concordances, looked very interesting in terms of its role in the discourse. That is why we focused on it. Other clusters were looked at, of course.

  • @RobertWilliams I think a key cluster could be viewed from the perspective of a collocation network, except in the case of the key clusters we are looking at +/- 1 spans only. So it could be that something which is a key cluster, with a +/- 1 span, may not be a collocate when the span expands to +/- 5, i.e. there is a very localised connection, but this is...

  • Is your native language Lithuanian? It is always interesting to hear of cases in languages other than English which broadly mirror - or differ from - our findings.

  • In terms of the different newspapers? That information was included in the markup of the articles. We then used our knowledge of the newspaper publishing industry, and existing research on it, to further categorize the newspapers into the tabloid (popular) v. broadsheet (serious) categories.

  • More birthdays to come in the next five weeks, I promise! :-)

  • My pleasure Nataniel, glad you enjoyed the talk.

  • A good point Trace, glad you noticed that.

  • Nice answer Alexandra!

  • Great to read these analyses, from across the globe. A lot of common themes seem to be coming through.

  • Thanks Laila - it was a very enjoyable conversation.

  • Absolutely Micaela - we narrow the problem to the point where human expertise can be brought to bear and, as you say, concordancing is a key tool at that point.

  • You anticipated my reply Robert! Later in the course you get to look at general corpora, like the BNC, where you should be able to look at specific genres to test your hypothesis. Let us know what you find! :-)

  • Lots of agreement in the definitions given this week - great!

  • Thanks for the feedback Sam - that is great to hear!

  • Why not - just post up a URL which links to it.

  • Thanks for sharing - sounds interesting!

  • But remember, Evans... the users of them sometimes do!

  • Look at the book given to you in this week's reading section too. That should help!

  • Indeed - it is true of all techniques we use to test our hypotheses - they all have strengths and weaknesses. We need to be mindful of those when we are making claims based on them.

  • It is coming up, Li - week 4, I think. Do be aware, however, that automated annotation is not possible for every type of analysis a linguist can do. We will cover what can and cannot be done later in the course.

  • It could be either, I guess - if it happens this time, come onto the forum in the 'Technical and other issues with #LancsBox' step (1.20) and tell us it has happened. We may be able to help.

  • Lots coming up to interest you Shazia, I promise!

  • Interesting work - and the key to using any tools is to use the tools that are right for the job. If you can do your work without part-of-speech annotation, for example, then go ahead and do it. I often find it very useful to let the intended work define the tools I need; I do not start by thinking simply of what I could do with the tools I have.

  • Hi Asma, collocation is something you will look at many times on this course. There are various ways of calculating it, as you will discover. However, as a general rule, we should always take the amount of evidence available to us into account when comparing different datasets - that is true here, as we know that the number of words spoken by males and females...

  • Plenty coming up about learner language!

  • Share your ideas on here - others may comment. You will find lots of people on this course with good ideas to share with you, I am sure.

  • Indeed Carmen - a very good example! :-)

  • Plenty on discourse analysis coming up later in the course which should be of interest to you William.

  • Roam Teresa! Interpret the question in light of your interests.

  • Glad you like the readings - more coming each week.

  • Could you try again? I have alerted Vaclav to your possible issue.

  • As long as you abide by any licence conditions when you use such corpora, then yes, you can publish on the basis of them. Giving a reference to the corpus used (usually a paper that describes its construction) is good manners and good practice! :-)

  • The practical sessions on the course with tools like LancsBox and CQPweb will deal with issues like this, I promise.

  • No - though if you want to you can. We introduce tools for automatic annotation later in the course, so doing some forms of annotation may not be as daunting as they may appear to be at this stage. It can be useful too!

  • You may be interested in the corpus-based discourse analysis later in the course.

  • Welcome everybody! Nice to see so many people and so much enthusiasm on here.

  • Thank you and farewell to all - we hope to see some of you again next year!

  • Thanks for these insights Kate - you have been on a grand tour of hot-spots of corpus linguistics in the US! Doug is right about the need to focus back, though we do need a little forward momentum too, I think.

  • Formats for presenting results differ depending on the context and purpose of the communication. If your goal is to present findings in an assessed essay, then the format you describe would certainly fit that purpose/context.

  • Splendid - good luck with your work!

  • Thank you to all of the participants on the course - without all of your wonderful inputs and observations this course would not have been as much fun, nor as informative, as it clearly has been. You have our thanks.

  • Thanks for the kind words Shivanee.

  • Our pleasure Katarina.

  • Wonderful, nice to hear that.

  • Useful points, thanks.

  • Thanks for sharing Hugh - and yes, as ever context is key with language!

  • Thanks for the feedback William, much appreciated.

  • Thanks Yosra!

  • Thank you for your kind words Sergio. Glad you have enjoyed the course.

  • Hi Alexander - many thanks and yes, there are ways to carry on. Look at this course, for example: https://www.lancaster.ac.uk/linguistics/masters-level/corpus-linguistics-distance-ma/

  • Our pleasure Margarita.

  • We hope to be back again next September - glad you found the course interesting.

  • When downloading from Nexis, downloading files in bulk is best, as you usually get lots of data so doing it a file at a time could be very time consuming. Brown is an old corpus so does not necessarily reflect modern best practice. Randi is right - if information applies to the whole file, put it in the header. If to the sentence, then mark it up at that level...

  • Glad the course was of use - and hope to see you again!

  • Thanks for sharing all of this Robin - and do let me know how my category scheme fares on Polish!

  • Many thanks Brian.

  • @RobinGill Ah! Pascal and Prolog were languages I programmed in, along with C. I used to do lab sessions on some older languages - Algol, Cobol and Fortran - when I started at Liverpool University in the late 80s. All distant memories now!

  • It was very enjoyable to do - I wish he was still around.

  • Thanks for the thanks and feedback - it means a lot to us!

  • Really nice to hear this Peter, thanks for sharing.

  • A lot of people find Python helpful; those who want stats more than just text processing look to R.

  • Hi Isabella, the best way to reciprocate is to share - as we have shared these ideas with you, we hope you will share them with others also. That would be wonderful - the more people know about this approach to the study of language, the better.

  • Welcome back to Lancaster (virtually)! It is always nice to catch up with fellow Lancastrians - I do hope you are doing well and it is a delight to be able to teach you again. :-)

  • Many thanks Elisa!

  • @JodyÇiçek Yes Jody, irony could be interesting to look at, but I think I would layer that on top of the categorization - that way you could see which categories are used to perform irony and which ones are not.

  • @SealtaíCapall Not cheating at all - in fact it is what one hopes you could do with a categorization system like this.