Alexander Trubochkin

Corpus linguistic researches and linguistic analyses

Location Russia

Activity

  • Hello Tony, It's great to know about it. I'm considering it. It's really interesting. Just to know, is there any kind of short-term advanced courses like this one?

  • Thank you all for this well-structured course with lots of practical activities and valuable case videos. Special thanks to Tony and all the mentors. I'd like to keep improving my knowledge after this course. Are there any advanced or improved courses, or any other programme at a higher level? Getting a corpus linguistics qualification, maybe? Or some specific...

  • Hello there! Can anyone clarify the difference between mentioned "corpus-driven" and "corpus-based" in this video, pls?

  • I've been using corpora for a couple of years and came to the conclusion that using corpora in language learning significantly facilitates language acquisition for students and allows teachers to compile teaching materials according to the frequency of "lexical bundles" which could be mined and included into the materials from corpora.

  • From a corpus perspective, teaching lexical bundles is a more effective approach to language teaching, because it keeps students from being overwhelmed by rarely used vocabulary. Although not every lexical bundle is frequent, this approach leads us to prioritise them in order to support correct language acquisition.

  • BNCWeb at Lancaster Uni: query=*ly returned 15400.64 ipm, whereas query=*ly_{ADV} returned 12869.13 ipm. Thus, adverbs ending in 'ly' account for 83.6% of all words with the same ending in the corpus.

    1. Adverbs of manner: slowly, quietly, loudly, angrily - 134.62 ipm / query= (slowly_{ADV}|quietly_{ADV}|loudly_{ADV}|angrily_{ADV})

    2. Adverbs of a speaker's...
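
The 83.6% figure above can be re-checked with a few lines of Python. This is only a minimal sketch: the two ipm values are copied from the comment, and the function name `share_percent` is made up for illustration.

```python
def share_percent(part_ipm: float, whole_ipm: float) -> float:
    """Return `part_ipm` as a percentage of `whole_ipm` (both in instances per million)."""
    return 100.0 * part_ipm / whole_ipm


all_ly_ipm = 15400.64  # query=*ly (every word ending in "ly")
adv_ly_ipm = 12869.13  # query=*ly_{ADV} (only words tagged as adverbs)

print(f"{share_percent(adv_ly_ipm, all_ly_ipm):.1f}%")
```

Because both figures are normalised to the same base (per million words), the ratio of the ipm values equals the ratio of the raw counts.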

  • 1. threadbare - adjective: lead/led, exposed;
    2. luckily - adverb: meaning something might have ended up with negative outcome but didn't.

  • @ANDYBU Hello Andy! Thx a lot for your comment. Me too, I'm doing my PhD on this subject. We could stay in touch if you'd like. My email is nalugu@mail.ru Stay safe!

  • Thx a lot! I'm really fascinated after watching Claire's interview

  • Dear @AnastasiaVerutina, you should bear in mind that the word "isn't" or "ain't" also consists of two tokens. To get the correct results you should put a space between the tokens. That's how tokens are to be separated in the query. Say, in the word "isn't" one token is "is" and the other is "n't", which is likely to always be a single token. The same for "ain't"....

  • The use of standard and non-standard linguistic forms such as "isn't" and "ain't" indicates, most of all, a speaker's position on the social ladder, then gender, age and their region. So we can get a picture of a speaker's origin. In my opinion, what is more valuable is to keep an eye on how society changes as time passes. To observe that...

  • Querying the corpus with simple queries one might think that men speak about or use 'colours' more often than women but things are really quite the reverse if you compare semantic groups or at least check it out with another tool that allows us to use semantic annotation, e.g. BNClab that allows us to use general semantic tagging where you can retrieve all the...

  • Hi Anna! It's great to hear that you're making progress with it. Thank you! Yeh, no problem. It works. I've just checked it again in LancsBox and got 268 762 tokens (not lemmas) annotated as nouns in the LOB corpus. You can have a go at searching by simply typing NOUNS in the search bar. Make sure that you type in caps and in plural, it will turn what you type into a...

  • My answers mostly go along with the answer keys. Thank you for this activity, it's great to know that a language we speak always reflects back in the way we speak whether it is a stigmatised or prestige-carrying form. So, it shapes our social status in an equal way, then)

  • Task 1
    1) It's slang addressing
    2) Midlands
    3) A dude, mate, man
    A dude is mostly in use in Wales and Southwest, then in North England and Southwest.
    A mate is mostly used in Midlands and Southwest.

    Task 2
    1) In terms of Lexis:
    good’un
    int it
    nowt
    owt
    och
    er
    2) In terms of Grammar (the forms of 'be'):
    ..me and Jimmy was…
    3) In terms of Syntax...

  • It's a hopeful conclusion. looking forward to trying regex for animate/inanimate querying a general corpus! :)

  • It's a very interesting research field: gender, and how to distinguish male and female speech in discourse. Looking forward to learning more about it.

  • You're welcome!)

  • Mostly, I'd say, intonation, then prosody to figure out a speaker's attitude to the subject, whether they tend to keep up the conversation or to ruin it, then lexicon, then accent, and finally co-speech gestures. All this about their social position, their cultural level, their sphere of interest and their wideness of lexicon. I usually notice it in order to...

  • Hi Daniela, if you work in KWIC you have to right click (right button on your mouse or single-touch with two fingers simultaneously on your touchpad) anywhere on the right half of the context to filter it to the right and pop-up filter menu will appear or do the same anywhere on the left half of the contexts to filter it to the left.

  • Great activity!) It was real fun to be engaged with it! Thx a lot!) I didn't find anything about assessing collocations in either Essay 3 or 2, but they are mentioned in Essay 1, and since the collocation technique was identified as a key one, I decided to rate it higher than the other two, despite some of its weaknesses. But taking all the details...

  • Essay 2 - indicates a lack of understanding of the principles of corpus analysis. The student compares, in table 1, the two wordlists in order to indicate the difference between AmE and BrE, whereas these two wordlists consist entirely of function words, which are always frequent in both varieties of English. No lexical units there. It seems the student didn't...

  • Hello there! Firstly, I'd like to build a phrasal verb corpus of English for investigating the phenomena of the phrasal verb to form a particular meaning conforming to the construction in which it is embedded. This corpus will facilitate conducting analyses of phrasal verbs, to study the contexts which surround phrasal verbs. This corpus will need semantic...

  • Great! Done! I've got my minicorpus about what The Guardian reports on phrasal verbs) Not much) 24 hits or 16.17 per 10k out of 15 texts) sized in total 15000 tokens. Happy with that! )
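
As a quick check on the normalised frequency above, here is a minimal sketch of the per-10k calculation. The function names are hypothetical. With exactly 15000 tokens, 24 hits come to 16.0 per 10k, so the reported 16.17 suggests the exact corpus size was a little under the rounded 15000 (an inference, not something stated in the comment).

```python
def per_10k(hits: int, tokens: int) -> float:
    """Normalise a raw hit count to a frequency per 10,000 tokens."""
    return hits / tokens * 10_000


def implied_tokens(hits: int, freq_per_10k: float) -> float:
    """Work backwards from a normalised frequency to the corpus size it implies."""
    return hits / freq_per_10k * 10_000


print(per_10k(24, 15_000))               # frequency if the corpus had exactly 15000 tokens
print(round(implied_tokens(24, 16.17)))  # corpus size implied by the reported 16.17
```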

  • Sure, before the lecture I didn't know that it would be so helpful to use topoi in order to conduct the discourse analysis in the right way. I think all of them manifest themselves in the article. So, the topos of origin in terms of places, countries etc. might be another feature entering the discourse of RASIM. Thanks once more for the professionally made...

  • @RussellCross I've got the same results for Q2 part 4 (Which of these adverbs is used most frequently in newspaper editorials): 39 for "Still_adv" VS 3 for "Sincerely_adv" in numeric sense in the table (B_press_edit.txt). I still can't open the LOB (big circle) to see the clusters (small circles). I can only suppose, due to what @JoséDias said above, it may...

  • Hello! I think that refugees are usually associated with negative connotation which is indicated by using such words as death (for example, a few died in a truck fridge), drowning, homelessness, stereotypical opinions, lack of educational opportunities, illnesses, and miseries etc.

  • My first definitions of 'diamond' and 'cause' were too abstract and based just on my presuppositions. So, now this week 2 has changed my view and added useful knowledge so that I can specify that in a reasonable way. Thank you so much for the well-structured MOOC education

  • Great practice! Thank you very much! Got the same results! I am enjoying learning it!

  • Thank you very much to all who took part in designing this course, in collecting and putting together such great, comprehensible materials, in conducting the course, and in supporting us and providing such detailed additional information, so that this course became well-structured, balanced and perfectly thought-out in terms of teaching theory and practice....

  • Hello! I have a question pls. Mutual information is known to not be very accurate on low frequencies. The Dice coefficient, as I know, is used to gauge similarity between two sets of data or how one set of data is similar to another set. Prof Tony McEnery uses it as an alternative to Mutual information. How does Prof Tony McEnery apply the Dice Coefficient in...
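
For readers comparing the two measures mentioned above, here is a minimal sketch using the commonly cited textbook formulas: MI = log2(O × N / (f_node × f_collocate)) and Dice = 2 × O / (f_node + f_collocate). It is not taken from the course materials, the counts in the example are invented, and real tools may additionally correct for the collocation window size.

```python
import math


def mutual_information(cooc: int, f_node: int, f_coll: int, corpus_size: int) -> float:
    """MI score: log2 of observed co-occurrences over those expected by chance."""
    expected = f_node * f_coll / corpus_size
    return math.log2(cooc / expected)


def dice(cooc: int, f_node: int, f_coll: int) -> float:
    """Dice coefficient: overlap of the two words' occurrences, between 0 and 1."""
    return 2 * cooc / (f_node + f_coll)


# Hypothetical counts: a rare pair that always occurs together vs. a frequent pair.
N = 1_000_000
print(mutual_information(3, 3, 3, N), dice(3, 3, 3))
print(mutual_information(500, 2_000, 3_000, N), dice(500, 2_000, 3_000))
```

In this sketch the expected count sits in the denominator of MI, so a tiny expected value (low-frequency words) inflates the score, whereas Dice never leaves the 0 to 1 range.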

  • Thank you for a very interesting lecture! Need clarification, pls! What we should bear in mind about choice between Chi-square test and log-likelihood one? Which of them should we use? first? or both? when one or another, pls? How to read a chi-square or log-likelihood test results? Is there a threshold above which the test is passed? And also how can we build...
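
One common way to compute log-likelihood for a word's frequency in two corpora is sketched below. The formula is the widely used two-corpus version (expected frequencies proportional to corpus size), not necessarily the exact procedure from the lecture, and the example counts are invented. With one degree of freedom, the usual critical values are 3.84 (p < 0.05) and 6.63 (p < 0.01).

```python
import math


def log_likelihood(freq1: int, freq2: int, size1: int, size2: int) -> float:
    """Log-likelihood (G2) for one word observed in two corpora of given sizes."""
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical example: 20 hits vs. 5 hits in two corpora of 1,000 tokens each.
score = log_likelihood(20, 5, 1_000, 1_000)
print(round(score, 2), "significant at p < 0.05" if score > 3.84 else "not significant")
```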

  • The word 'diamond' is basically used as a noun in English and refers to jewellery, but it can function as an adjective-like descriptor in a NOUN+NOUN construction, and it can also be used metaphorically to underline the value of what you are speaking about. You may also have a go at using it as a verb with the meaning of polishing,...

  • Thank you for the encouraging welcome!

  • @ElenaSemino Dear Elena Semino, that would be a viable linguistic contribution to cancer treatment. Will it have online access open?

  • Hi @LorraeFox Thanks a lot, Lorrae! Got that!

  • @LorraeFox Thank you, Lorrae, for the reply! Yeah, I remember those definitions from the glossary. So, can a corpus be representative but not balanced?

  • Dear prof Elena Semino,
    Thank you very much for such an interesting interview, sharing your opinion, insights and ideas on corpus data application in numerous areas concerned with quality of life. Having listened closely to your interview I have an idea of building a specialised corpus aimed at the placebo effect especially for professionals who deal with people...

  • Since my project is concerned with semantic information mining of phrasal verb constructions in corpora and identifying possible crypto-classes of English phrasal verbs, for example "jot down" type, where "jot" is not used without down so that the existence of that phrasal verb fully depends on the presence of the particle. That's what corpus data suggest. I...

  • @GillianSmith Thank you, Gillian! I learn a lot of important details with your replies!:)

  • BTW, could you, pls, clarify once more the difference between representative and balanced corpora: since both are concerned with all the types of texts in correct proportions, when can a corpus be balanced but not representative? And can a corpus be representative but not balanced? If yes, could you comment on it, pls?

  • @LorraeFox Thank you Lorrae! Looking forward to the next week of the course. I'll learn about "sampling frame". In the glossary, "sampling frame" refers to a set of instructions and/or features determining how samples can be chosen. Could you pls give an example of what these instructions or features are? For example? Are there any rules for building a sampling frame? Some...

  • @LorraeFox Hi Lorrae! Thank you very much! Now I see the difference between the * and the ?. As the ? stands for zero or just 1 character attached to the end. But does the ? mean character deduction? - if the ? means characters attached to the root, should we use a longer word-form with those characters attached in the regex? Why not use the root itself with...

  • @GillianSmith Thank you very much! But if I tokenise the phrase "He can't run" as "He", "ca", "n't", "run" - 4 tokens and 4 types, how many words are there? 3 - "He", "can't", "run" or shall we count "can't" as two words "can", "not" ?
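
The tokenisation described above can be illustrated with a small sketch. This is a toy tokeniser written for this example, not the one used by BNCweb or LancsBox. It splits off the clitic "n't" as its own token and handles only the straight apostrophe.

```python
import re

# Alternatives are tried in order: a word stem directly before "n't",
# the clitic "n't" itself, any other word, or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?=n't)|n't|\w+|[^\w\s]")


def tokenise(text: str) -> list:
    """Split text into tokens, separating the negative clitic n't."""
    return TOKEN_RE.findall(text)


tokens = tokenise("He can't run")
print(tokens)                         # the four tokens described in the comment
print(len(tokens), len(set(tokens)))  # number of tokens, number of distinct types
```

Whether "can't" then counts as one word or two depends on the definition of "word" a given tool or study adopts, so the sketch only shows the token-level split.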

  • @GillianSmith Thank you Gillian, I really missed the ? part. I thought they are interchangeable. Now I see, thank you!

  • ...and also the explanation in the table says that /word/i is just a string of characters case insensitive. And if I add the asterisk which means zero or more of that string. So then the regex given as a key in the answers /australian*/i or /Australian*/ should match not just "Australian" and "Australia" but "Austral" also? Which is not quite correct because...

  • All of my answers matched up absolutely precisely well apart from one answer negative out of 3 in the last advanced exercise - I replied australia* and it gave Australia, Australian, Australians - what is not quite correct cuz the word "Australians" is not supposed to be among the results according to the exercise task - right? So, as I figured out, there...
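
The behaviour discussed in the two comments above can be tested directly. The sketch below uses Python's `re` module, where `*` and `?` are quantifiers that apply only to the single preceding character (here the final "n"). That is an assumption about the regex flavour: in a simple wildcard query syntax `*` may instead stand for "any characters", which would explain why australia* returned "Australians". The word list is made up for the test.

```python
import re

WORDS = ["Austral", "Australia", "Australian", "Australiann", "Australians"]


def full_matches(pattern, words=WORDS):
    """Words that the pattern matches from start to end, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.fullmatch(w)]


def partial_matches(pattern, words=WORDS):
    """Words that contain a match for the pattern anywhere, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.search(w)]


print(full_matches(r"australian*"))     # zero or more "n" after "australia"
print(full_matches(r"australian?"))     # zero or one "n" after "australia"
print(partial_matches(r"australian*"))  # unanchored: also hits inside longer words
```

In this flavour "Austral" is never matched, because the quantifier covers only the last "n" and not the whole string; "Australians" is matched only when the pattern is not anchored to the whole word.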

  • @GillianSmith Sorry to bother you with such questions but just to be sure - The phrase "He can't run" consists of:
    3 tokens: "He", "can't", "run" OR 4 tokens: "He", "ca", "n't", "run" - ?
    3 words: "He", "can't", "run" OR 4 words: "He", "can", "not", "run" - ?
    3 types: "He", "can't", "run" OR these 4 types: "He", "ca", "n't", "run" OR these 4 types: "He",...

  • @GillianSmith Thank you very much Gillian! It's a great honour to be a part of this great corpus community! Thank you for your help!)

  • @ArailymAbdigaliyeva Thank you Arailym, sure, in what corpora, please?

  • Hello everyone! Please help!! I need a corpus on ecology! Maybe a specialised one on ecology.. Does anyone know any of that kind, please?

  • Hello everyone! Please help!!! I need a corpus on ecology! Maybe a specialised corpus on ecology.. Does anyone know any of that kind?

  • @ShiamBeeharry Hi Shiam, can you imagine, to my surprise, the seminar about corpus data processing tools is going to take place at my university in December)) So, Python or R can be viewed as good tools to that point!)) I'll definitely look into Python / R. Thanks so much!!! )

  • @GillianSmith Thank you very much Gillian! Comparing these two tools, I can say that AntConc installed on my Mac at once, in the wink of an eye, to my surprise, whereas with LancsBox I had to fight to install and run it. So I've recently installed it somehow too. LancsBox offers a much wider variety of analysis options than AntConc and also LancsBox is more...

  • Hello everyone! Could you provide me with some detailed information about types when we count frequency? So, we know about tokens, and we count them, I came across that we count types also, then how types correlate with tokens? What is a type in terms of corpus then, pls?
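
A short sketch may help with the token/type question above: tokens are every running word in the text, while types are the distinct word forms among them. The sentence is invented, and splitting on whitespace after lowercasing is a deliberate simplification of real tokenisation.

```python
from collections import Counter


def type_token_counts(text: str):
    """Return (number of tokens, number of types, per-type frequencies)."""
    tokens = text.lower().split()  # naive whitespace tokenisation
    freq = Counter(tokens)         # one entry per type, valued by its token count
    return len(tokens), len(freq), freq


n_tokens, n_types, freq = type_token_counts("the cat sat on the mat")
print(n_tokens, n_types)   # "the" occurs twice: two tokens, one type
print(freq.most_common(1))
print(n_types / n_tokens)  # type/token ratio
```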

  • @DogusOksuz Hello Dogus! Thank you for your care! I've just done step by step. Still negative. I downloaded it, then unzipped it, but I can't say that it was expanded! Unzipped file "LancsBox.app.tar.gz" is 307,7 Mb, and what was unzipped is just one file "LancsBox" 307,4 Mb - less than unzipped, which is still a package with content. How many files are...

  • @ShiamBeeharry Thank you a lot) That would be a really good challenge! Thank you, my dear friend!)) I'll have a go!)

  • @ShiamBeeharry Hi Shiam, yes it's really interesting!!! I've never heard of R before!) Exciting!) I'm having a go at using R and CRAN on my Mac. Just getting involved. I wish I could provide smth that would be useful for u!

  • @GillianSmith Thank you, Gillian! Where can I see and try those specific tools, please? Or in what corpora?

  • @ShiamBeeharry Shiam! Exactly! That's what I'm really interested in too. The ways we query corpora to extract multi-word units, or do they tag them, if yes, then how, or how to work with semantic fields to extract multi-word patterns or idiomatic patterns I should say.

  • Dear Mentors! Could you provide some information about the right way of unzipping LancsBox on Macs, pls? As I click on the downloaded file, then it unzips into one file which can run. But I suppose there should be more files unzipped than just one file? This one file runs as an application, but should there be many files unzipped? Right?

  • @EunJinChun Hi Eun, I think I've figured out what the problem is. It happens at the step of automatic unzipping. We just click on the downloaded file, then it unzips into one file which can run. But it is still the archive of files. It's not the right file to run. It should be unzipped further!!! Right-click on it and choose the second item in the pop-up...

  • @DogusOksuz Thank you very much! I'll try that corpus.

  • BTW, could you suggest the best corpus methods that will allow us to extract multi-word units?

  • @GillianSmith Thank you, now it works. I've got those slides. You're very helpful!

  • @SuminGuan Not at all, thanks to @GillianSmith Gillian.

  • @DogusOksuz Hi Dogus, Yeah, on my Mac I have something called "Launchpad" and the "Dock", where I put the LancsBox app. Is that what you call the application folder? If yes, then I just dragged the app there after it had unzipped in downloads. So I downloaded it, then it unzipped in the downloads folder, from which I dragged the file to Launchpad. Is that correct?

  • Hello! Could you suggest any English corpus with a good semantic annotation? Especially for searching phraseological items, pls? If there are any..

  • @GillianSmith Thank you very much, Gillian. looking forward to their reply. As this trouble really prevents me from moving ahead. Got stuck at the 1st MOOCweek (

  • @RaffaellaBottini Thank you. I'll have a go.

  • @GillianSmith Thank you very much Gillian. Maybe I should be registered there.. I don't know.. Anyway, thank you for this book, I'll google it.

  • @GillianSmith Yeah..( I downloaded it from this page http://corpora.lancs.ac.uk/lancsbox/download.php from Mac yellow square there, then as I click on the downloaded file it unzips automatically - just one file appears - is that correct? if yes then I confirm all those steps. Then I put it into Launchpad and I have it in the Dock also. Still the same.

  • Hi @RaffaellaBottini, thanks for your response. I use MacOS Sierra 10.12.6, MacBook Pro 2017

  • @ShiamBeeharry Thank you Shiam, you are very helpful! It's an interesting book!

  • @RaffaellaBottini Thank you very much Raffaella. Unfortunately I haven't moved far enough( I got stuck facing trouble with loading a corpus into LancsBox. I managed to run it on my Mac, but the app refuses to display anything after downloading a corpus from the list apart from that corpus name in the corpus name bar and nothing below it. LancsBox says 'the...

  • @GillianSmith Thank you very much Gillian for providing more details. It's very interesting. Gillian, the link you gave doesn't work, is that the correct address?

  • Thanks @ShiamBeeharry That's a really interesting issue.

  • Hello @GillianSmith! Thank you. But the data is loading as the progress bar at the bottom on the right of the main window is getting gradually black until it's done. So it indicates that the downloading is complete. But nothing about the data is displaying apart from a corpus name in the corpus name bar. I tried to find any downloading restrictions but...

  • Hello! Would you provide some help, pls? I've managed to run LancsBox on my Mac, but it refuses to display any sign of downloading a corpus (Brown or BNC64), it says downloading is complete and displays a corpus name in the corpus name bar but nothing in the big white textarea below. It also lets me press the 'Import' button, with no reaction afterwards. Any advice?

  • Hello everyone! That's a really good course! Does anyone know a corpus tagged for phraseological units? Such as phraseological antonyms or synonyms, or idioms? Or to build semantic query to such corpus in order to get semantic fields (with all the synonyms and antonyms in the result list) such as 'good-bad', 'fast-slow', 'birth-death', 'water-fire'. Or the most...

  • @GillianSmith Thank you!

  • @GillianSmith Thank you very much, Gillian!

  • @ShiamBeeharry Thank you very much, Shiam!

  • Thank you Shiam and Gillian for your clear answers! I think I've got that with your help. So, if I'd like to build my corpus semantically annotated for working with idioms, phraseological antonyms and synonyms, can I set as a token a phraseological unit such as an idiom or a phraseological antonym?

  • So, may I say that a token is a tagged/annotated unit of a corpus? A tagged minimal unit?

  • Hi Shiam, thank you very much for your help. Now I see. I wasn't sure I had understood 'token' definition right. Thanks a lot!

  • I'd like to put together a semantic reference corpus where samples are framed according to their semantics, their meaning so that such phraseological units as antonymic constructions may be easily queried and displayed.

  • Hello everyone! Sorry for joining you so late as I've just become aware of this course that I really need. So, I'll have a go at catching up. My name is Alex and I'm doing a PhD at Moscow State Linguistic Uni, the Department of English Lexicology. My research interests are in the fields of corpus linguistics, CxG (construction grammar), joining CxG...

  • Is the total number of tokens in a corpus the size of that corpus??

  • What is the essential difference between 'balance' and 'representativeness'? Is it that 'balance' is aimed at representing the RANGE of language whereas 'representativeness' is aimed at the ACCURACY of language?

  • Hello! I'm very excited to join you and this course. Thank you all! It's really interesting! Sadly, I've just recently become aware of it. So, I'll try to catch up on it. In part 1.3, a corpus is defined as ‘a large set of language data which is made usable by computer’ (CASS_Glossary-new.pdf). I think it would be better to define a corpus as ‘the system of...