
Finding good examples automatically

An account of GDEX, a tool for finding good examples in a corpus: how it works and how successful it is.

The previous exercise will have given you an idea of what makes a good dictionary example.

Evaluating sentences like the ones in Step 6.7 helps us get a clearer understanding of the factors that go into making good (and bad) examples. Similar thinking informed the development of a software tool called ‘GDEX’ (standing for ‘Good Dictionary EXamples’) which is designed to automatically find good examples in a corpus.

The developers started from a set of characteristics which are typical of good examples, including:

They shouldn’t be too long.
They should be easy to understand, avoiding rare or technical vocabulary and distracting names of people or organisations.
They should illustrate the most typical ways a word is used, such as its normal grammar patterns and collocates (the words it most often occurs with).
They should be as self-contained as possible – for example, by avoiding pronouns which refer back to something in a previous sentence.

This content is taken from the Coventry University online course Understanding English Dictionaries.

These conditions were then translated into specific, measurable features, such as sentence length (not too short, not too long); frequencies of other words in the sentence (to avoid anything too rare); number of pronouns in the sentence (these can be confusing if you don’t know what they refer to); and the appearance of common collocates (using data in the Word Sketches). Each feature was given a ‘weighting’. The system then went through sentences that included the search word and gave each sentence a score based on these criteria. The ones with the best scores were then ‘promoted’, so that in a concordance for ‘demonstrate’, for example, the ‘best’ examples appear at the top. This gives the lexicographer a candidate set of potential dictionary examples to choose from.
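
The scoring step described above can be sketched in a few lines of Python. This is only an illustration of the general idea of weighted features, not the actual GDEX implementation: the feature definitions, the weights, the length limits, the pronoun list, and the `common_words` and `collocates` sets are all assumptions made up for this example.

```python
import re

# Hypothetical pronoun list and feature weights; the real GDEX values
# are not given in this article.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "this", "that"}
WEIGHTS = {"length": 0.3, "common_words": 0.3, "pronouns": 0.2, "collocates": 0.2}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def score_sentence(sentence, common_words, collocates, min_len=6, max_len=20):
    """Return a score between 0 and 1; higher means a better candidate example."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    n = len(tokens)
    features = {
        # 1 if the sentence is neither too short nor too long
        "length": 1.0 if min_len <= n <= max_len else 0.0,
        # share of tokens that are frequent (not rare) words
        "common_words": sum(t in common_words for t in tokens) / n,
        # share of tokens that are NOT pronouns
        "pronouns": 1.0 - sum(t in PRONOUNS for t in tokens) / n,
        # 1 if at least one known collocate of the search word appears
        "collocates": 1.0 if any(t in collocates for t in tokens) else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def rank_examples(sentences, common_words, collocates):
    """Sort sentences so that the highest-scoring ones come first."""
    return sorted(
        sentences,
        key=lambda s: score_sentence(s, common_words, collocates),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy data standing in for corpus frequencies and Word Sketch collocates.
    common = {"the", "results", "clearly", "demonstrate", "need", "for",
              "further", "research", "to", "it", "he"}
    collocs = {"clearly", "need", "ability"}
    candidates = [
        "He demonstrated it to them.",
        "The results clearly demonstrate the need for further research.",
    ]
    for sentence in rank_examples(candidates, common, collocs):
        print(round(score_sentence(sentence, common, collocs), 2), sentence)
```

In this sketch, the short sentence full of pronouns is ranked below the self-contained sentence that contains a typical collocate, which mirrors the way the 'best' examples are promoted to the top of the concordance.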

The GDEX algorithm has been used in a number of dictionary projects, with considerable success. The system doesn’t always get it right, and in a given set of 10 best examples, there will usually be two or three which are definitely not suitable. But it is still being improved, and it is already a more cost-effective way of finding examples than simply asking lexicographers to scan dozens or hundreds of corpus sentences.
