Big data: Enhanced Shakespearean EEBO-TCP

Watch Jonathan Culpeper discuss big data and the Enhanced Shakespearean EEBO-TCP.

The problem with using the whole of EEBO for Shakespeare-related studies is that it is too broad. For example, for most purposes, we don’t need language from the late fifteenth century.

As part of the Encyclopedia of Shakespeare’s Language Project, we set about creating a specially tailored subset of EEBO-TCP enhanced for the study of Shakespeare. In designing this corpus, tricky decisions had to be made about what to include.

For example, what time period should it cover? From Shakespeare’s birth to his death? But then he wasn’t producing plays in his early years.

One of the particular enhancements we made to this corpus was to label every text as belonging to a particular genre. In fact, it was this that enabled us to diagnose the word “bastard” as a rather informational, technical term in early modern English: we could see the genres in which it tended to appear. Designing a genre classification system and applying it to a huge number of texts presents its own challenges, of course.
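To give a rough sense of how genre labels let you see where a word tends to appear, here is a minimal Python sketch. It is not the project's actual method or tooling. The toy texts, the genre names and the function names (`tokenize`, `per_genre_frequency`) are all invented for illustration, and a real study would work with far more data and a more careful treatment of spelling variation.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (genre label, text) pairs. Invented for
# illustration only: these are not texts or genre labels from the
# Enhanced Shakespearean EEBO-TCP.
TOY_CORPUS = [
    ("legal", "the bastard child shall not inherit the land"),
    ("legal", "a bastard hath no claim in law"),
    ("sermon", "love thy neighbour and forgive thy brother"),
    ("play", "thou art a villain and a bastard"),
]


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a crude assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def per_genre_frequency(corpus, word, per=1000):
    """Return the rate of `word` per `per` tokens for each genre."""
    hits = Counter()    # occurrences of the word in each genre
    totals = Counter()  # total tokens in each genre
    for genre, text in corpus:
        tokens = tokenize(text)
        totals[genre] += len(tokens)
        hits[genre] += tokens.count(word.lower())
    return {g: hits[g] / totals[g] * per for g in totals if totals[g]}


freqs = per_genre_frequency(TOY_CORPUS, "bastard")
# Print genres from highest to lowest rate.
for genre, rate in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genre:8s} {rate:8.1f} per 1,000 words")
```

Normalising by the number of words in each genre matters because genres differ greatly in size. A raw count would mostly reflect how much text a genre contains rather than how characteristic the word is of it.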

How do you think the notion of genre can be used to shed light on Shakespeare’s language? Put your thoughts in the comments.

This article is from the free online course Shakespeare's Language: Revealing Meanings and Exploring Myths, created by FutureLearn.
