Twitter as a Data Source

Twitter is one of the easiest social media platforms to mine for data. Dr. Matei will show you how easy it is to pull valuable information from your own account.
In this course, Twitter is going to be our playground. We’re going to use Twitter as the central example, but the techniques that we will show you are applicable across any social media platform. You can see here that I have Twitter pulled up in a web browser. I’m about to show you how to access Twitter data by downloading the data from an account I manage, USAtalk. What is really useful is that tweets will become rows and columns, ready for analysis. In other words, status updates, which are mostly words and pictures, are turned into statements with specific characteristics: authorship, content, performance, time, and so on. So let us take a look under the hood.
We start with the icon for the profile of the account, and we go to the Analytics section. Here you have a lot of information ready to consume, some of it very aesthetically pleasing. But it is delivered one bit of information at a time. You do not yet have what I just described, the tabular format. For that, you need to choose one of the months, such as April in this case. You click on View All Activity and, again, you are going to get a chart. But most importantly, you are going to get this link, which will download onto your local machine the entire record of the Twitter activity for that month.
If you double-click the file, it opens as a spreadsheet. This spreadsheet is very important because it has the Twitter data organized in a readily analyzable format. As you can see, we have the rows that I described before. Each row is a tweet, and for each row we have a number of columns. These columns capture the characteristics of the tweets, and in particular the behaviors of the public associated with each tweet. We have the number of impressions, ranging here from 118 down to 24. These are the pairs of eyeballs that looked at each tweet.
But most importantly, we have columns such as engagements, which counts how many times people have liked, retweeted, favorited, or emailed a tweet. You can collect dozens of characteristics for each tweet. We don’t have a lot of activity on this beginner account, but if you have a very active account, you will have a lot of information to look at. Not only that, but you always have access to the tweets themselves. It’s as simple as clicking on the hyperlink for each tweet. You will then get the tweet as it was tweeted, with all the information attached to it.
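As a rough illustration of working with the downloaded file outside a spreadsheet, here is a minimal Python sketch. It assumes the export is a CSV file with a header row and with columns named `impressions` and `engagements`. Those column names, the sample permalinks, the tweet texts, the dates, and the engagement counts are all made up for illustration. Only the impression values 118 and 24 come from the video. The real export may use different headers, so check your own file before reusing this.

```python
import csv
import io

# Hypothetical stand-in for the downloaded monthly export.
# Column names and all values are assumptions, except the
# impression counts 118 and 24 mentioned in the video.
SAMPLE_EXPORT = """Tweet permalink,Tweet text,time,impressions,engagements
https://example.com/status/1,First example tweet,2023-04-02 14:05,118,6
https://example.com/status/2,Second example tweet,2023-04-10 09:30,57,0
https://example.com/status/3,Third example tweet,2023-04-21 18:45,24,3
"""


def load_tweets(csv_text):
    """Parse the export text into a list of dicts, one dict per tweet (row)."""
    tweets = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The CSV reader yields strings, so convert the numeric columns.
        row["impressions"] = int(row["impressions"])
        row["engagements"] = int(row["engagements"])
        tweets.append(row)
    return tweets


def engagement_rate(tweet):
    """Engagements divided by impressions; 0.0 when nobody saw the tweet."""
    if tweet["impressions"] == 0:
        return 0.0
    return tweet["engagements"] / tweet["impressions"]


tweets = load_tweets(SAMPLE_EXPORT)
total_impressions = sum(t["impressions"] for t in tweets)
best = max(tweets, key=engagement_rate)

print("Tweets in export:", len(tweets))
print("Total impressions:", total_impressions)
print("Highest engagement rate:", best["Tweet text"])
```

To try this on a real export, replace `io.StringIO(csv_text)` with a file opened via `open(path, newline="", encoding="utf-8")` and adjust the column names to match the header row of your file.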

Twitter is one of the easiest social media platforms to mine for data, and it is an excellent resource used every day by social scientists for research. We’re going to use Twitter as the central example in this course, but the techniques that we will show you are applicable across any social media platform. In this video tutorial, you’ll see how easy, interesting and productive this process can be for you to implement.

What does Twitter data represent?

Twitter’s best-known characteristics are its brevity and simplicity. The footprints its users leave while interacting with each other can be collected and analysed: once the data is downloaded into a spreadsheet, tweets become rows and columns, ready for analysis. In other words, status updates, which are mostly words and pictures, can be turned into “statements” with specific characteristics: authorship, content, performance, time, and so on. As an example of its richness for data mining, the metadata in each tweet contains not only the text but also forty-five different variables, such as number of followers, favourites, language, and geographic location.
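One way to picture a tweet as a “statement” with characteristics is as a record with named fields, where each field becomes a spreadsheet column. The Python sketch below is only an illustration: the class `TweetRecord`, its field names, and the sample values are hypothetical and cover just a handful of the variables mentioned above, not Twitter’s actual metadata schema.

```python
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class TweetRecord:
    """A hypothetical, simplified view of one tweet as a row of data."""
    author: str                      # authorship
    text: str                        # content
    created_at: str                  # time
    followers: int                   # author's number of followers
    favourites: int                  # performance
    language: str
    location: Optional[str] = None   # geographic location is often missing


# Made-up sample records, for illustration only.
records = [
    TweetRecord("example_user", "Hello, world", "2023-04-02T14:05:00",
                120, 3, "en", "Chicago"),
    TweetRecord("another_user", "Bonjour", "2023-04-03T09:30:00",
                45, 1, "fr"),
]

# Field names become the column headers; each record becomes one row.
columns = [f.name for f in fields(TweetRecord)]
table = [asdict(record) for record in records]

print(columns)
for row in table:
    print([row[name] for name in columns])
```

Declaring the location as optional reflects the fact that not every tweet carries every variable, which matters when you later count or filter rows.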

What are the main limitations of gathering analytics data from Twitter?

  • Data retrieval bias: Gathering data from online social networks like Twitter requires a great deal of computational knowledge and programming skill.

  • Representation bias: Twitter usage differs significantly across the world. These differences in usage within different countries make it challenging to take Twitter as a representative sample of the general population.

  • Language bias: Researchers have found that Twitter hashtags are concentrated in only a few languages. This can skew the data, because an event cannot be captured fully across all languages.

Read the article attached below for more insights into the benefits and limitations of analysing Twitter data.

This article is from the free online course Digital Media Analytics: Introduction.
