
Working with geolocation data

In this article you will learn how to explore your data set and improve its quality using geolocation data stored in tweets.

We now have a dataset containing information about items being listed at various locations in the United Kingdom by users of the OLIO app.

We have learned a number of techniques and tools for extracting and visualising the quantity of products and items being listed by users. In this step, we will combine everything we have learnt to explore the dataset according to the geolocation data stored with each tweet.

If you explore the data in more depth, you will observe that the place type ‘admin’ appears for some entries. These tweets are missing town or city level information, despite the presence of hashtags relating to the city name in the tweet text (for example, #Falkirk, #Didcot, and #Enfield). See the table below.

Table: Twitter data set with missing data in the form of the value ‘admin’ listed in the ‘place_type’ column rather than ‘city’ or ‘town’.

In this step, we will improve the quality of this dataset by replacing the values in these rows so that we have town or city level information rather than the default value of ‘admin’ (administrative district).
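As a first step, it helps to find out which rows are affected. The following is a minimal sketch in plain Python. Only the ‘place_type’ column name comes from the dataset described above; the `text` field, the helper `needs_fix`, and all row values are hypothetical and exist purely for illustration.

```python
# Hypothetical rows: only the 'place_type' column name comes from the dataset;
# the 'text' field and every value here are made up for illustration.
tweets = [
    {"text": "Bread available #Falkirk", "place_type": "admin"},
    {"text": "Fresh apples #Didcot", "place_type": "admin"},
    {"text": "Tinned soup to share", "place_type": "city"},
]


def needs_fix(row):
    """Return True when the row only has district-level location information."""
    return row.get("place_type") == "admin"


# Collect the positions of the rows that still carry the default 'admin' value
rows_to_fix = [index for index, row in enumerate(tweets) if needs_fix(row)]
print(rows_to_fix)
```

Keeping the check in a small function means the same rule can be reused later when the rows are updated.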

Geolocation refers to the technology and techniques used to record or identify the physical, or geographic, location of an individual, or computing device. Most geolocation services use network IP addresses or internal GPS devices (small integrated chips), which determine or record the current location of the device.

These days, the majority of people with a mobile phone have some way to record their location, either via their internet or mobile connection, or via a dedicated GPS chip integrated into the phone. On top of this, computer and mobile applications send this information, together with our data, to the services we use. This often provides some benefit to the user, such as directions to a location based on your current position or, as we discussed at the beginning of the course, recommendations for music events or places to eat nearby.

In the previous step, we found that our data contained unwanted values. We will correct this issue by filling in the missing values using a geolocation API that identifies a location from its latitude and longitude coordinates and returns a full postal address. This approach is known as reverse geocoding: given a set of coordinates, it returns a text string with address information, which is exactly the information we need to fill in our missing data.
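A reverse geocoding request can be sketched as follows. This article does not name the API used in the course, so the sketch assumes a Nominatim-style service (the OpenStreetMap geocoder) and its `reverse` endpoint. The helper names `build_reverse_url` and `reverse_geocode` are hypothetical, and the notebook you are given may use a different service or library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a Nominatim-style endpoint. Swap in the service your notebook uses.
BASE_URL = "https://nominatim.openstreetmap.org/reverse"


def build_reverse_url(lat, lon):
    """Build the request URL for a pair of latitude/longitude coordinates."""
    query = urlencode({"format": "jsonv2", "lat": lat, "lon": lon})
    return f"{BASE_URL}?{query}"


def reverse_geocode(lat, lon, timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    # Public geocoders generally expect an identifying User-Agent header
    request = Request(
        build_reverse_url(lat, lon),
        headers={"User-Agent": "geolocation-course-example"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access, so it is safe to inspect first
url = build_reverse_url(51.52194, -0.13032)
print(url)
```

Calling `reverse_geocode(51.52194, -0.13032)` would send the request. Public geocoding services usually publish usage limits, so check them before looping over a whole dataset.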

For example, given the latitude and longitude coordinates (51.52194, -0.13032), the API returns a JSON object, similar to what we have seen before, with metadata about these coordinates, including address information. Have a look at the output of the API.

As we can see, the address attribute stores the address broken down into various fields, such as ‘road’, ‘suburb’, and ‘city’, which means we have all the information we need to address the missing or incorrect values we identified earlier.
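To illustrate how the address fields can be used, here is a minimal sketch that pulls a town or city name out of such a response. The sample JSON is hypothetical: its keys follow the fields named above, but the values are placeholders rather than real API output. The helper `extract_locality` and the ‘village’ fallback are assumptions added for illustration.

```python
import json

# Hypothetical response: keys mirror the fields described in the text,
# values are placeholders and NOT the real output for these coordinates.
sample_response = """
{
  "lat": "51.52194",
  "lon": "-0.13032",
  "address": {
    "road": "Example Road",
    "suburb": "Example Suburb",
    "city": "Example City"
  }
}
"""


def extract_locality(response, keys=("city", "town", "village")):
    """Return (place_type, name) for the first locality field that is present."""
    address = response.get("address", {})
    for key in keys:
        if key in address:
            return key, address[key]
    # No town or city level information was found in the response
    return None, None


place_type, name = extract_locality(json.loads(sample_response))
print(place_type, name)
```

The returned pair could then replace the ‘admin’ value in the ‘place_type’ column and supply the matching place name for that row.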

Your Task

Locate yourself: Jupyter Notebook
You will now apply the technique of reverse geocoding to obtain the address information for your current geo-coordinates.
To complete the task, amend the geolocation coordinates in the Jupyter Notebook example and run the notebook. This should take no more than 10 minutes.
After completing the task, you will have the address data for your current location according to your latitude and longitude. We will use this approach to visualise the geolocation data we have for each tweet.
© Coventry University. CC BY-NC 4.0
This article is from the free online course Applied Data Science.
