Working with Data
In this step, you will undertake a simple task to de-personalise (anonymise), analyse and model some data.
Download the dataset called “MOOC anonymisation” from the USMART Playground, save it to your desktop, and complete the following tasks (15 mins):
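- De-personalise/anonymise the dataset by masking the variable ‘name’ and k-anonymising any fields involving the variable ‘age’ (that is, generalising the values, for example into age bands, so that each record is indistinguishable from at least k - 1 others on those fields).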
- Do you think anything needs to be done with outliers (data that does not seem to fit the general pattern of the dataset)?
- Analyse the data to look for a correlation (relationship) between the variables ‘salary’ and ‘age’.
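If you would rather script these tasks than use a spreadsheet, the following is a minimal sketch of one possible approach, not the course's prescribed method. It assumes the dataset has columns called ‘name’, ‘age’ and ‘salary’. The rows, the ten-year band width, the `person_001` style identifiers and the 1.5 × IQR outlier rule are all illustrative assumptions.

```python
import math
import statistics

# Invented example rows; the real "MOOC anonymisation" dataset will differ.
records = [
    {"name": "Alice Smith", "age": 23, "salary": 21000},
    {"name": "Bob Jones", "age": 27, "salary": 24000},
    {"name": "Carol White", "age": 34, "salary": 30000},
    {"name": "Dan Brown", "age": 38, "salary": 33000},
    {"name": "Eve Black", "age": 45, "salary": 41000},
    {"name": "Frank Green", "age": 49, "salary": 250000},  # possible outlier
]


def age_band(age, width=10):
    """Generalise an exact age into a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(rows):
    """Mask 'name' with an arbitrary identifier and replace 'age' with a band."""
    return [
        {"name": f"person_{i:03d}", "age": age_band(r["age"]), "salary": r["salary"]}
        for i, r in enumerate(rows, start=1)
    ]


def smallest_group(rows, key):
    """Size of the smallest group sharing a value of `key` (the achieved k)."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return min(counts.values())


def iqr_outliers(values):
    """Values more than 1.5 x IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - 1.5 * spread or v > q3 + 1.5 * spread]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


anonymised = anonymise(records)
k = smallest_group(anonymised, "age")

ages = [r["age"] for r in records]
salaries = [r["salary"] for r in records]
outliers = iqr_outliers(salaries)

# Compare the correlation with and without the flagged salaries.
r_all = pearson(ages, salaries)
kept = [r for r in records if r["salary"] not in outliers]
r_clean = pearson([r["age"] for r in kept], [r["salary"] for r in kept])

print("k for age bands:", k)
print("flagged salaries:", outliers)
print("correlation with / without flagged rows:", round(r_all, 2), round(r_clean, 2))
```

Comparing the two correlation values is one way to think about the outlier question: if a single row changes the result substantially, it is worth deciding, and recording, whether that row is an error or a genuine value.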
You may also wish to visit the data.gov.uk website and search for datasets with ‘time’ and ‘date’ fields.
Look at how the ‘time’ and ‘date’ values are recorded in each dataset.
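To see why the recording format matters, here is a small sketch using Python's standard `datetime` module. The example strings are invented for illustration and are not taken from any data.gov.uk dataset. It shows the same moment written in three styles, and how a day-first or month-first assumption changes the meaning of an ambiguous date.

```python
from datetime import datetime

# Invented examples: one moment written in three different styles,
# each paired with the format string needed to read it.
samples = [
    ("2019-03-04T14:30:00", "%Y-%m-%dT%H:%M:%S"),  # ISO 8601
    ("04/03/2019 14:30", "%d/%m/%Y %H:%M"),        # day first
    ("20190304 1430", "%Y%m%d %H%M"),              # compact, no separators
]

# Once the correct format is known, every style parses to the same value.
parsed = [datetime.strptime(text, fmt) for text, fmt in samples]
normalised = [p.isoformat() for p in parsed]

# The same text read day-first and month-first gives two different dates.
ambiguous = "04/03/2019"
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()

print(normalised)
print(day_first, "vs", month_first)
```

A dataset that records dates in one unambiguous, documented format can be parsed in a single step; one that mixes styles needs the format worked out field by field before any analysis can start.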
Discuss your findings in the comments section below.
- How was each recorded?
- Would this make analysis easier or more difficult?
© University of Strathclyde