How is data analysed and interpreted?
In this step, we’ll begin to look at how you might interpret and analyse data.
Data analysis starts with framing the question you’re trying to answer. For instance, Aisha might want to understand why she’s running out of items, or which days are busiest for her.
When you know your question, you collect data to answer it. It could come from business systems like tills or stock management systems, social media analytics, websites or customer feedback forms. Some data might even be recorded by hand and need to be entered into a computer manually.
This collected data then needs to be organised for processing. One of the easiest ways of organising data is to turn it into tabular data: data held in regular rows and columns. Spreadsheets and traditional databases are systems for storing tabular data and working with the relationships between data in those tables.
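As a rough illustration of turning loose records into tabular data, here is a minimal Python sketch. The sales notes, the column names and the `to_table` helper are all made up for this example; real data would come from your own till or stock system.

```python
# Hypothetical hand-recorded sales notes, one string per sale.
raw_notes = [
    "Mon, bread, 12",
    "Mon, milk, 8",
    "Tue, bread, 15",
]

# The column headings we assume for the table.
COLUMNS = ["day", "item", "quantity"]


def to_table(notes):
    """Turn each note into a row: a dict keyed by column heading."""
    table = []
    for note in notes:
        day, item, quantity = [part.strip() for part in note.split(",")]
        table.append({"day": day, "item": item, "quantity": int(quantity)})
    return table


table = to_table(raw_notes)
for row in table:
    print(row)
```

Every row now has the same columns in the same order, which is what makes it possible to sort, filter and summarise the data later.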
Once your data is organised, you may find that you can see missing, corrupt or duplicate data. It might have been combined from different sources, have inconsistent formats, or lots of gaps or duplicates. Your data needs ‘cleansing’ to stop these problems affecting your analysis.
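The sketch below shows one possible cleansing pass in Python. The rows and the `clean_rows` helper are hypothetical, and the policy it applies (tidy the formats, skip rows with missing or corrupt quantities, drop exact duplicates) is only one option; in practice you might prefer to correct bad rows rather than discard them.

```python
def clean_rows(rows):
    """Normalise formats, skip rows with gaps, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        day = (row.get("day") or "").strip().title()
        item = (row.get("item") or "").strip().lower()
        quantity = (row.get("quantity") or "").strip()
        if not day or not item or not quantity.isdigit():
            continue  # missing or corrupt value: skip this row
        key = (day, item, int(quantity))
        if key in seen:
            continue  # same record seen before: skip the duplicate
        seen.add(key)
        cleaned.append({"day": day, "item": item, "quantity": int(quantity)})
    return cleaned


# Hypothetical rows combined from two sources with different formats.
rows = [
    {"day": "mon", "item": "Bread ", "quantity": "12"},
    {"day": "Mon", "item": "bread", "quantity": "12"},  # duplicate once tidied
    {"day": "Tue", "item": "milk", "quantity": ""},     # missing quantity
    {"day": "Tue", "item": "eggs", "quantity": "n/a"},  # corrupt quantity
    {"day": "Wed", "item": "Milk", "quantity": "7"},
]

cleaned = clean_rows(rows)
print(cleaned)
```

Note that the first two rows only look like duplicates after the formats have been made consistent, which is why the tidying happens before the duplicate check.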
When your data is cleaned up you can start to explore it. If you’re given a set of data to work with, say a spreadsheet of business data, your first task is to get a feel for what information it might contain. Questions like these can help you uncover the structure of the data:
- How much data is there? What can you tell from the way the data is organised?
- What metadata is attached to it? What do the headings tell you?
- What types of data are you looking at: text, numbers or financial data, dates and times? What time period does it cover?
- What questions can it answer: How many… ? How often…? How much…?
- Are there any blind spots? What questions can’t you answer?
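Some of the questions above can be answered with a few lines of code. This is a minimal sketch using Python's built-in `csv` module; the CSV text and its column names are invented for the example, and a real file would be opened from disk instead.

```python
import csv
import io
from datetime import date

# Hypothetical export from a till system.
csv_text = """date,item,quantity,price
2024-03-01,bread,12,1.50
2024-03-01,milk,8,0.95
2024-03-02,bread,15,1.50
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
headings = list(rows[0].keys())
dates = [date.fromisoformat(row["date"]) for row in rows]

summary = {
    "row_count": len(rows),    # How much data is there?
    "headings": headings,      # What do the headings tell you?
    "first_date": min(dates),  # What time period does it cover?
    "last_date": max(dates),
}

for name, value in summary.items():
    print(name, value)
```

A summary like this will not tell you what the blind spots are, but it does show quickly which columns and dates you have to work with.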
For numerical data, simple summaries and descriptive analysis will help you get a feel for the data. You could use one or a combination of these analysis techniques:
You can check minimum and maximum values using formulae in your spreadsheet software, or calculate averages to understand what a typical value looks like.
Calculating a mean - a sum of all of the values, divided by the count of the values - gives you a single figure that’s indicative of a field, but may be skewed if there are a few exceptionally large values. Other kinds of average tell you different things about the way the data is distributed - does it cluster around a certain set of values, or vary widely within a large range?
The mode is the value that appears most frequently. The median is the middle value when the data is sorted in order: half of the values are above it and half are below. Expressing data as percentages can also help you summarise and compare data sets of different sample sizes.
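To see how these summaries behave, here is a small Python sketch using the standard `statistics` module. The daily sales figures are made up, and one exceptionally busy day is included on purpose to show how it pulls the mean away from the other averages.

```python
from statistics import mean, median, mode

# Hypothetical daily sales; 150 is one exceptionally busy day.
daily_sales = [20, 22, 22, 25, 21, 22, 150]

lowest = min(daily_sales)
highest = max(daily_sales)
average = mean(daily_sales)      # sum of the values divided by their count
middle = median(daily_sales)     # midpoint of the sorted values
most_common = mode(daily_sales)  # value that appears most often


def percentage(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


# Share of days with more than 100 sales.
busy_days = sum(1 for sales in daily_sales if sales > 100)
busy_share = percentage(busy_days, len(daily_sales))

print(lowest, highest, round(average, 1), middle, most_common)
print(round(busy_share, 1), "% of days were exceptionally busy")
```

Here the mean is roughly 40, even though six of the seven days had sales between 20 and 25. The median and mode are both 22, which describes a typical day much better.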
Grouping data into subsets makes it easier to spot patterns or compare data between different sources. For instance, demographic information about people is often grouped by standardised age ranges like 25-34. Spreadsheet software offers features like grouping, pivot tables and automated analysis tools like ‘Explore’ (Google Sheets) or ‘Ideas’ (Excel, since renamed ‘Analyze Data’) which help you simplify and compare different aspects of your data.
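Grouping by age range can be sketched in Python as follows. The band boundaries, the `age_band` helper and the list of ages are assumptions for illustration; use whichever standard ranges suit your own data.

```python
from collections import Counter

# Assumed age bands as (lowest, highest) pairs, both inclusive.
AGE_BANDS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]


def age_band(age):
    """Return the label of the band that an age falls into."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "65+" if age >= 65 else "under 18"


# Hypothetical customer ages.
ages = [19, 27, 31, 34, 45, 52, 70, 25]
counts = Counter(age_band(age) for age in ages)

for band, count in counts.items():
    print(band, count)
```

Counting by band rather than by exact age is a simple version of what a pivot table does in a spreadsheet.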
If you have a time series, consider if it’s better to look at data by day, week, month, quarter or year - how quickly do values change, and what kind of patterns do you see at different levels?
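The same time series can look quite different at different levels. This minimal Python sketch totals some invented daily figures by week, by weekday and by month; the `total_by` helper is hypothetical and written only for this example.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical daily sales totals.
daily_totals = {
    date(2024, 3, 4): 20,   # Monday
    date(2024, 3, 5): 18,   # Tuesday
    date(2024, 3, 9): 40,   # Saturday
    date(2024, 3, 11): 22,  # Monday
    date(2024, 3, 16): 44,  # Saturday
}


def total_by(values, key):
    """Add up the amounts that share the same key, e.g. the same week."""
    totals = defaultdict(int)
    for day, amount in values.items():
        totals[key(day)] += amount
    return dict(totals)


by_week = total_by(daily_totals, lambda d: d.isocalendar()[1])
by_weekday = total_by(daily_totals, lambda d: WEEKDAYS[d.weekday()])
by_month = total_by(daily_totals, lambda d: (d.year, d.month))

print(by_week)
print(by_weekday)
print(by_month)
```

In this made-up data the monthly total hides everything, the weekly totals look broadly similar, and the weekday totals show that Saturdays are busiest.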
Visualising data using charts or graphs can make even very complicated data understandable at a glance, and make it easier to compare different aspects of the data.
Data is only as good or useful as your interpretation. The final stage of analysis is to consider why you’re seeing the features in the data. What might cause the patterns?
It is easy to see correlations in data - when two values change with the same pattern. However, a correlation does not necessarily mean that one is causing the other. Consider ice-cream and sunglasses sales - they both change with the same patterns over time. It’s not that eating ice-cream makes you want to buy sunglasses, but that both tend to sell more during sunny weather. Always consider if there’s a hidden third variable that’s influencing the patterns you see.
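The ice-cream and sunglasses example can be sketched in Python. The figures are invented, and the `pearson` function below is a plain implementation of the Pearson correlation coefficient, one common way of measuring how closely two sets of values move together (1 means perfectly in step, 0 means no linear relationship).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical figures for five days.
sunshine_hours = [2, 4, 6, 8, 10]
ice_cream_sales = [15, 24, 37, 44, 58]
sunglasses_sales = [3, 5, 8, 9, 13]

sales_link = pearson(ice_cream_sales, sunglasses_sales)
sun_and_ice_cream = pearson(sunshine_hours, ice_cream_sales)
sun_and_sunglasses = pearson(sunshine_hours, sunglasses_sales)

print(sales_link, sun_and_ice_cream, sun_and_sunglasses)
```

All three values come out close to 1. The calculation cannot tell you that sunshine is the underlying cause of both sales patterns; that part of the interpretation has to come from what you know about the situation.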
There are more advanced techniques that can help you understand data sets. However, you are likely to find that these simple analysis techniques will answer many of your questions.
Take your learning further
The University of Leeds has produced a series of videos that give an overview of basic statistical concepts, including averages, descriptive statistics and how to draw and interpret bar graphs and pie charts.
Data analysis is a huge topic. If you’d like to understand it more, there are other FutureLearn courses you can take, including ‘Big Data Analytics: Opportunities, Challenges and the Future’ and ‘Data to Insight: An Introduction to Data Analysis and Visualisation’.
If you are interested in working with pivot tables, there are guidelines available for working in both Excel and Google Sheets. Note that you will need to use a laptop or desktop, as pivot tables are not available on mobile devices.
Have your say:
Data analysis is a huge topic, and can be highly technical. Are there any terms that you’re unclear about, or concepts you’d like to understand better from this step?
Use the Comments to ask other learners for help or to discuss your thoughts.