
Main steps in statistical analysis

What are the main steps in statistical analysis?

Statistical methods help us examine data and investigate questions in an objective manner. This involves three main components:

  1. Data acquisition: Stating a statistical question of interest and obtaining data that will address it.
  2. Data description: Presenting, summarising and visualising the data to identify features that guide further analysis.
  3. Statistical inference: Analysing the data in an objective, model-based fashion, interpreting the results, and drawing conclusions that answer the statistical question.

1. Data acquisition 

Data acquisition involves collecting and storing available data and/or planning data collection that will shed light on the statistical question of interest.

Sometimes, statistical analysis begins with a given set of data. For instance, data are regularly collected and publicised on the environment (precipitation, temperature records, storm occurrences), economics (unemployment rates, gross domestic product, rate of inflation), health (mortality data, prevalence of different diseases, rates of infection), and so on.

In other situations, data are not yet available, and statistical theory can be used to design an appropriate experiment to generate them. The choice of experiment should depend on how you intend to use the data. For instance, how could you design and conduct an experiment to determine reliably whether regular large doses of vitamin C are beneficial to people’s health? In marketing, how could you select the people to survey so that your data would provide good predictions about future sales?
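One widely used design principle for questions like the vitamin C example is random assignment: chance, rather than the researcher, decides who receives the treatment. The sketch below is illustrative only and is written in Python rather than the R used in this course. The function name `randomise_assignment`, the participant labels and the group sizes are all assumptions made for the example.

```python
import random


def randomise_assignment(participant_ids, seed=None):
    """Randomly split participants into two groups of (near) equal size.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)          # a seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)              # random order removes selection bias
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),  # e.g. receives vitamin C
        "control": sorted(shuffled[half:]),    # e.g. receives a placebo
    }


# Hypothetical study with 20 participants labelled 1 to 20.
groups = randomise_assignment(range(1, 21), seed=42)
print("Treatment:", groups["treatment"])
print("Control:  ", groups["control"])
```

Because the allocation depends only on the shuffle, systematic differences between the two groups can arise only by chance, which is what later allows a difference in outcomes to be attributed to the treatment.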

2. Data description

Data description means exploring and summarising patterns in the data.

Files of raw data are often huge and hard to assess directly. It is more informative to summarise the data with a few aggregate numbers or a graph, such as the average amount of TV watched per day, or a graph showing how the number of hours of TV watched per day relates to the number of hours of exercise per week.
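As a minimal sketch of this kind of summary, the Python snippet below reduces two small columns of data to a few numbers. The values are invented purely for illustration, and `pearson_r` is a hypothetical helper written out by hand so that the calculation is visible. The course itself uses R, so treat this as an illustration of the idea rather than of the course software.

```python
import math
from statistics import mean, median, stdev

# Invented data for eight people (illustration only).
tv_hours_per_day = [1, 2, 2, 3, 4, 4, 5, 6]
exercise_hours_per_week = [7, 6, 5, 5, 3, 4, 2, 1]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


tv_mean = mean(tv_hours_per_day)
tv_median = median(tv_hours_per_day)
tv_sd = stdev(tv_hours_per_day)  # sample standard deviation
r = pearson_r(tv_hours_per_day, exercise_hours_per_week)

print(f"Mean TV hours per day:   {tv_mean:.2f}")
print(f"Median TV hours per day: {tv_median:.2f}")
print(f"Standard deviation:      {tv_sd:.2f}")
print(f"Correlation with weekly exercise hours: {r:.2f}")
```

A negative correlation here would mean that, in these invented data, people who watch more TV tend to exercise less. It describes the data at hand only and says nothing yet about a wider population.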

This part of statistics, concerned with the description and visualisation of data, is called descriptive statistics.

Making sense of the data, including through visual representation and diagnostics, is an important part of statistical ‘good practice’. It is enhanced (but not replaced) by statistical software, such as R and RStudio, that you will learn to use in this course.

3. Statistical inference

Statistical inference means analysing data in a way that takes their variability and uncertainty into account.

An objective and verifiable analysis is made possible by using statistical models based on the concept of probability and distributions of random variables. Inference includes interpreting the results concerning the posed statistical question, and culminates in making decisions or predictions about the general population, not merely about the data considered in the study.

This is the essence of learning from data.
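To make the step from sample to population concrete, the sketch below computes an approximate 95% confidence interval for a population mean. It assumes a large-sample normal approximation (for small samples a t-based interval would normally be preferred), and the data are simulated, not real. The function name `mean_confidence_interval` and all parameter values are assumptions for illustration, written in Python rather than the course's R.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation: mean +/- z * s / sqrt(n).
    Hypothetical helper for illustration only.
    """
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return centre - z * standard_error, centre + z * standard_error


# Simulated sample: daily TV hours for 50 people (invented values).
rng = random.Random(1)
sample = [rng.gauss(3.5, 1.5) for _ in range(50)]

low, high = mean_confidence_interval(sample)
print(f"Sample mean: {mean(sample):.2f}")
print(f"Approximate 95% interval for the population mean: ({low:.2f}, {high:.2f})")
```

The interval is a statement about the population from which the sample was drawn, with the uncertainty made explicit, rather than a description of the 50 sampled values alone.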

This article is from the free online course Statistical Methods, created by FutureLearn.
