
Hypothesising in data science

What is hypothesising?
‘A hypothesis may be simply defined as a guess. A scientific hypothesis is an intelligent guess.’ – Isaac Asimov [1]

The word ‘hypothesis’ comes from the Greek ‘hupo’ (under) and ‘thesis’ (placing). It refers to an idea put forward on the basis of limited evidence, which serves as the starting point for further investigation. A hypothesis can therefore be defined as an informed guess that should not be accepted until it has been tested. Strictly speaking, testing rarely proves a hypothesis outright: a hypothesis that holds up under testing is treated as supported by the evidence, and one that does not is rejected or revised.

A hypothesis can be treated as a proposed explanation for some observed phenomenon. In a scientific setting, a hypothesis is tested through experimentation. In data science, it is tested through data analysis. This means that, once the hypothesis is defined, you can collect data and determine whether the data provide enough evidence to support it.
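As a minimal sketch of checking whether data support a hypothesis, the example below uses a permutation test, which is one of several possible methods and is not prescribed by this article. The hypothesis ("stores running a promotion sell more"), the sales figures, the function name `permutation_test` and the 0.05 threshold are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the groups do not really differ,
        # a random split should often look as extreme as the real one.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical weekly sales (units) for stores with and without a promotion.
with_promo = [52, 48, 55, 60, 58, 54, 57, 59]
without_promo = [45, 47, 44, 50, 46, 49, 43, 48]

p_value = permutation_test(with_promo, without_promo)
alpha = 0.05  # assumed significance threshold
supported = p_value < alpha
print(f"p-value = {p_value:.4f}; data support the hypothesis: {supported}")
```

A small p-value means that a difference this large would rarely arise if the promotion had no effect. It is evidence in favour of the hypothesis, not proof of it.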

Hypothesising in data science

Hypothesising forms an integral part of data science projects in business. It is the step in which data scientists generate the questions that must be answered to improve business performance. A well-formed and tested hypothesis determines the focus of the business problem. It also sets the direction of the data science project by informing crucial decisions, such as what data needs to be collected, from which sources, and which statistical methods should be used for the analysis. For instance, by hypothesising, data scientists better understand which variables should and should not be considered during the analysis.

Here are a few reasons why hypothesising is important in data science.

Hypothesising helps data scientists:

  • understand the business problem by examining their assumptions about the various factors affecting the target variable
  • gain a clearer idea of which factors are most significant for solving the problem
  • make informed decisions about which data to collect and from which sources, so that the business problem can be converted into a data science project and approached in a structured manner.

Hypothesis generation versus hypothesis testing

There are two parts to hypothesising in data science: hypothesis generation and hypothesis testing. The most successful data science projects start with building a good hypothesis using a sample or the available data set. A well-thought-out hypothesis establishes the course and plan for a data science project. After that, the hypothesis is framed as a null hypothesis and an alternative hypothesis and tested against the complete data. You will learn about these concepts in detail in the next few sections of this week.

Hypothesis generation is the process of making educated guesses about the various factors affecting the business problem to be resolved. Put another way, you are making reasoned assumptions about how certain factors would impact the target variable. In the process that follows, called hypothesis testing, you use data to decide which of these guesses about the relationships between variables hold up and which do not.
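The two stages can be sketched roughly as follows. Hypothesis generation is represented as a list of guesses about how each factor affects the target variable. A simple screening step on a small sample then checks which guesses look consistent with the data and are worth carrying forward to formal testing. The factor names, the sample values, the helper names `pearson` and `screen`, and the 0.5 correlation cut-off are all hypothetical, and screening by correlation is only one possible approach.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothesis generation: educated guesses about how each factor
# affects the target variable (all names are hypothetical).
hypotheses = [
    {"factor": "discount_pct", "expected": "positive"},
    {"factor": "delivery_days", "expected": "negative"},
    {"factor": "store_id_digit", "expected": "positive"},
]

# A small, made-up sample used only to screen the guesses.
sample = {
    "discount_pct": [0, 5, 10, 15, 20, 25],
    "delivery_days": [7, 6, 6, 4, 3, 2],
    "store_id_digit": [3, 9, 1, 7, 2, 8],
    "weekly_sales": [40, 44, 51, 55, 63, 66],
}


def screen(hypotheses, sample, target, min_abs_r=0.5):
    """Flag guesses whose direction and strength match the sample."""
    results = []
    for h in hypotheses:
        r = pearson(sample[h["factor"]], sample[target])
        direction = "positive" if r > 0 else "negative"
        keep = direction == h["expected"] and abs(r) >= min_abs_r
        results.append({**h, "r": round(r, 2), "carry_forward": keep})
    return results


results = screen(hypotheses, sample, target="weekly_sales")
for row in results:
    print(row)
```

Guesses flagged with `carry_forward` set to `True` would then go on to a formal test against the complete data. A correlation in a small sample is a reason to investigate further, not a conclusion.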


  1. Fredericks AD, Asimov I. The Complete Science Fair Handbook. For Teachers and Parents of Students in Grades 4-8. Glenview: Good Year Books, Inc.; 1990. 98 p.

This article is from the free online course Introduction to Data Science for Business, created by FutureLearn.
