Hypothesising in data science

What is hypothesising?
‘A hypothesis may be simply defined as a guess. A scientific hypothesis is an intelligent guess.’ – Isaac Asimov [1]

The word ‘hypothesis’ comes from the Greek ‘hupo’ (under) and ‘thesis’ (placing); it denotes an idea proposed on the basis of limited evidence. It serves as the starting point for further investigation. A hypothesis can therefore be defined as an informed guess that should not be accepted until it has been tested. Strictly speaking, testing does not turn a hypothesis into a fact: a hypothesis that withstands testing is supported by the evidence, and one that does not is rejected.

A hypothesis can be treated as a proposed explanation for some observed phenomenon. In a scientific setting, a hypothesis is tested through experimentation. In data science, a hypothesis is tested through data analysis. This means that, once the hypothesis is defined, you can collect data and determine whether the data provide enough evidence to support it.

Hypothesising in data science

Hypothesising forms an integral part of data science business projects. It is the step in which data scientists generate the questions that must be answered to improve business performance. A well-formed and tested hypothesis determines the focus of the business problem. It sets the direction of the data science project by informing crucial decisions, such as what data needs to be collected from which sources and which statistical process should be used for the analysis. For instance, by hypothesising, data scientists better understand which variables should and should not be considered during the analysis.

Here are a few reasons why hypothesising matters in data science.

Hypothesising helps data scientists:

  • understand the business problem by digging deeper into the assumptions about the various factors affecting the target variable
  • gain a clearer idea of which factors have a significant influence on the problem
  • collect data from various sources in an informed way, which is fundamental to converting the business problem into a data science project and approaching it in a structured manner.
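
As a rough illustration of how assumptions about factors can be checked against a sample, the sketch below ranks candidate factors by how strongly they correlate with a target variable. The factor names (ad_spend, staff_on_shift), the target (weekly_sales) and all numbers are hypothetical and exist only for this example. Correlation is just one possible screening measure, and a strong correlation does not show that a factor causes the change in the target.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_factors(factors, target):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, target) for name, values in factors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical sample: six weeks of made-up figures for a small shop.
sample = {
    "ad_spend": [10, 12, 15, 18, 20, 25],
    "staff_on_shift": [3, 4, 3, 4, 3, 4],
}
weekly_sales = [100, 115, 140, 160, 175, 210]  # hypothetical target variable

ranking = rank_factors(sample, weekly_sales)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

In this made-up sample, advertising spend tracks sales far more closely than staffing does, so a hypothesis about advertising would be the more promising one to carry forward for formal testing.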

Hypothesis generation versus hypothesis testing

There are two parts to hypothesising in data science: hypothesis generation and hypothesis testing. Successful data science projects typically start by building a good hypothesis from a sample or from the data already available. A well-thought-out hypothesis establishes the course and plan for a data science project. After that, the hypothesis is stated formally as a null hypothesis and an alternative hypothesis, and tested against the complete data. You will learn about these concepts in detail in the next few sections of this week.

Hypothesis generation is the process of making educated guesses about the various factors affecting the business problem to be resolved. Put another way, you make reasoned assumptions about how certain factors would impact the target variable. In the process that follows, called hypothesis testing, you use data to decide which of these assumed relationships between variables are supported and which are not.
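
To make the testing step concrete, here is a minimal sketch of one possible approach, a permutation test. It is not the method prescribed by this course. The scenario (order values without and with a discount) and all numbers are hypothetical. The null hypothesis is that the discount makes no difference to the average order value; the alternative is that orders with the discount are larger on average.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """One-sided permutation test for mean(group_b) > mean(group_a).

    Null hypothesis: the group labels do not matter.
    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, then split them into two new groups.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical order values (made-up numbers for illustration only).
without_discount = [20, 22, 19, 24, 21, 23, 20, 22]
with_discount = [25, 27, 24, 28, 26, 29, 25, 27]

observed, p_value = permutation_test(without_discount, with_discount)
print(f"observed difference in means: {observed:.2f}")
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means that a difference this large would rarely arise if the labels were irrelevant, so the data count as evidence against the null hypothesis. The threshold for "small" (often 0.05) is a convention chosen before the analysis, not a property of the data.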

References

  1. Fredericks AD, Asimov I. The Complete Science Fair Handbook. For Teachers and Parents of Students in Grades 4-8. Glenview: Good Year Books, Inc.; 1990. 98 p.
This article is from the free online

Introduction to Data Science for Business

Created by
FutureLearn - Learning For Life