Coming up with data questions

We have seen the importance of asking questions in data science. Whether we have a particular question in mind or only a vague idea, questions need to be carefully formulated and broken down into smaller parts that can be answered using data.

If we have a clear idea of the question we are trying to ask (and answer), we can take a top-down approach and break it down into smaller, simpler and more precise component questions (high-level to low-level). This lets us get started straight away on assembling and analysing the data needed to answer it.

However, often we begin with an initial vague idea, observation, suspicion or hunch and a clear question eludes us. In this case, a bottom-up approach may help us develop a well-formulated question (low-level to high-level).

Low-level questions

Ultimately, questions are answered through analysis. This involves things like comparing, counting, averaging, classifying, clustering, predicting and fitting models. In order to begin such analysis you might ask low-level questions like:

  • Which of these explanations … is more likely?
  • Has there been an increase in … ?
  • How often does … occur in … time period?
  • What is the likely range of values for … ?
  • Is this group more likely to … ?
  • Is it possible to predict … ?
  • What groups do … naturally appear to fit into?
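
To make the link between low-level questions and analysis concrete, here is a minimal Python sketch of how two of the questions above ("How often does … occur in … time period?" and "Has there been an increase in … ?") reduce to counting, averaging and comparing. The records, the event names and the helper functions `count_per_period` and `has_increased` are all invented for illustration; they are not part of the course material.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: (week number, event type). Invented for illustration only.
events = [
    (1, "complaint"), (1, "order"), (2, "order"), (2, "complaint"),
    (3, "complaint"), (3, "complaint"), (4, "complaint"), (4, "complaint"),
    (4, "order"),
]


def count_per_period(records, event_type):
    """Counting: how often does `event_type` occur in each week?"""
    return Counter(week for week, kind in records if kind == event_type)


def has_increased(counts, early_periods, late_periods):
    """Averaging and comparing: is the average count higher in the later weeks?"""
    early = mean(counts.get(p, 0) for p in early_periods)
    late = mean(counts.get(p, 0) for p in late_periods)
    return late > early


complaints = count_per_period(events, "complaint")
print(dict(complaints))                            # complaints per week
print(has_increased(complaints, [1, 2], [3, 4]))   # early weeks vs later weeks
```

A comparison of two averages like this only says whether the numbers went up in this sample. Deciding whether the increase is meaningful is a separate, further question.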

High-level questions

By contrast, high-level questions are often much more exploratory and open-ended. For example:

  • What factors influence what customers at … are likely to purchase?
  • How can … improve profitability?
  • Which footballers should be considered for purchasing in the upcoming transfer window?
  • Is this pandemic over yet?

Top-down and bottom-up approaches

The top-down approach often starts with exploratory data analysis – graphical plots and basic statistical summaries such as averages and counts – of existing data. In refining a complex or vague high-level question into smaller components, we may find that some aspects are discarded or the scale of the questions is constrained further.
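
As a rough illustration of the basic statistical summaries mentioned above, the following Python sketch computes a count and an average per group and prints a crude text bar for each, standing in for a graphical plot. The purchase records, the group names and the `summarise` helper are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical purchase records: (customer group, amount spent). Illustrative only.
purchases = [
    ("new", 12.0), ("new", 8.0), ("returning", 30.0),
    ("returning", 25.0), ("returning", 35.0), ("new", 10.0),
]


def summarise(records):
    """Group the amounts by customer group and compute a count and an average."""
    groups = defaultdict(list)
    for group, amount in records:
        groups[group].append(amount)
    return {
        group: {"count": len(amounts), "average": mean(amounts)}
        for group, amounts in groups.items()
    }


summary = summarise(purchases)
for group, stats in summary.items():
    # One '#' per unit of average spend: a very rough stand-in for a bar chart.
    bar = "#" * round(stats["average"])
    print(f"{group:<10} n={stats['count']} avg={stats['average']:.1f} {bar}")
```

Summaries like these rarely answer a high-level question on their own, but they can suggest which component questions are worth pursuing and which can be dropped.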

The bottom-up approach starts with low-level questions. In thinking about what is needed to answer them, a general theme might emerge from the collection of questions and from the data that is available or could be collected.

In practice, formulating questions involves cycling between both approaches (and lots of discussion) until we converge on a question that is answerable using data.

Top tips

Refining an idea into a question is a bit like coping with writer’s block. Here are some top tips.

  • Try to write your idea down informally. By using words to describe the idea, you may end up with a more refined version of it.

  • Talk to colleagues. Often, it helps to verbalise a complex idea and get other people to ask you questions about it.

  • Start with attempting to answer a simpler question. Often, inspiration comes when you are working hands-on with data.

  • Once you think you have a question, give it a thorough critique. If you were to answer this precise question, would it really help you to answer your original idea?

  • Perform a rough cost-benefit analysis. Would it cost too much (time, resources or money) to obtain all the information you need in order to answer a question that may still be quite vague, or does the potential benefit outweigh the cost?

  • Who are you doing the analysis for? How will any insights be communicated to others and what form will they take?

Considerable effort might be required to refine an idea or observation into a question that is answerable using data. Often, this is more of an art than a science.

Your task

We have seen what kinds of simple questions might help form a more complex question and considered some other angles that may prompt thoughts on what we are trying to ask. So how do the professionals figure out their questions?

Read the article Why Amazon knows so much about you (Kelion, 2020). Look especially at the quotes from Dr Andreas Weigend, who was the first chief scientist at Amazon.

Considering the ideas from this step, pose your answers to the questions below in the comments area.
  • What idea do you think the Amazon data scientists started with?
  • What data did they have available?
  • What processes did they follow to try to formulate their questions?

References

Kelion, L. (2020). Why Amazon knows so much about you. BBC News. https://www.bbc.co.uk/news/extra/CLQYZENMBI/amazon-data

© Coventry University. CC BY-NC 4.0
This article is from the free online course Get ready for a Masters in Data Science and AI.
