Coming up with questions
We have seen the importance of asking questions in data science. Whether we have a particular question in mind or only a vague idea, questions need to be carefully formulated and broken down into smaller parts that can be answered using data.
If we have a clear idea of the question we are trying to ask (and answer), then we can take a top-down approach and break it down into smaller, simpler and more precise component questions (high-level to low-level). This lets us start assembling and analysing data straight away.
However, we often begin with a vague idea, observation, suspicion or hunch, and a clear question eludes us. In this case, a bottom-up approach may help us develop a well-formulated question (low-level to high-level).
Ultimately, questions are answered through analysis. This involves things like comparing, counting, averaging, classifying, clustering, predicting and fitting models. To begin such analysis, you might ask low-level questions like:
- Which of these explanations … is more likely?
- Has there been an increase in … ?
- How often does … occur in … time period?
- What is the likely range of values for … ?
- Is this group more likely to … ?
- Is it possible to predict … ?
- What groups do … naturally appear to fit into?
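To show how low-level questions like these turn into concrete analysis, here is a minimal Python sketch. It assumes a small hypothetical log of dated events; the `events` list, the category names and the helper functions are all invented for illustration and do not come from any real dataset. It shows how "How often does … occur in … time period?" becomes a simple count, and how "Has there been an increase in …?" might be approached, crudely, by comparing monthly counts.

```python
from collections import Counter
from datetime import date

# Hypothetical event log of (date, category) pairs; illustrative data only.
events = [
    (date(2023, 1, 5), "return"),
    (date(2023, 1, 17), "purchase"),
    (date(2023, 2, 2), "purchase"),
    (date(2023, 2, 20), "purchase"),
    (date(2023, 2, 25), "return"),
    (date(2023, 3, 3), "purchase"),
    (date(2023, 3, 9), "purchase"),
    (date(2023, 3, 28), "purchase"),
]


def count_in_period(events, category, start, end):
    """How often does `category` occur between start and end (inclusive)?"""
    return sum(1 for d, c in events if c == category and start <= d <= end)


def monthly_counts(events, category):
    """Count occurrences of `category` per (year, month)."""
    return Counter((d.year, d.month) for d, c in events if c == category)


def has_increased(events, category):
    """Crude check for an increase: compare the first and last months with data."""
    counts = monthly_counts(events, category)
    months = sorted(counts)
    if len(months) < 2:
        return False  # not enough months to compare
    return counts[months[-1]] > counts[months[0]]


print(count_in_period(events, "purchase", date(2023, 2, 1), date(2023, 3, 31)))
print(has_increased(events, "purchase"))
```

Comparing only the first and last month is deliberately simplistic. A real answer would need to consider variability, seasonality and the choice of time window, which is exactly the kind of detail that refining a question should bring to the surface.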
By contrast, high-level questions are often much more exploratory and open-ended. For example:
- What factors influence what customers at … are likely to purchase?
- How can … improve profitability?
- Which footballers should be considered for purchasing in the upcoming transfer window?
- Is this pandemic over yet?
Top-down and bottom-up approaches
The top-down approach often starts with exploratory data analysis of existing data: graphical plots and basic statistical summaries such as averages and counts. In refining a complex or vague high-level question into smaller components, we may find that some aspects are discarded or that the scope of the question is narrowed further.
The bottom-up approach starts with low-level questions. In thinking about what is needed to answer them, a general theme might emerge from the collection of questions, together with the data that is available or could be collected.
In practice, formulating questions involves cycling around both approaches (and lots of discussion) until we converge.
Refining an idea into a question is a bit like coping with writer’s block. Here are some top tips.
- Write your idea down informally. Putting the idea into words may leave you with a more refined version of it.
- Talk to colleagues. It often helps to verbalise a complex idea and have other people ask you questions about it.
- Start by attempting to answer a simpler question. Inspiration often comes when you are working hands-on with data.
- Once you think you have a question, critique it thoroughly. If you answered this precise question, would it really help you address your original idea?
- Perform a rough cost-benefit analysis. Would obtaining all the information you need cost too much (in time, resources or money), given that the question may still be quite vague, or does the potential benefit outweigh the cost?
- Consider your audience. Who are you doing the analysis for? How will any insights be communicated to others, and what form will they take?
Considerable effort might be required to refine an idea or observation into a question that is answerable using data. Often, this is more of an art than a science.
We have seen what kinds of simple questions might help form a more complex question and considered some other angles that may prompt thoughts on what we are trying to ask. So how do the professionals figure out their questions?
Read the article "Why Amazon knows so much about you". Look especially at the quotes from Dr Andreas Weigend, who was the first chief scientist at Amazon.
Considering the ideas from this step, post your answers to the questions below in the comments area.
- What idea do you think the Amazon data scientists started with?
- What data did they have available?
- What processes did they follow to try to formulate their questions?
Kelion, L. (2020). Why Amazon knows so much about you. BBC News. https://www.bbc.co.uk/news/extra/CLQYZENMBI/amazon-data
© Coventry University. CC BY-NC 4.0