What is data science about?
Before thinking about data, it is much more important to ask the right questions.
When we think of data science we often think of the data. What data do we have? How is the data organised and where is it stored? What insights can we gain from this particular dataset?
Data science is the study of data of all kinds, e.g. medical records, social media and sports science. It covers everything from cleaning data to deploying predictive models. Every day, huge volumes of data are collected and stored worldwide. To extract meaningful information and insights from this deluge, data scientists combine data from many sources, then explore and analyse it through visualisation (graphical plots), algorithms and statistical modelling.
Florence Nightingale (1820-1910) is well known in the medical world as the pioneer of modern nursing. Perhaps less well known is her use of data visualisation in lobbying the government to improve medical care.
During the Crimean War (1853-1856), Florence Nightingale recorded data on the causes of death of soldiers and presented her results on a polar area diagram (also known as a Nightingale rose, shown above).
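In a polar area diagram, every wedge spans the same angle and the area of each wedge, not its radius, represents the value. The following is a minimal Python sketch of that idea. The function name `wedge_radii` and the counts are hypothetical and made up for illustration only; they are not Nightingale's figures.

```python
import math


def wedge_radii(counts):
    """Return one radius per count so that each equal-angle wedge
    has an area equal to its count."""
    angle = 2 * math.pi / len(counts)  # every wedge spans the same angle
    # Area of a circular sector = 0.5 * r**2 * angle,
    # so r = sqrt(2 * area / angle).
    return [math.sqrt(2 * count / angle) for count in counts]


# Hypothetical counts, invented for illustration (not historical data).
example_counts = [4, 9, 16, 25]
radii = wedge_radii(example_counts)

for count, radius in zip(example_counts, radii):
    print(f"count={count:>2}  radius={radius:.3f}")
```

Because area grows with the square of the radius, a wedge whose count is four times larger is only twice as long. This is one reason a polar area diagram can look quite different from a bar chart of the same numbers.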
Many aspects of data science lean heavily on mathematics, statistics, computer science and engineering. These are the technical aspects of storing, processing, visualising and analysing data. There is also much to data science that is more art than science: curiosity, creativity, perseverance, detective work, communication, building an argument, and a sense for fairness and accuracy.
However, when analysing data, it is often necessary to have some knowledge of the subject area (domain) the data comes from. Who better to explore and analyse data than those with the questions?
For example, in biological sciences, data is collected at all levels of biological systems, from genomes to populations. What are the interesting or important questions, at each of these levels, that data might be able to shed some light on?
Although data science can be quite technical, it is most important that a data science investigation begins with a carefully considered set of questions in mind. Interesting and important questions generally come from knowledge of the domain that the data is collected from.
Consider the following two quotes from John Tukey (1915-2000), the author of the book Exploratory Data Analysis (1977).
The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
How do these apply to a domain that you have some expertise in?
Open University (n.d.). The joy of stats: The lady with a data visualisation. Mathscareers.org. https://www.mathscareers.org.uk/video/joy-stats-lady-data-visualisation
Tukey, J. (1977). Exploratory data analysis. Pearson.
© Coventry University. CC BY-NC 4.0