Talking to the black box
Sometimes it is possible to carry out an analysis of data and make decisions based on a quick calculation with pencil and paper. The rest of the time we need to instruct a computer to help us with this work.
For complex questions or larger datasets, we clearly need the computer to carry out the computational grunt work. Even for simple questions or small datasets, using a computer is desirable: it is a highly accurate calculator, and it may surface patterns that a human might miss.
Therefore it is essential to learn how to use a computer to do data analysis, much as a painter learns to use brushes and colours or a carpenter learns to use woodworking tools.
What is a black box?
When using a mobile phone, most people are unlikely to wonder HOW it works, and instead focus on WHAT it can do for them.
We refer to a system or device as a black box when we don’t need to know about its internal workings (they are hidden in the dark) and can focus on inputs and outputs. When thinking about using a computer to carry out data analysis, we can simply focus on WHAT it needs to do and some way to interact with it, rather than HOW exactly it works its magic.
The science fiction writer Arthur C. Clarke (1917-2008) put it well:
Any sufficiently advanced technology is indistinguishable from magic.
What tasks does the black box need to carry out?
Considering the science side of data science, any analysis we perform must follow a repeatable sequence of steps, so that any results we produce can be reproduced by other people.
We need to break steps down into simple low-level instructions that a computer can follow. This is called computer programming (or coding).
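As a rough illustration of what 'simple low-level instructions' can look like, here is a minimal Python sketch that works out an average one small step at a time. The function name `average` and the numbers in `heights_cm` are made up for this example; they are assumptions for illustration, not part of any particular data science tool.

```python
def average(values):
    """Work out the mean of a list of numbers using only basic steps."""
    total = 0.0
    count = 0
    for value in values:
        total = total + value   # add each number to a running total
        count = count + 1       # keep track of how many numbers we have seen
    if count == 0:
        raise ValueError("cannot average an empty list")
    return total / count        # the mean is the total divided by the count


# Made-up example values, for illustration only.
heights_cm = [150, 160, 170]
result = average(heights_cm)
print(result)
```

Because every step is written down explicitly, anyone who runs the same code on the same numbers should get the same answer, which is what makes the analysis repeatable.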
Specialist data science software or libraries (often available for free) offer higher-level tools. Each tool generally processes a dataset in some way or gives some output that is then interpreted by a human.
These tools are generally applied in a particular sequence in order to carry out analysis. For example, one tool could read data from a database, another might extract rows and columns according to some criteria, another might summarise (in counts and averages) different categories in a dataset, and another might build a particular type of graphical plot.
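The sequence described above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the data is made up, the function names (`read_data`, `extract`, `summarise`, `plot`) are hypothetical, a temporary in-memory SQLite database stands in for a real database, and a simple text bar chart stands in for a graphical plot. Real data science libraries offer far more capable versions of each tool.

```python
import sqlite3
from collections import defaultdict
from statistics import mean

# Made-up data, for illustration only: (name, category, quantity, price).
ROWS = [
    ("apple", "fruit", 3, 0.50),
    ("banana", "fruit", 0, 0.25),
    ("carrot", "vegetable", 5, 0.10),
    ("leek", "vegetable", 2, 0.80),
    ("pear", "fruit", 4, 0.60),
    ("plum", "fruit", 1, 0.40),
]


def read_data():
    """Tool 1: read data from a (temporary, in-memory) database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock (name TEXT, category TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO stock VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        "SELECT name, category, quantity, price FROM stock"
    ).fetchall()
    conn.close()
    return rows


def extract(rows):
    """Tool 2: keep rows with quantity above zero, and only two columns."""
    return [
        (category, price)
        for _name, category, quantity, price in rows
        if quantity > 0
    ]


def summarise(pairs):
    """Tool 3: count the items and average the price in each category."""
    groups = defaultdict(list)
    for category, price in pairs:
        groups[category].append(price)
    return {
        category: {"count": len(prices), "average": mean(prices)}
        for category, prices in groups.items()
    }


def plot(summary):
    """Tool 4: draw a crude text bar chart of the count in each category."""
    lines = []
    for category in sorted(summary):
        bar = "#" * summary[category]["count"]
        lines.append(f"{category:<10} {bar}")
    return "\n".join(lines)


# Apply the tools in sequence: read, extract, summarise, plot.
summary = summarise(extract(read_data()))
print(plot(summary))
```

Each function takes the output of the previous one as its input, so the whole analysis reads as a short chain of steps that another person could rerun.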
How do we communicate with the black box?
We ‘talk’ to the black box by writing computer code, consisting of a sequence of high-level operations. Each of these operations has been skilfully designed and implemented by data science experts to do its job accurately and efficiently.
Computers play a significant role in data science because their operations are fast, repeatable and accurate. However, good data analysis is a team effort between the human and the computer, because they have complementary skills.
© Coventry University. CC BY-NC 4.0