3.2

# Parallel Computing

We have seen over the first two weeks that almost all modern computers are parallel computers, consisting of many individual CPU-cores that can be connected together in two different ways: in either a shared-memory or a distributed-memory fashion. A supercomputer will contain tens of thousands of CPU-cores, all operating simultaneously.

A single CPU-core acts as a serial computer, running a single computer program at any one time. The Oxford English Dictionary defines serial computing as ‘the performance of operations … in a set order one at a time’. To take advantage of a parallel computer, we need to perform many operations at the same time so that we can make use of many CPU-cores. Parallel computing is defined as ‘Involving the concurrent or simultaneous performance of certain operations’.

It is quite clear that a supercomputer has the potential for doing calculations very quickly. However, it may not immediately be obvious how to take advantage of this potential power for any particular problem. This week, we look at ways to solve problems that enable us to take advantage of parallel computers. That requires a problem, calculation or serial computer program to be parallelised.

The process of parallelising a calculation has a number of important steps:

$\$ - splitting the calculation into a number of smaller tasks that can be done independently (and therefore performed simultaneously by different CPU-cores), which is also called decomposing the calculation;

$\$ - identifying when and where tasks need to be coordinated (meaning that the CPU-cores must talk to each other);

$\$ - implementing these two operations using standard approaches;

$\$ - executing the parallel program on a parallel computer.

The first two steps typically depend on the problem you are trying to solve, and you do not have to consider the architecture of the particular parallel computer you are eventually going to run on. However, we will see that the last two steps are quite different depending on whether you are targetting a shared or distributed-memory computer. In these cases we use different programming models called the shared-variables model or the message-passing model; these two models are executed in very different ways.

Once you have reached step 3, there are many existing software packages and tools to help you automate the process. However, it may be surprising to you that the first two steps are still done by hand, requiring someone to sit down and think things through using pencil-and-paper, maybe trying a number of ideas to see which works best.

Parallel computing has been around for several decades so there is a wealth of experience to draw upon, and the best parallelisation strategies have been identified for a wide range of standard problems. Despite this, it is not currently possible to completely automate these first two steps for anything but the simplest problems – perhaps a disappointing state of affairs given the fact that almost every phone, tablet or laptop is now a parallel computer, but good news if you are employed as a parallel programmer!

What aspects of a problem do you think need taken into account when thinking about how to decompose it into parallel tasks? Discuss your thoughts in the comments!