Skip main navigation

Case study: Parallel sum

In this article we walk through an example parallel algorithm that calculates the total sum of all elements in an array in parallel.
© CC-BY-NC-SA 4.0 by CSC - IT Center for Science Ltd.

How can a problem be split into smaller pieces that can be executed in parallel? How to do it efficiently? These are the key questions when designing a parallel algorithm.

Designing a good strategy for work distribution is usually the most essential part of writing a parallel program. Common approaches utilise either data parallelism or task parallelism. In data parallelism one distributes a (large) dataset to multiple processors and then the processors operate on separate pieces of the data in parallel. In task parallelism one breaks down the algorithm into smaller tasks that are then distributed and executed in parallel.

Let us now look into an example implementation of a parallel algorithm for calculating the sum of all values in an array.

Initial state

Assume we have an array A that contains a large number of floating point numbers. The values have been read from a file by the first MPI task (rank 0).

Initial state of parallel sum

Goal

Calculate the total sum of all elements in array A in parallel.

Parallel algorithm

Our parallel algorithm consist of four main steps: 1) distribute the data, 2) compute local sums, 3) gather the partial results, and 4) compute the total sum. Steps 1 and 3 can be further broken down into sub-steps to better illustrate the MPI communication needed.

1. Scatter the data
1.1. receive operation for scatter
1.2. send operation for scatter
2. Compute partial sums in parallel
3. Gather the partial sums
3.1. receive operation for gather
3.2. send operation for gather
4. Compute the total sum

Step 1.1: Receive operation for scatter

Step 1.2: Send operation for scatter

Step 2: Compute partial sums in parallel

Step 3.1: Receive operation for gather

Step 3.2: Send operation for gather

Step 4: Compute the total sum

In the next activities you will learn how to carry out the above operations in practice.

© CC-BY-NC-SA 4.0 by CSC - IT Center for Science Ltd.
This article is from the free online

Python in High Performance Computing

Created by
FutureLearn - Learning For Life

Our purpose is to transform access to education.

We offer a diverse selection of courses from leading universities and cultural institutions from around the world. These are delivered one step at a time, and are accessible on mobile, tablet and desktop, so you can fit learning around your life.

We believe learning should be an enjoyable, social experience, so our courses offer the opportunity to discuss what you’re learning with others as you go, helping you make fresh discoveries and form new ideas.
You can unlock new opportunities with unlimited access to hundreds of online short courses for a year by subscribing to our Unlimited package. Build your knowledge with top universities and organisations.

Learn more about how FutureLearn is transforming access to education