£199.99 £139.99 for one year of Unlimited learning. Offer ends on 14 November 2022 at 23:59 (UTC). T&Cs apply

Fast communication of large arrays

# Fast communication of large arrays

© CC-BY-NC-SA 4.0 by CSC - IT Center for Science Ltd.

MPI for Python offers very convenient and flexible routines for sending and receiving general Python objects. Unfortunately, this flexibility comes with a cost in performance.

In practice, what happens under the hood is that Python objects are converted to byte streams (pickled) when sending and back to Python objects (unpickled) when receiving. These conversions may add a serious overhead to communication.

Good news is that MPI for Python offers alternative routines for sending and receiving contiguous memory buffers (such as NumPy arrays) with very little overhead. To distinguish between the two types of routines, the names of the flexible, all-purpose routines are all in lower case whereas the names of the fast, contiguous memory specific routines always start with an upper case letter.

When using one of the upper case methods, the underlying MPI implementation can simply copy memory blocks without any conversions. If the amount of data to be communicated is large, this will give an enormous performance improvement. It is therefore always advisable to use only the upper case methods (apart from maybe some simple initialisation).

Sending and receiving a NumPy array efficiently is very straightforward. Since MPI for Python knows NumPy arrays, it can automatically take care of most of the details.

To send, one basically just needs to use the upper case method Send()
giving the NumPy array and destination rank as arguments:

Send(data, dest)

To receive, one needs to first prepare a NumPy array to receive the data to
and then use the upper case method Recv() giving the array and source rank
as arguments:

data = numpy.empty(shape, dtype)Recv(data, source)

Note the difference between the upper/lower case methods on the receive side!
Upper case Recv() does not return the data, but instead copies it to an
existing array.

Example:

from mpi4py import MPIimport numpycomm = MPI.COMM_WORLDrank = comm.Get_rank()data = numpy.empty(100, dtype=float)if rank == 0: data[:] = numpy.arange(100, dtype=float) comm.Send(data, dest=1)elif rank == 1: comm.Recv(data, source=0)

MPI supports also of sending one message and receiving another with a single
call. This reduces the risk of deadlocks in many common situations.

For example, when doing a simple message exchange (i.e. two processes send
and receive a message to/from each other) one needs to be careful to have one
process first receive and the other send and then vice versa to avoid a
deadlock. With a combined send and receive, both processes can simply call a
single MPI call and be done with it.

The combined routine Sendrecv() is similar to the separate Send() and
Recv() routines. It basically just combines the two and the arguments
reflect this:

buffer = numpy.empty(data.shape, dtype=data.dtype)Sendrecv(data, dest=tgt, recvbuf=buffer, source=src)

The destination (tgt) and source (src) ranks can be the same or they can
be different. If no destination or source is desired (e.g. on boundaries) one
can use MPI.PROC_NULL to indicate no communication. Just like with the upper
case receive, the receive buffer (buffer) needs to exist before the call and
be sufficiently large to hold all the data to be received.

data = numpy.arange(10, dtype=float) * (rank + 1)buffer = numpy.empty(10, float)if rank == 0: tgt, src = 1, 1elif rank == 1: tgt, src = 0, 0comm.Sendrecv(data, dest=tgt, recvbuf=buffer, source=src)

## Communicate any contiguous array

### MPI datatypes

MPI has a number of predefined datatypes to represent data, e.g. MPI.INT for
an integer and MPI.DOUBLE for a floating point number (in Python float is
double precision). If needed, one can also define custom datatypes, which can
be handy e.g. to use non-contiguous data buffers.

MPI has e.g. the following pre-defined datatypes available:

• MPI.INT for an integer (int)
• MPI.DOUBLE for a floating point number (float)
• MPI.CHAR for a single character (str)
• MPI.COMPLEX for a complex number (complex)

### Manual definition of a memory buffer

When communicating a Python object (lower case methods) or a NumPy array
(upper case methods), the datatype does not need to be specified. Objects are
serialised into byte streams and the datatype of a NumPy array is
automatically detected. If you have another type of contiguous array (i.e. an
object referring to a contiguous memory space containing multiple elements of
a single datatype), you have to do it manually instead.

The data buffer argument for the upper case methods is actually expected to
yield three pieces of information:

• location in memory
• number of elements
• datatype of the elements

These can be automatically obtained from a NumPy array, but now we need to
define them manually as a list of three items: [buffer, count, datatype].

For example, assuming data contains an array of 100 integers, we could send
it like this:

comm.Send([data, 100, MPI.INT], dest=tgt)

If one is working with simple contiguous arrays, the number of elements in an
array can also be inferred from the byte size of the buffer (data) and the
byte size of the datatype. Thus, for such cases one can optionally also use a
shorter syntax: [buffer, datatype]. Since the number of elements is usually
trivially known, it is a good idea to simply stick with the 3-element syntax.

An example of sending and receiving a manually defined memory buffer (using a
NumPy array for the buffer just for simplicity):

from mpi4py import MPIimport numpycomm = MPI.COMM_WORLDrank = comm.Get_rank()data = numpy.empty(100, dtype=float)if rank == 0: data[:] = numpy.arange(100, dtype=float) comm.Send([data, 100, MPI.DOUBLE], dest=1)elif rank == 1: comm.Recv([data, 100, MPI.DOUBLE], source=0)
© CC-BY-NC-SA 4.0 by CSC - IT Center for Science Ltd.