
Overview of MPI shared memory

In this article, we provide an overview of the most important aspects of using MPI Shared Memory.
© HLRS

The following list shows the key issues for using MPI shared memory:

    1. Split the main communicator into shared memory islands
        • MPI_Comm_split_type

    2. Define a shared memory window on each island
        • MPI_Win_allocate_shared
        • Result (by default): a contiguous array, directly accessible by all processes of the island

    3. Accesses and synchronization
        • Normal assignments and expressions
        • No MPI_Put/MPI_Get!
        • Normal MPI one-sided synchronization, e.g., MPI_Win_fence


First, we need to split the original communicator with MPI_Comm_split_type into shared memory islands, because only processes that share physical memory can create a shared memory window together.

 

Second, we have to call MPI_Win_allocate_shared. Of course, only the processes within one shared memory island can call this routine collectively. It behaves like MPI_Win_allocate from Steps 1.10 and 2.1, but by default all the window portions are combined into one (long) contiguous array. There is an alternative method in which the array is not contiguous, which typically uses several physical memories within a cc-NUMA node.

 

Third, with this approach all processes can access the shared memory directly with normal language assignments and expressions (i.e., plain C or Fortran loads and stores). Using MPI_Put or MPI_Get here is completely unnecessary. In fact, the code your Fortran or C compiler generates for a direct access is significantly faster than a call to a library routine such as MPI_Put or MPI_Get.

 

However, avoiding race conditions remains your responsibility, and for that you need synchronization. One possibility is to use the normal one-sided synchronizations, for example MPI_Win_fence.

 

 

Caution:

 

    • Memory may already be completely pinned to the physical memory of the process with rank 0, i.e., the first-touch rule (as in OpenMP) does not apply!

 

    • First-touch rule: a memory page is pinned to the physical memory of the processor that first writes a byte into the page.

 

 

 

At this point, a first warning: with MPI_Win_allocate_shared you will typically get memory that may already be completely pinned in hardware. This means that the affinity of a memory chunk to a given process is not determined by the first access to it; in other words, the so-called first-touch rule, which typically applies with OpenMP, does not apply here.

This article is from the free online course One-Sided Communication and the MPI Shared Memory Interface.
