Overview of MPI shared memory

In this article, we provide an overview of the most important aspects of using MPI shared memory.
© HLRS

The following list shows the key issues for using MPI shared memory:

    1. Split the main communicator into shared memory islands
        • MPI_Comm_split_type

    2. Define a shared memory window on each island
        • MPI_Win_allocate_shared
        • Result (by default): contiguous array, directly accessible by all processes of the island

    3. Accesses and synchronization
        • Normal assignments and expressions
        • No MPI_Put/MPI_Get!
        • Normal MPI one-sided synchronization, e.g., MPI_Win_fence

First, we need to split our original communicator with MPI_Comm_split_type into shared-memory islands, because only processes that share physical memory can create a shared memory window together.
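A minimal sketch of this first step in C is shown below; the variable names (e.g., comm_shm, rank_shm) are illustrative and not part of the course material:

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm comm_shm;          /* one communicator per shared-memory island */
    int rank_world, rank_shm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank_world);

    /* All processes that can physically share memory end up in the same comm_shm */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED,
                        0, MPI_INFO_NULL, &comm_shm);
    MPI_Comm_rank(comm_shm, &rank_shm);

    /* ... steps 2 and 3 (see the sketches below) ... */

    MPI_Comm_free(&comm_shm);
    MPI_Finalize();
    return 0;
}
```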

 

Second, we have to call MPI_Win_allocate_shared. Of course, only the processes that are together in a shared memory island can call this routine collectively. It behaves like MPI_Win_allocate from Steps 1.10 and 2.1, but by default all the window portions are combined into one (long) contiguous array. There is another method (requested through the info key alloc_shared_noncontig) in which the array is not contiguous and which typically uses several physical memories within a cc-NUMA node.
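Continuing the sketch from step 1, a hedged example of the collective allocation; LOCAL_N and the variable names are again only illustrative:

```c
#define LOCAL_N 1024            /* doubles contributed by each process (illustrative) */

double  *my_segment;            /* base pointer to this process's own segment */
MPI_Win  win_shm;

/* Collective over the island communicator; by default the per-process
   segments are combined into one contiguous array.                     */
MPI_Win_allocate_shared(LOCAL_N * sizeof(double), sizeof(double),
                        MPI_INFO_NULL, comm_shm, &my_segment, &win_shm);
```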

 

Third, with this approach all processes can access the shared memory directly with normal language assignments (i.e., C and Fortran assignments) and expressions to load or store data. Using MPI_Put or MPI_Get here is completely unnecessary; in fact, the loads and stores that your Fortran or C compiler generates are significantly faster than calls to a library routine such as MPI_Put or MPI_Get.
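As an illustration (still continuing the sketch above), MPI_Win_shared_query can be used to obtain the base address of another process's segment, which is then accessed with plain C assignments; the neighbour choice and the indices are made up for this example:

```c
MPI_Aint  nbr_size;
int       nbr_disp_unit, size_shm;
double   *nbr_segment;

MPI_Comm_size(comm_shm, &size_shm);
int nbr = (rank_shm + 1) % size_shm;   /* right-hand neighbour within the island */

/* Obtain the start address of the neighbour's segment in the shared window */
MPI_Win_shared_query(win_shm, nbr, &nbr_size, &nbr_disp_unit, &nbr_segment);

my_segment[0]   = (double) rank_shm;   /* normal store into my own segment */
double from_nbr = nbr_segment[0];      /* normal load from the neighbour's segment;
                                          still needs the synchronization shown below */
```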

 

However, avoiding race conditions is still your responsibility, and for that you need synchronization. One possibility is to use the normal one-sided synchronization calls, for example MPI_Win_fence.
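A sketch of how the accesses above can be bracketed with MPI_Win_fence, continuing the same illustrative example:

```c
MPI_Win_fence(0, win_shm);             /* start of the store epoch                  */

my_segment[0] = (double) rank_shm;     /* every process fills its own segment       */

MPI_Win_fence(0, win_shm);             /* all stores are now visible to the island  */

double val = nbr_segment[0];           /* now safe to read the neighbour's value    */

MPI_Win_fence(0, win_shm);             /* end of the read epoch                     */
```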

 

 

Caution:

 

    • Memory may already be completely pinned to the physical memory of the process with rank 0, i.e., the first-touch rule (as in OpenMP) does not apply!

 

    • First-touch rule: a memory page is pinned to the physical memory of the processor that first writes a byte into the page.

At this point the first warning: with MPI_Win_allocate_shared you typically get memory that may already be completely pinned to physical memory in hardware. This means that the affinity of a memory chunk to a given process is not determined by the first access to it (i.e., the so-called first-touch rule, which typically applies with OpenMP, does not apply here).

© HLRS
This article is from the free online course One-Sided Communication and the MPI Shared Memory Interface.
