
GPU execution models and programming solutions

As already mentioned, GPUs serve as accelerators to CPUs, i.e., computationally intensive tasks are off-loaded from CPUs to GPUs. Standard programming languages such as Fortran and C/C++ do not permit …
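One family of programming solutions the course goes on to cover is directive-based offloading. As a rough sketch only (not taken from the course materials, and assuming a compiler with OpenMP target-offload support), a loop in standard C can be marked for execution on an accelerator like this:

    /* Illustrative sketch: offloading a loop with OpenMP target directives,
       one directive-based model for programming GPUs from standard C.
       Requires a compiler built with offload support. */
    #include <stdio.h>

    int main(void)
    {
        const int n = 1000000;
        static double x[1000000];
        double sum = 0.0;

        /* the map clause moves data between host and device memory */
        #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
        for (int i = 0; i < n; i++)
            x[i] = 2.0 * i;

        for (int i = 0; i < n; i++)
            sum += x[i];

        printf("sum = %f\n", sum);
        return 0;
    }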

Messages and communications

So far, we have introduced MPI and used some simple routines, such as rank and size, to distinguish between different processes and to actually assign them some …
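As a minimal sketch of how rank and size are typically used (illustrative only, not the course's exact exercise), every process queries its rank and the communicator size and uses the rank to decide what it does:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?            */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us?      */

        if (rank == 0)
            printf("I am the root of %d processes\n", size);
        else
            printf("I am worker %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }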

Different types of communication in MPI

There are two criteria by which we can classify the types of communication in MPI. The first way to define types of communication is to divide them according to the number …
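For illustration, a minimal point-to-point exchange might look like the sketch below (values and tags are arbitrary); collective routines such as MPI_Bcast would instead involve every process in the communicator:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* point-to-point: only ranks 0 and 1 are involved */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }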

Communicator in MPI

In the introduction to MPI in the first week we already saw the simple Hello World exercise. However, in order to write truly useful applications, we will need …
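A small, illustrative sketch (not from the course text) of going beyond MPI_COMM_WORLD: MPI_Comm_split creates sub-communicators, here grouping even and odd ranks, so that later operations can be restricted to a subset of processes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int world_rank, sub_rank;
        MPI_Comm sub_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* colour 0 = even world ranks, colour 1 = odd world ranks */
        MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);
        MPI_Comm_rank(sub_comm, &sub_rank);

        printf("world rank %d has rank %d in its sub-communicator\n",
               world_rank, sub_rank);

        MPI_Comm_free(&sub_comm);
        MPI_Finalize();
        return 0;
    }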

Derived data type

So far we have learnt to send messages that were contiguous sequences of elements of basic data types, specified with arguments such as buf, count, etc. In this section …
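As a hedged preview of what such derived types look like (the values below are purely illustrative), MPI_Type_contiguous builds a new type out of a block of basic elements, which can then be used like any built-in datatype:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double buf[4] = {0.0, 0.0, 0.0, 0.0};
        MPI_Datatype block;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            buf[0] = 1.0; buf[1] = 2.0; buf[2] = 3.0; buf[3] = 4.0;
        }

        /* four contiguous doubles become one element of the derived type */
        MPI_Type_contiguous(4, MPI_DOUBLE, &block);
        MPI_Type_commit(&block);

        MPI_Bcast(buf, 1, block, 0, MPI_COMM_WORLD);
        printf("rank %d: buf[3] = %f\n", rank, buf[3]);

        MPI_Type_free(&block);
        MPI_Finalize();
        return 0;
    }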

Layout of struct data types

Vector data types

What we learnt so far in the previous subsection and the exercise dealt with contiguous vectors. Sometimes we need to communicate vectors with holes that …
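A minimal sketch of such a "vector with holes" (the count, block length and stride below are illustrative only): MPI_Type_vector describes three single doubles separated by a stride of two elements, i.e. every second element of the array:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double a[6] = {0, 0, 0, 0, 0, 0};
        MPI_Datatype strided;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            for (int i = 0; i < 6; i++) a[i] = i + 1.0;

        /* count = 3 blocks, blocklength = 1 element, stride = 2 elements */
        MPI_Type_vector(3, 1, 2, MPI_DOUBLE, &strided);
        MPI_Type_commit(&strided);

        if (rank == 0) {
            MPI_Send(a, 1, strided, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(a, 1, strided, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received a[0]=%g a[2]=%g a[4]=%g\n", a[0], a[2], a[4]);
        }

        MPI_Type_free(&strided);
        MPI_Finalize();
        return 0;
    }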

One sided communication

As we have already learnt at the beginning, parallelisation in MPI is based on distributed memory. This means that if we run a program on different cores, each …
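A small sketch of the one-sided model (illustrative values, no error handling): each process exposes a window into its memory, and rank 0 writes directly into rank 1's window, synchronised with MPI_Win_fence:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, target_buf = 0, value = 42;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* every process exposes one int as a window into its memory */
        MPI_Win_create(&target_buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0)
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win); /* write to rank 1 */
        MPI_Win_fence(0, win);

        if (rank == 1)
            printf("rank 1: window now holds %d\n", target_buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }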

Hybrid MPI

Hybrid MPI + OpenMP: Masteronly Style

We saw in the previous exercise that scaling efficiency may be limited by Amdahl's law. This means that, of course, even though …
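A minimal sketch of the masteronly style (assuming the MPI library provides at least MPI_THREAD_FUNNELED): all MPI calls are made outside the OpenMP parallel region, while the compute loop uses every thread:

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int provided, rank;
        double local = 0.0, global = 0.0;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* threaded computation, no MPI calls inside the parallel region */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000; i++)
            local += 1.0;

        /* communication is done by the master thread only */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("global sum = %f\n", global);

        MPI_Finalize();
        return 0;
    }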

Non-Blocking communications

We saw in the previous week that the types of communication in MPI can be divided according to two criteria, one of them being the number of processes involved: Point-to-Point Communication and Collective …
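For illustration, a non-blocking exchange between two ranks might look like the sketch below (values are arbitrary; the overlap opportunity is only indicated by a comment):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, send_val, recv_val = -1;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        send_val = rank;

        if (rank < 2) {                      /* only ranks 0 and 1 take part */
            int partner = 1 - rank;
            MPI_Isend(&send_val, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Irecv(&recv_val, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

            /* independent computation could overlap with the transfer here */

            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            printf("rank %d received %d\n", rank, recv_val);
        }

        MPI_Finalize();
        return 0;
    }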

Data environment

There are additional clauses available with the task directive. The first is untied: if the task is tied, it is guaranteed that the same thread will execute all the parts of …
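A minimal sketch of the untied clause in use (illustrative only); whether a suspended untied task actually migrates between threads is up to the runtime:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            #pragma omp single
            {
                for (int i = 0; i < 4; i++) {
                    /* an untied task may be resumed by a different thread
                       after a suspension point; a tied task stays with the
                       thread that started it */
                    #pragma omp task untied firstprivate(i)
                    {
                        printf("task %d running on thread %d\n",
                               i, omp_get_thread_num());
                    }
                }
                #pragma omp taskwait
            }
        }
        return 0;
    }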

MPI_Reduce

So far, in basic collective communication, we have encountered broadcast, scatter and gather. Now we can move on to more advanced collective communication, where we will cover the routines MPI_Reduce …
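A hedged sketch of MPI_Reduce (the contribution of each rank is arbitrary here): every process supplies a local value, and the combined result appears only on the root:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, local, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        local = rank + 1;                    /* each rank contributes rank+1 */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)                       /* result: 1 + 2 + ... + size */
            printf("sum over %d ranks = %d\n", size, total);

        MPI_Finalize();
        return 0;
    }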

Scatter and Gather

Scatter

As we saw with the broadcast function, the root process sends the same data to every other process. However, in many applications we might have some data that …
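A small, illustrative sketch combining both routines: the root scatters one integer to each rank, every rank modifies its piece, and the results are gathered back on the root (the fixed buffer size assumes at most 8 processes):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, piece, sendbuf[8], recvbuf[8];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            for (int i = 0; i < size; i++)
                sendbuf[i] = i * 10;         /* a different value per rank */

        MPI_Scatter(sendbuf, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);
        piece += 1;                          /* every rank works on its piece */
        MPI_Gather(&piece, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0)
            for (int i = 0; i < size; i++)
                printf("recvbuf[%d] = %d\n", i, recvbuf[i]);

        MPI_Finalize();
        return 0;
    }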

Combined parallel worksharing directives

Combined constructs are shortcuts for specifying one construct immediately nested inside another construct. Specifying a combined construct is semantically identical to specifying the first construct that encloses an instance of …
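As a sketch of this equivalence (illustrative example, not from the course text), the combined parallel for construct below behaves the same as the parallel region containing a single for construct that follows it:

    #include <stdio.h>

    #define N 8

    int main(void)
    {
        int a[N], b[N];

        /* combined form */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = i * i;

        /* semantically identical nested form */
        #pragma omp parallel
        {
            #pragma omp for
            for (int i = 0; i < N; i++)
                b[i] = i * i;
        }

        printf("a[%d] = %d, b[%d] = %d\n", N - 1, a[N - 1], N - 1, b[N - 1]);
        return 0;
    }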

Nesting and binding

Directive Scoping

OpenMP specifies a number of scoping rules on how directives may associate (bind) with and nest within each other. Incorrect programs may result if the OpenMP …
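A small sketch of binding in practice (illustrative only): the for construct binds to the innermost enclosing parallel region, and a second worksharing construct must not be nested directly inside it:

    #include <stdio.h>

    #define N 4

    int main(void)
    {
        int a[N][N];

        #pragma omp parallel
        {
            /* binds to the parallel region above: the outer loop is shared */
            #pragma omp for
            for (int i = 0; i < N; i++) {
                /* a second "#pragma omp for" here would be non-conforming;
                   a nested parallel region would be needed to workshare
                   the inner loop as well */
                for (int j = 0; j < N; j++)
                    a[i][j] = i + j;
            }
        }

        printf("a[%d][%d] = %d\n", N - 1, N - 1, a[N - 1][N - 1]);
        return 0;
    }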