# Derived data type

In this subsection we will learn to communicate strided data, i.e., a chunk of data with holes between the portions.

So far we have learnt to send messages that were a contiguous sequence of elements of a single basic data type, described by the familiar buf, count and datatype arguments. In this section we will learn how to transfer any combination of data in memory in one message: strided data with holes between the portions, as well as several different basic data types combined in one message.

So, if we have several different data types, such as int, float etc., laid out in memory with gaps between them, how would we communicate all of it with one command? To do this we first need to describe the memory layout that we would like to transfer as a derived data type. Once the MPI library has this compiled description, it performs the transfer for us correctly, element by element. Such a derived type can then be used in any communication call, including collectives such as broadcasts.

Since we do not need to copy the data into a contiguous array first, to be transferred as a single chunk of memory, no memory bandwidth is wasted on packing. Derived types are usually built as:

• vectors
• subarrays
• structs
• others

Or they could be simple types combined into one data layout, transferred efficiently in one message without first copying them into one contiguous piece. Messages of 60 kilobytes or more are not uncommon, and when we want to transfer larger results between processes, this is actually the most efficient way to do it. Of course, there are alternatives, such as writing the results into a file and later opening and reading that file. Quite often codes do not return results at all; they just write their results into files, and eventually we need to combine those files into one representation. This is quite similar to how a profiler or tracer works, creating a file for each process. So it is easy to see that if we are debugging a code on two thousand cores (which is not that big), we will end up with two thousand files that need to be read and interpreted, and that will definitely take some time. We will learn more about this in the following subsections on parallel I/O.

### Derived data types — type maps

A derived data type is, logically, a handle pointing to a list of entries: a type map of basic data types and their displacements. Once this data type has been stored, we simply use it as if it were a basic data type. The type map itself is not communicated over the network; only the data it describes is. The only prerequisite is that the displacement of each entry must be specified, and, quite obviously, MPI does not transmit these displacements with every message.

| Basic data type | Displacement |
|-----------------|--------------|
| MPI_CHAR        | 0            |
| MPI_INT         | 4            |
| MPI_INT         | 8            |
| MPI_DOUBLE      | 16           |

Here you can see the description of a memory layout through its displacements: the MPI_CHAR sits at displacement 0, the two MPI_INT entries at displacements 4 and 8, and the MPI_DOUBLE at displacement 16. A derived data type describes the memory layout of, e.g., structures, common blocks, subarrays, or a set of variables in memory.

### Contiguous Data

This is the simplest derived data type as it consists of a number of contiguous items of the same data type.

In C we use the following function to define it:

int MPI_Type_contiguous (int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
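As a hedged sketch of how this is used (assuming a standard MPI installation and at least two ranks; the names `row` and `row_type` are illustrative), a rank can describe five consecutive ints as one unit and send them with a count of 1:

```c
#include <mpi.h>

/* Sketch: describe five contiguous ints as one derived type
   and transfer them as a single unit between rank 0 and rank 1. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int row[5] = {1, 2, 3, 4, 5};

    MPI_Datatype row_type;                      /* the new derived type   */
    MPI_Type_contiguous(5, MPI_INT, &row_type); /* 5 consecutive MPI_INTs */
    MPI_Type_commit(&row_type);                 /* make it usable         */

    if (size >= 2) {
        if (rank == 0)
            MPI_Send(row, 1, row_type, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(row, 1, row_type, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&row_type);
    MPI_Finalize();
    return 0;
}
```

Note that the count argument of MPI_Send is 1: one element of the derived type, not five ints.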

### Committing and Freeing a data type

Before a data type handle is used in message passing communication, it needs to be committed with MPI_TYPE_COMMIT. This needs to be done only once per data type by each MPI process. Committing means the data type receives an internal description that can then be used like a basic data type. If at some point we will not use it anymore, or we would like to release some memory, we may call MPI_TYPE_FREE() to free the data type and its internal resources.

The routine used is as follows:

int MPI_Type_commit (MPI_Datatype *datatype);

### Example

Here in this example we can see the real need for derived data types.

struct buff_layout {
    int    i_val[3];
    double d_val[5];
} buffer;

We have a structure of a fixed number of integer values and some double values. This is one single piece of data that we would like to describe with a data type so that we can then send the whole structure with one command, i.e., one send or receive. Whether it is blocking or non-blocking does not matter at this point. To achieve this we describe a data type called buff_datatype; this is the name under which we commit the layout.

array_of_types[0] = MPI_INT;
array_of_blocklengths[0] = 3;
array_of_displacements[0] = 0;
array_of_types[1] = MPI_DOUBLE;
array_of_blocklengths[1] = 5;
array_of_displacements[1] = …;
MPI_Type_create_struct(2, array_of_blocklengths, array_of_displacements, array_of_types, &buff_datatype);
MPI_Type_commit(&buff_datatype);

So, after we create and fill in the type description, we commit it to the MPI subsystem. From then on the subsystem holds that data type internally and knows how to handle the integers, doubles, etc. that it contains.

MPI_Send(&buffer, 1, buff_datatype, …)

Of course, there can also be gaps in the layout that we never wrote ourselves; compilers, in C as well as in languages such as Fortran, may insert padding for memory alignment. In this structure, for example, there may be a gap the size of one integer between the two arrays. This is not an error on our part but an adjustment, a performance adjustment, so that the next array starts at an address that is a multiple of the element's alignment requirement (for example, eight bytes for a double on most systems). As long as the displacements describe such gaps correctly, MPI knows how to transfer the layout most efficiently.