OpenMP

Basic data

Developer: List of compatible compilers
Current version: 5.0 (November 2018)
Current preliminary version: –
Operating systems: Linux, Unix, Microsoft Windows NT
Category: API
License: unknown (open)
Website: openmp.org/

OpenMP (Open Multi-Processing) is a programming interface (API) for shared-memory programming in C++, C, and Fortran on multiprocessor computers, developed jointly by several hardware and compiler vendors since 1997.

OpenMP parallelizes programs at the level of loops, which are executed by multiple threads. This differs from other approaches (e.g. MPI), in which entire processes run in parallel and interact by exchanging messages.

For this purpose, the OpenMP standard defines special compiler directives that instruct the compiler, for example, to distribute the processing of a for loop over several threads and/or processors. In addition, there are library functions and environment variables used for OpenMP programming.

OpenMP is intended for use on systems with shared main memory (shared-memory machines, so-called UMA and NUMA systems), while other approaches such as the Message Passing Interface (MPI) and PVM target multicomputers (distributed-memory machines). On modern supercomputers, OpenMP and MPI are often used together: OpenMP runs within the individual shared-memory nodes, while MPI exchanges messages between them.

One property of OpenMP is that (with a few exceptions) programs also run correctly if the compiler does not know the OpenMP directives (see the example below) and treats them as comments, i.e. ignores them. The reason is that a for loop split across several threads with OpenMP can also be processed sequentially by a single thread.

Main components

The main components of OpenMP are constructs for thread creation, load distribution across several threads, management of data scope, synchronization, runtime routines, and environment variables. Thread creation: omp parallel splits the program (the original thread) into several threads so that the program section enclosed by the construct is processed in parallel. The original thread is called the master thread and has the ID 0.

Example: prints "Hello World!" several times using several threads (each thread produces one output).

#include <stdio.h>

int main() {
#pragma omp parallel
    puts("Hallo Welt!\n");

    return 0;
}

The load-distribution constructs determine how concurrent, independent work is distributed across the parallel threads. omp for and omp do (the latter in Fortran) divide the iterations of a loop, if possible evenly, across all threads (domain decomposition, data partitioning). omp sections distributes successive but independent program parts to different threads (functional decomposition, function partitioning); a sketch of sections follows the array example below.

Example: initializes a large array in parallel, with each thread initializing a part of the array (domain decomposition).

#define N 100000

int main() {
    int a[N];

#pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 2 * i;

    return 0;
}
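
A minimal sketch of omp sections could look as follows (not part of the original examples; the helper functions prepare_input and prepare_filter are hypothetical placeholders). Two independent program parts are assigned to different threads:

#include <stdio.h>

/* Hypothetical independent tasks, used only for illustration. */
static void prepare_input(void)  { puts("preparing input"); }
static void prepare_filter(void) { puts("preparing filter"); }

int main() {
#pragma omp parallel sections
    {
#pragma omp section
        prepare_input();    // may run on one thread

#pragma omp section
        prepare_filter();   // may run on another thread
    }

    return 0;
}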

The scope of data can be controlled with different clauses. In shared-memory programming, most data is visible in all threads by default. Some programs, however, need private data, i.e. data visible to only one thread, as well as an explicit exchange of values between sequential and parallel sections. The so-called data clauses serve this purpose in OpenMP. The shared type means that the data is visible to and modifiable by all threads; it resides in the same memory location for all threads. Without further specification, data is shared; the only exception is loop variables. With private, each thread has its own uninitialized copy of the data, and the values are not preserved outside the parallel section. The private type can be refined into firstprivate and lastprivate, which can also be combined. With firstprivate, the data is private, with the difference that it is initialized with the last value it had before the parallel section. lastprivate differs in that the thread executing the last iteration copies its value out of the parallel section afterwards.
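
A minimal sketch of the firstprivate and lastprivate clauses described above (an illustrative example, not taken from the article): x is initialized in every thread with the value set before the parallel section, and last receives the value written in the last iteration.

#include <stdio.h>

int main() {
    int x = 10;    // copied into each thread's private x (firstprivate)
    int last = 0;  // receives the value of the last iteration (lastprivate)

#pragma omp parallel for firstprivate(x) lastprivate(last)
    for (int i = 0; i < 100; ++i)
        last = x + i;   // each thread works on its own copies

    printf("last = %d\n", last);   // value of the final iteration: 10 + 99
    return 0;
}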

There is also the threadprivate type. Such data is global but is treated as private in the parallel program section; its global value is preserved across the parallel section. copyin is analogous to firstprivate for private data, but applies to threadprivate data, which is otherwise not initialized: with copyin, the global value is explicitly transferred to the private copies. A copyout is not necessary because the global value is preserved. With the reduction type, the data is private but is combined (reduced) into a global value at the end. For example, the sum of all elements of an array can be computed in parallel.
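
The sum example mentioned above can be sketched with the reduction clause as follows (illustrative code, assuming a simple integer array): each thread accumulates a private partial sum, and the partial sums are combined into the shared variable at the end of the loop.

#include <stdio.h>

#define N 100000

int main() {
    static int a[N];
    long sum = 0;

    for (int i = 0; i < N; ++i)
        a[i] = i;

#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; ++i)
        sum += a[i];    // each thread adds into its own private copy of sum

    // the private partial sums are combined into the shared sum at the end
    printf("sum = %ld\n", sum);
    return 0;
}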

Various constructs are used to synchronize the threads, for example critical section, in which the enclosed program section is executed by all threads, but never at the same time, or barrier, which marks a barrier at which each thread waits until all other threads of the group have also reached it. The atomic construct is analogous to critical section, but with a hint to the compiler to use special hardware functionality; the compiler is not bound by this hint and may ignore it. Using atomic makes sense for exclusive updates of data. flush marks a synchronization point at which a consistent memory image must be created; private data is written back to main memory. single means that the enclosed program part is executed only by the thread that reaches it first; it implies a barrier at the end of the block and is therefore equivalent to a barrier at a certain point. master is analogous to single, with the difference that the enclosed program part is executed by the master thread and no barrier is implied at the end of the block.
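
The following sketch (illustrative, not from the article) shows single, atomic, and critical section together: single is executed once by the first thread to arrive, atomic protects an exclusive update of a counter, and critical encloses a longer block that is never executed by two threads at the same time.

#include <stdio.h>

int main() {
    int counter = 0;

#pragma omp parallel
    {
#pragma omp single
        puts("executed once, by the first thread to arrive");

#pragma omp atomic
        ++counter;      // exclusive update of a single variable

#pragma omp critical
        {
            // executed by every thread, but never by two at the same time
            printf("counter is currently %d\n", counter);
        }
    }

    return 0;
}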

Runtime routines are used alongside these constructs, for example to determine the number of threads at runtime or to determine whether the program is currently in a parallel or a sequential state.
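
For example, the runtime routines omp_get_num_threads() and omp_in_parallel() can be queried inside and outside a parallel region; the following minimal sketch is purely illustrative.

#include <omp.h>
#include <stdio.h>

int main() {
    // sequential part: omp_in_parallel() returns 0, the team has 1 thread
    printf("in parallel? %d, threads: %d\n",
           omp_in_parallel(), omp_get_num_threads());

#pragma omp parallel
    {
#pragma omp single
        {
            // parallel part: omp_in_parallel() returns non-zero,
            // omp_get_num_threads() reports the size of the thread team
            printf("in parallel? %d, threads: %d\n",
                   omp_in_parallel(), omp_get_num_threads());
        }
    }

    return 0;
}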

In this context, environment variables provide information such as the thread ID. The execution of OpenMP programs can be altered by changing specific environment variables; for example, the number of threads and the parallelization of loops can be influenced at runtime.
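
One common way to influence loop parallelization at runtime is the schedule(runtime) clause, which reads the loop schedule from the environment variable OMP_SCHEDULE; the following sketch illustrates this (the example itself is not from the article).

#include <stdio.h>

#define N 1000

int main() {
    static double a[N];

    // the actual loop schedule (e.g. "static,100" or "dynamic,10") is read
    // from the environment variable OMP_SCHEDULE at run time
#pragma omp parallel for schedule(runtime)
    for (int i = 0; i < N; ++i)
        a[i] = 0.5 * i;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}

% OMP_SCHEDULE="dynamic,10" ./example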

Sample code

The following code illustrates the parallel execution of a for loop using OpenMP. Depending on the number of threads involved, the loop is divided into small sections, each of which is assigned to one thread, so that all threads compute simultaneously.

#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_num_threads(4);

#pragma omp parallel for
    for (int i = 0; i < 4; ++i) {
        const int id = omp_get_thread_num();

        printf("Hello World from thread %d\n", id);

        // Only execute in the master thread
        if (id == 0)
            printf("There are %d threads\n", omp_get_num_threads());
    }

    return 0;
}

When compiling, you have to tell the compiler to process the pragma directives and to link the necessary libraries for the omp functions. With gcc or clang this is done via the option -fopenmp.

% gcc -fopenmp example.c -o example
% ./example
Hello World from thread 3
Hello World from thread 0
Hello World from thread 1
Hello World from thread 2
There are 4 threads

Instead of hard-coding the number of threads in the program, it can also be specified at runtime. To do so, set the environment variable OMP_NUM_THREADS to the desired value.

% OMP_NUM_THREADS=4 ./example
Hello World from thread 3
Hello World from thread 0
Hello World from thread 1
Hello World from thread 2
There are 4 threads

Implementation

OpenMP is built into most compilers.

  • Microsoft Visual C++ 2005, 2008 and 2010 (Professional, Team System, Premium and Ultimate Edition),
  • Intel Parallel Studio for various processors (OpenMP 3.1 from version 13),
  • GCC from version 4.2 (OpenMP 4.0 from version 5.0),
  • Clang/LLVM (OpenMP 3.1 from version 3.6.1),
  • Oracle Solaris Studio compilers and tools for Solaris OS (UltraSPARC and x86/x64) and Linux,
  • Fortran, C and C++ compilers from the Portland Group (OpenMP 2.5),
  • Gfortran,
  • IBM XL C/C++ compiler,
  • Nanos compiler,
  • Pelles C (OpenMP 3.1 from version 8)

References

  1. https://gcc.gnu.org/gcc-5/changes.html