Parallel programming

From Wikipedia, the free encyclopedia

Parallel programming is a programming paradigm. On the one hand, it comprises methods for dividing a computer program into individual parts that can be executed concurrently; on the other hand, it comprises methods for synchronizing those concurrent program sections. It thus stands in contrast to classic, sequential (or serial) programming. In addition to the possibility of faster program execution (e.g. when using several processor cores), an advantage of parallel programming is that the everyday phenomenon of concurrency can be modeled directly in programs, which can lead to simpler, more understandable source code. A disadvantage is that the runtime behavior of parallel algorithms can be more difficult to understand than that of an equivalent sequential algorithm.

Implementation

For parallelization, it is in principle irrelevant whether the individual program parts are actually processed simultaneously by independent execution units or whether they are merely executed quasi-parallel (see time sharing, multitasking).

In parallel programming, the less strict notion of concurrency is used, in which the program code is not executed strictly one statement after the other. Two program parts can be parallelized if their parallel or interleaved execution leads to the same result as their sequential execution (parallelizability). If this is not the case, execution can lead to a race condition.
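
As a minimal illustration (a hypothetical C++ sketch, not part of the original article), the following program lets two threads increment a shared counter without any synchronization. Because the read-modify-write in ++counter can interleave, the printed result is frequently less than the expected 200000 - a classic race condition.

    #include <iostream>
    #include <thread>

    int counter = 0;  // shared state, deliberately left unsynchronized

    void work() {
        for (int i = 0; i < 100000; ++i) {
            ++counter;  // read-modify-write: interleavings can lose updates
        }
    }

    int main() {
        std::thread t1(work);
        std::thread t2(work);
        t1.join();
        t2.join();
        // Often prints a value below 200000 because of the race condition.
        std::cout << "counter = " << counter << '\n';
        return 0;
    }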

The concurrency of several independent processes is called multitasking; concurrency within a single process is called multithreading. In the early days of computing, pure time-sharing systems were also widespread, which made concurrency possible at the user level.

Supporting hardware

In most cases, the parallel execution of a program is supported by the hardware, and programming languages are generally adapted accordingly. Parallel programming can be done explicitly by the programmer, who executes program parts in separate processes or threads, or it can happen automatically, so that causally independent (parallelizable) instruction sequences are executed "side by side". This automatic parallelization can be carried out by the compiler if a computer with a multi-core processor or a parallel computer is available as the target platform. Some modern CPUs can also recognize such independence in the machine code or microcode of a program and distribute the instructions to different parts of the processor so that they are executed simultaneously (out-of-order execution).
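
A hedged sketch of explicit parallelization in C++ (an assumed example, not taken from the article): two causally independent partial sums are computed in separate threads and only combined after both threads have been joined, so no further synchronization is required.

    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> data(1000000, 1);
        long long left = 0, right = 0;  // each thread writes only its own variable

        std::size_t mid = data.size() / 2;
        std::thread t1([&] { left  = std::accumulate(data.begin(), data.begin() + mid, 0LL); });
        std::thread t2([&] { right = std::accumulate(data.begin() + mid, data.end(), 0LL); });

        t1.join();
        t2.join();

        // The two halves are causally independent, so the result equals the sequential sum.
        std::cout << "sum = " << (left + right) << '\n';
        return 0;
    }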

Conflicts

As soon as the individual processes or threads communicate with one another, they are, strictly speaking, no longer concurrent as a whole (they influence one another); only individual sub-processes remain concurrent with one another. If the order of execution at these contact points (or communication points) is not specified appropriately, conflicts can arise, in particular a so-called deadlock, in which two processes wait for each other (and thereby block each other).
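
The following C++ sketch (a hypothetical example, not from the article) shows how such a deadlock can arise: two threads acquire the same two mutexes in opposite order, so each may end up holding one lock while waiting forever for the other. Running it may simply hang.

    #include <mutex>
    #include <thread>

    std::mutex a, b;

    void worker1() {
        std::lock_guard<std::mutex> lock_a(a);  // holds a ...
        std::lock_guard<std::mutex> lock_b(b);  // ... and waits for b
    }

    void worker2() {
        std::lock_guard<std::mutex> lock_b(b);  // holds b ...
        std::lock_guard<std::mutex> lock_a(a);  // ... and waits for a -> possible deadlock
    }

    int main() {
        std::thread t1(worker1);
        std::thread t2(worker2);
        t1.join();
        t2.join();
        // Acquiring both locks in a fixed global order (or using std::scoped_lock(a, b))
        // would avoid the deadlock.
        return 0;
    }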

Various techniques are used to solve this problem:

  • The context of each part of the program must be protected from unexpected changes by other parts (synchronization).
  • If shared access to data is to be implemented, with at least one party wanting write access, then the access must be synchronized, for example by mutual exclusion (mutex) using monitors or semaphores.
  • Alternatively, certain actions can be required to be carried out jointly by two processes, in a so-called rendezvous.
  • Another safe way of communicating is through queues.

These techniques solve the problem of concurrent access to resources, but they do not prevent deadlocks (on the contrary).
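
As an illustration of mutual exclusion (a hedged C++ sketch, not taken from the article), the race condition from the earlier counter example disappears once every access to the shared variable is protected by a mutex:

    #include <iostream>
    #include <mutex>
    #include <thread>

    int counter = 0;
    std::mutex counter_mutex;  // guards every access to counter

    void work() {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> lock(counter_mutex);  // mutual exclusion
            ++counter;  // at most one thread executes this at a time
        }
    }

    int main() {
        std::thread t1(work);
        std::thread t2(work);
        t1.join();
        t2.join();
        std::cout << "counter = " << counter << '\n';  // always prints 200000
        return 0;
    }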

Such techniques are particularly important in distributed systems, especially to ensure the integrity of distributed transactions.

Parallel programming languages

Most programming languages offer facilities for parallelizing processes. Some languages, however, are designed for parallel programming from the ground up or have this capability built in:

  • Newsqueak - language published in the 1980s for implementing graphical user interfaces
  • Occam - imperative programming language published in 1985 that builds on Communicating Sequential Processes
  • Scratch - educational visual programming language released in 2007. A striking property of Scratch is that the otherwise complex paradigm of parallel programming is introduced almost in passing. In contrast to traditional education-oriented programming languages, it is used intuitively even by beginners in Scratch, who may later be surprised that "professional programming languages" initially only work sequentially.
  • X10 - parallel programming language developed by IBM for programming massively parallel systems.
  • Erlang
  • Chapel - competitor to X10, developed by Cray
  • Unified Parallel C - an extension of C99, from the UPC consortium.

Efficiency

The simultaneous processing of the calculations generally shortens the execution time of a program. With regard to the CPU time used, however, a parallel algorithm is almost always worse than a serial one, since the synchronization of the processes creates additional administrative overhead.

On a single-core system, parallelization is therefore only useful where one execution thread can "continue working" while another has to wait (or is meant to wait); without parallelization, this forced waiting would block the background processing. In most cases, one has to wait for the operating system to complete a request from the program, or the execution thread is meant to wait for further user input while processing (e.g. of previously triggered user jobs) continues in the background.
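
A hedged sketch of this pattern in C++ (an assumed example, not from the article): a background task keeps computing while the main thread blocks waiting for user input.

    #include <future>
    #include <iostream>
    #include <string>

    // Stand-in for a previously triggered user job that runs in the background.
    long long background_job() {
        long long sum = 0;
        for (long long i = 0; i < 100000000; ++i) sum += i;
        return sum;
    }

    int main() {
        // The background computation continues while the main thread waits for input.
        std::future<long long> job = std::async(std::launch::async, background_job);

        std::string line;
        std::cout << "Press Enter when ready..." << std::endl;
        std::getline(std::cin, line);  // blocking wait; does not stop the background job

        std::cout << "Background result: " << job.get() << std::endl;
        return 0;
    }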

Multiple execution units are required to take full advantage of parallel programming. Techniques for this are simultaneous multithreading (SMT), e.g. Hyper-Threading Technology (HTT), and symmetric multiprocessing (SMP), e.g. by means of a multi-core processor or several CPUs. The extreme case is massively parallel processing (MPP) with sometimes several thousand processors. It is also common to use entire networks of individual computers for parallel computing (computer clusters).

On the other hand, the control code for coordinating the threads should be kept to a minimum if it can only be executed sequentially. This follows from Amdahl's law on the parallelization properties of programs.
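
For reference (the formula is not spelled out in the article text itself): Amdahl's law states that if a fraction p of a program can be parallelized and n execution units are available, the achievable speedup is bounded by

    S(n) = \frac{1}{(1 - p) + p/n} \le \frac{1}{1 - p}

so the sequentially executed portion 1 - p, which includes synchronization and control code, limits the overall speedup no matter how many processors are added.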
