General Purpose Computation on Graphics Processing Unit

from Wikipedia, the free encyclopedia

General-purpose computation on graphics processing units (GPGPU for short) refers to the use of a graphics processor for calculations beyond its original scope, for example technical or economic simulations. With parallel algorithms, a very large increase in speed can be achieved compared to the main processor (CPU).


GPGPU emerged from the shaders of graphics processors. Its strength lies in the simultaneous execution of uniform tasks, such as coloring pixels or multiplying large matrices. Since the speed of modern processors can no longer be increased primarily by raising the clock rate, parallelization is an important factor in achieving higher computing power in modern computers. The advantage of the GPU over the CPU lies in its higher computing power and higher memory bandwidth. This speed comes mainly from the high degree of parallelism of the graphics processor's arithmetic operations.
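Matrix multiplication is a uniform task in this sense because every output element can be computed without knowing any other output element. The following C++ sketch illustrates that structure. The names `Matrix`, `matmul_element` and `matmul` are hypothetical, and the outer loop is only a serial stand-in for what a GPU would run as many simultaneous invocations.

```cpp
#include <cstddef>
#include <vector>

// Row-major n x n matrix stored in a flat vector (illustrative type alias).
using Matrix = std::vector<float>;

// "Kernel": computes the single output element C[row][col].
// It only reads A and B and produces one value, so all n*n
// invocations are independent of each other.
float matmul_element(const Matrix& a, const Matrix& b, std::size_t n,
                     std::size_t row, std::size_t col) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < n; ++k) {
        sum += a[row * n + k] * b[k * n + col];
    }
    return sum;
}

// Serial stand-in for a GPU launch: each loop iteration corresponds to
// one invocation that a GPU could execute at the same time as the others.
Matrix matmul(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n);
    for (std::size_t idx = 0; idx < n * n; ++idx) {
        c[idx] = matmul_element(a, b, n, idx / n, idx % n);
    }
    return c;
}
```

Because no iteration depends on the result of another, the order in which the elements are computed does not matter, which is what allows the work to be spread across many arithmetic units.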

Model                              | Theoretical computing power (GFLOPS) | Memory bus data rate | Memory type | Type
                                   | single precision | double precision  | (GByte/s)            |             |
-----------------------------------+------------------+-------------------+----------------------+-------------+-------------
AMD Radeon Pro Duo                 | 16,384           | 1,024             | 1,024                | HBM         | GPU
AMD Radeon R9 Fury X               | 8,602            | 538               | 512                  |             |
Nvidia Geforce GTX Titan X         | 6,144            | 192               | 336                  | GDDR5       |
AMD FirePro W9100                  | 5,350            | 2,675             | 320                  |             |
Nvidia Tesla K20X                  | 3,950            | 1,310             | 250                  |             |
AMD Radeon HD 7970                 | 3,789            | 947               | 264                  |             |
Intel Xeon Phi 7120                | 2,420            | 1,210             | 352                  |             | Co-processor
PlayStation 4 SoC (AMD)            | 1,860            | -                 | 167                  |             | APU
Nvidia Geforce GTX 580             | 1,581            | 198               | 192.4                |             | GPU
Intel Xeon E7-8890 v3              | 1,440            | 720               | 102.4 (?)            | DDR4        | CPU
AMD A10-7850k                      | 856              | -                 | 34                   | DDR3        | APU
Intel Core i7-3930K                | 307.2            | 153.6             | 51.2                 |             | CPU
Intel Pentium 4 with SSE3, 3.6 GHz | 14.4             | 7.2               | 6.4                  | DDR2        |
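Theoretical figures of this kind are usually products of a few hardware parameters. The sketch below shows that arithmetic for the two smallest CPU rows. The function names are hypothetical, and the core counts, per-cycle throughput, base clock and memory configuration are assumptions not stated in the table (only the 3.6 GHz clock of the Pentium 4 appears there); they are used because they reproduce the listed values.

```cpp
// Theoretical peak compute in GFLOPS:
// cores * clock rate (GHz) * floating-point operations per core per cycle.
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle) {
    return cores * clock_ghz * flops_per_cycle;
}

// Theoretical peak memory bandwidth in GByte/s:
// memory channels * transfer rate (GT/s) * bytes per transfer.
double peak_bandwidth_gbs(int channels, double gigatransfers_per_s,
                          int bytes_per_transfer) {
    return channels * gigatransfers_per_s * bytes_per_transfer;
}

// Pentium 4 at 3.6 GHz. Assumed: 1 core, SSE delivering 4 single-precision
// or 2 double-precision operations per cycle.
double pentium4_single() { return peak_gflops(1, 3.6, 4); }  // 1 * 3.6 * 4 = 14.4
double pentium4_double() { return peak_gflops(1, 3.6, 2); }  // 1 * 3.6 * 2 = 7.2

// Core i7-3930K. Assumed: 6 cores at 3.2 GHz base clock, AVX delivering
// 16 single-precision or 8 double-precision operations per cycle.
double i7_single() { return peak_gflops(6, 3.2, 16); }       // 6 * 3.2 * 16 = 307.2
double i7_double() { return peak_gflops(6, 3.2, 8); }        // 6 * 3.2 * 8 = 153.6

// Core i7-3930K memory. Assumed: 4 channels of DDR3-1600 (1.6 GT/s),
// 8 bytes per transfer.
double i7_bandwidth() { return peak_bandwidth_gbs(4, 1.6, 8); }  // 4 * 1.6 * 8 = 51.2
```

These are upper bounds that assume every arithmetic unit is busy in every cycle; real programs normally achieve only a fraction of them.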

Fragment and vertex shaders can run at the same time. Another advantage is the low price compared to other similarly fast solutions and the fact that suitable graphics cards can be found in almost every PC today.


In the beginning, shaders were tied to special functions closely linked to graphics calculations. To speed up the calculation of individual pixels, several arithmetic units of the same type were used to calculate many pixels at the same time. Later came the idea of expanding the very limited capabilities of the shaders to turn them into massively parallel processing units for any task: the first more or less freely programmable shaders emerged. The trend toward freely programmable shaders continues to this day and is pushed forward by chip designers with each new generation of technology. Modern GPUs sometimes have over 1000 of these programmable shader units and can therefore carry out over 1000 computing operations at the same time.


OpenCL provides a uniform interface for implementing GPGPU calculations. The disadvantage compared to conventional CPUs is that programs must be executed with massive parallelism in order to benefit from these advantages. GPUs are also limited in their functionality. There are special graphics card models (Nvidia Tesla, AMD FireStream) for the scientific sector. The memory of these cards has error correction, and their precision in floating-point calculations is higher, which is also reflected in the cost.


OpenCL, CUDA and, since 2012, C++ AMP are the main options for developing GPGPU-capable programs. OpenCL is an open standard that is available on many platforms, whereas CUDA is a proprietary framework from Nvidia and runs only on GPUs from this manufacturer. C++ AMP is a C++ language extension initiated by Microsoft, combined with a small template library. It is open in the sense that it is restricted neither to Microsoft products nor to particular types of accelerator hardware or particular hardware manufacturers (so it is not limited to GPGPUs, but can also target CPUs and, in the future, other parallelization options such as cloud computing). Microsoft's AMP implementation expects the GPU to support DirectX version 11, because this was the first version to specifically take the use of GPUs as GPGPUs into account. If a program using AMP does not find a sufficiently recent GPU, the algorithm programmed with AMP is automatically executed on the CPU using its parallelization options (multithreading on several processor cores, SIMD instructions). AMP is thus intended to create a complete abstraction layer between an algorithm and the hardware of the executing computer. In addition, the restriction to a few new C++ language constructs and a few new library classes is intended to lower the previous hurdles and effort in developing parallel algorithms. DirectX 11 is natively supported in hardware by all common GPU series released since its introduction (including entry-level GPUs such as Intel's chipset-integrated GPUs), but DirectX 11 was introduced with Windows 7 and made available retroactively only for Windows Vista, so older Windows operating systems cannot be used with AMP. Whether C++ AMP will ever be adopted by other platforms or C++ development environments outside the Windows world is currently still completely open.

A more recent approach is OpenACC, which, like OpenMP, is controlled via compiler pragmas. Ordinary source code, e.g. in C++, is parallelized automatically by placing certain compiler pragmas such as "#pragma acc parallel" in front of serially formulated for loops. The porting effort is relatively small. However, automatic parallelization does not always lead to optimal solutions, so OpenACC cannot completely replace explicit parallel programming as in OpenCL. Nevertheless, in many cases it is worthwhile to achieve high speedups on GPGPUs in this simple way. OpenACC is supported by commercial compilers such as PGI and free compilers such as the GNU Compiler Collection.
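As a minimal sketch of this approach, the following C++ function annotates an ordinary serial loop with an OpenACC pragma. The function name `saxpy` and the choice of the combined "parallel loop" directive with explicit data clauses are illustrative assumptions, not taken from the text above. A compiler without OpenACC support typically ignores the pragma, so the same source still runs serially on the CPU.

```cpp
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element (illustrative helper).
// Assumes x and y have the same length.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();
    float* yp = y.data();

    // Ask an OpenACC compiler to run the loop iterations in parallel on the
    // accelerator: copy x to the device, and copy y there and back again.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The loop body is unchanged from its serial form, which is why the porting effort stays small. How well the generated code performs depends on the compiler and on data movement between host and device.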

To run programs on a GPU, a host program is needed that controls the flow of information. Usually, the GPGPU code, written in a C-like language, is compiled at runtime at the instruction of the host program and sent to the graphics processor for processing, which then returns the calculated data to the host program.
