from Wikipedia, the free encyclopedia

Basic data

Developer: Khronos Group
Initial release: August 28, 2009
Current version: 2.2-11 (July 19, 2019)
Current preliminary version: 3.0 provisional (April 27, 2020)
Operating system: platform independent
Programming language: C, C++
Category: Programming interface
License: various

OpenCL (Open Computing Language) is an interface for heterogeneous parallel computers that are equipped with, for example, central, graphics, or digital signal processors. It includes the programming language "OpenCL C". OpenCL was originally developed by Apple to make the performance of current graphics processors usable for non-graphics applications.

The first draft was worked out in cooperation with AMD, IBM, Intel, and Nvidia, and Apple finally submitted it to the Khronos Group for standardization. The specification for OpenCL 1.0 was released on December 8, 2008. Specification 1.2 followed on November 16, 2011 with improvements that remain backwards compatible with 1.0. Two years later, on November 18, 2013, the OpenCL 2.0 specification was introduced.

OpenCL 1.0 was first brought to market by Apple on August 28, 2009 with the Mac OS X Snow Leopard 10.6 operating system; the associated programs ("kernels") can be distributed at runtime to the various available OpenCL-capable devices. Currently, Apple supports only OpenCL 1.0 to 1.2, depending on the hardware.

Hardware compatible with OpenCL 2.0 also supports the higher versions 2.1 and 2.2 with updated drivers, according to the Khronos Group.

OpenCL 2.1 was officially released in November 2015. A decisive innovation in OpenCL 2.1 is the integration of SPIR-V, the successor to SPIR (Standard Portable Intermediate Representation). SPIR-V is an intermediate language with native support for graphics shaders and compute kernels. It allows the compiler chain to be divided among the various processing units. High-level languages can therefore target the heterogeneous architecture via SPIR-V without having to deal with the translation for the individual hardware components. In addition to OpenCL, SPIR-V is also used in the Vulkan graphics API.

OpenCL 2.2 was officially released in May 2017. The announcement names the integration of the OpenCL C++ kernel language into OpenCL as the most important change, which should help, among other things, when writing parallel programs. The kernel language, defined within the Open Computing Language as a static subset of the C++14 standard, includes classes, templates, lambda expressions, and other constructs. In May 2018 a "maintenance update" was released with bug fixes and updates to the SPIR-V headers.

In the future, OpenCL is to converge with Vulkan as far as possible and thereby gain even broader support. This was demonstrated with Adobe's Premiere Rush, which uses the open source compiler clspv to compile a large part of its OpenCL C kernel code for a Vulkan environment on Android devices. Most recently, OpenCL Next was announced for 2019, with the emphasis that the OpenCL roadmap is independent of Vulkan's, so the two projects will not merge completely. OpenCL 3.0 (provisional) was released on April 27, 2020.


Platform model

An OpenCL system consists of a host and one or more OpenCL devices. A device consists of one or more independent compute units ("CU" for short). In a multi-core processor these are the available cores, which together make up the central processing unit; in a graphics card they are the shaders. Each compute unit is subdivided into one or more processing elements ("PE" for short). The host distributes the kernels (programs) to the available devices at runtime.

There are two types of kernels:

OpenCL kernel
These are written in the OpenCL C programming language. OpenCL C is based on ISO C99 and has been expanded to include functions and data types for parallel processing.
Native kernel
These kernels are optional and implementation specific.

The OpenCL kernels are compiled by the OpenCL compiler at runtime and then executed by an OpenCL device. This means that at development time it is not necessary to know on which hardware the program will be executed later.

The calculations are carried out at runtime by so-called work items. These work items are arranged in a one-, two-, or three-dimensional grid through which they can be addressed. Work items are grouped into work groups, within which synchronization is possible and which can access a shared memory. An individual work item can therefore be addressed either absolutely, through its coordinate in the grid, or relatively, through the coordinate of the work group containing it together with its coordinate within that work group.
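
The relationship between absolute and relative addressing can be sketched in plain Python, without any OpenCL runtime. This is only an illustration under two assumptions that the text does not state: the global offset is zero, and the grid size is a multiple of the work-group size in every dimension. The function names are hypothetical; in OpenCL C the corresponding values are read with the built-ins get_global_id, get_group_id, get_local_id, and get_local_size.

```python
# Pure-Python emulation of work-item addressing (no OpenCL runtime involved).
from itertools import product


def global_id(group_id, local_id, local_size):
    """Absolute coordinate from work-group coordinate and coordinate inside the group."""
    return tuple(g * s + l for g, l, s in zip(group_id, local_id, local_size))


def split_global_id(gid, local_size):
    """Inverse mapping: absolute coordinate -> (work-group coordinate, local coordinate)."""
    group = tuple(x // s for x, s in zip(gid, local_size))
    local = tuple(x % s for x, s in zip(gid, local_size))
    return group, local


def enumerate_work_items(global_size, local_size):
    """Yield (global, group, local) coordinates for every work item in the grid."""
    for gid in product(*(range(n) for n in global_size)):
        group, local = split_global_id(gid, local_size)
        yield gid, group, local


if __name__ == "__main__":
    # Hypothetical 2-D grid: 8 x 4 work items in work groups of 4 x 2.
    for gid, group, local in enumerate_work_items((8, 4), (4, 2)):
        if gid in {(0, 0), (5, 3)}:
            print("global", gid, "group", group, "local", local)
```

Both forms carry the same information: the pair (group, local) can always be converted back to the absolute coordinate, which is why either one is sufficient to select the data element a work item should process.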

Memory model

There are five types of memory in OpenCL:

  • Host memory: The host memory is the regular main memory of the host program. An OpenCL kernel cannot access it directly.
  • Global memory: This is the memory available to the OpenCL kernel. Each instance of a kernel has random access to the entire area.
  • Constant memory: Constant memory differs from global memory in that kernel instances can only read it, not modify it.
  • Local memory: A group of kernel instances has random access to a small area of local memory, typically 16 KiB. Each group has its own area that only its members can access.
  • Private memory: This memory is reserved for a single kernel instance. Other kernel instances and the host program cannot access its contents.
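
How the memory types listed above interact can be illustrated with a sequential Python emulation of a per-group sum. This is a sketch, not OpenCL code: the function name run_group_sum and the scale parameter are hypothetical, the group size is assumed to be a power of two, and the data length is assumed to be a multiple of the group size. The inner loops stand in for kernel instances that would run in parallel on a real device, with a barrier between reduction steps.

```python
# Sequential emulation of the OpenCL memory types (no OpenCL runtime involved).

def run_group_sum(host_data, local_size, scale=1):
    """Return one partial sum per work group, mimicking a local-memory reduction."""
    if local_size <= 0 or local_size & (local_size - 1):
        raise ValueError("local_size must be a power of two (assumption of this sketch)")
    if len(host_data) % local_size:
        raise ValueError("data length must be a multiple of local_size")

    global_mem = list(host_data)   # explicit copy: kernels cannot read host memory directly
    constant_mem = (scale,)        # read-only for every kernel instance
    partial_sums = []              # global output buffer, one slot per work group

    for start in range(0, len(global_mem), local_size):
        local_mem = [0] * local_size           # visible only to this work group
        for lid in range(local_size):
            value = global_mem[start + lid]    # 'value' plays the role of private memory
            local_mem[lid] = value * constant_mem[0]
        stride = local_size // 2
        while stride:
            # On a device, a work-group barrier would separate these steps.
            for lid in range(stride):
                local_mem[lid] += local_mem[lid + stride]
            stride //= 2
        partial_sums.append(local_mem[0])
    return partial_sums


if __name__ == "__main__":
    print(run_group_sum(range(8), 4))
```

The host would read the partial sums back from global memory and combine them, since work groups cannot see each other's local memory.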

OpenCL C

The OpenCL C language is based on the syntax of ISO C99. It has been expanded to include additional data types and functions for parallel processing, but has also been restricted in other respects (see below). It is therefore not a superset of C, although the two languages have a lot in common.

In addition to the C99 data types, OpenCL C supports the following data types:

  • half: 16-bit floating point numbers according to IEEE 754r.
  • Vector data types: The data types char, uchar, short, ushort, int, uint, long, ulong, and float are available as vectors with 2, 4, 8, and 16 elements. The number of elements is appended to the name of the data type, e.g. uchar4, float8, or int16. With OpenCL 1.1, three-element vectors were also introduced.
  • image2d_t: A two-dimensional image.
  • image3d_t: A three-dimensional image.
  • sampler_t: A sampler that defines how an image is sampled.
  • event_t: An event handle.
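
The limited precision of the 16-bit half type can be explored on the host with Python's standard struct module, whose "e" format code packs IEEE 754 binary16 values. The helper names to_half and half_bits are hypothetical and only serve this illustration; nothing here touches an OpenCL device.

```python
# Host-side look at 16-bit floating point values using only the standard library.
import struct


def to_half(x):
    """Round a Python float to the nearest value representable as a 16-bit float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_bits(x):
    """Return the 16-bit pattern that encodes x as a half-precision float."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for x in (1.0, 0.1, 65504.0):
        print(x, "->", to_half(x), hex(half_bits(x)))
```

Values such as 0.1 are not exactly representable and come back slightly changed, which is worth keeping in mind when deciding whether half is precise enough for a given kernel.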

The following data types were also reserved for later versions of OpenCL:

  • booln: A vector of truth values.
  • double, doublen: 64-bit floating point numbers and vectors. An extension for double already exists, but its support is not required for OpenCL 1.0.
  • halfn: A vector of 16-bit floating point numbers.
  • quad, quadn: 128-bit floating point numbers.
  • complex {half | float | double | quad}: Complex numbers with varying degrees of precision.
  • complex {half | float | double | quad}n: Vectors of complex numbers with varying degrees of precision.
  • imaginary {half | float | double | quad}: Imaginary numbers with varying degrees of precision.
  • imaginary {half | float | double | quad}n: Vectors of imaginary numbers with varying degrees of precision.
  • {float | double}nxm: n × m matrices with 32-bit or 64-bit precision.
  • long double, long doublen: Floating point numbers and vectors with at least the precision of double and at most the precision of quad.
  • long long, long longn: 128-bit signed integers and vectors.
  • unsigned long long, unsigned long longn: 128-bit unsigned integers and vectors.

Arithmetic operators (+, -, *, /, %, ++, --), comparison operators (>, >=, ==, !=, <=, <), bit operators (&, |, ^, ~), and logical operators (&&, ||) are defined both for scalar data types and for vectors. When applied to vectors, the operation is carried out component by component. In this respect OpenCL behaves like well-known shader languages such as GLSL.

Also borrowed from shader languages are a number of mathematical functions that are likewise carried out component by component, for example sine, cosine, square root, minimum, and maximum.
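
Component-by-component evaluation can be mimicked with Python tuples standing in for a vector type such as float4. The helper names below are hypothetical. One detail goes beyond the text and is stated here as an assumption based on the OpenCL C specification: a comparison between vectors yields -1 (all bits set) for each true component and 0 for each false one, rather than a single truth value.

```python
# Tuples stand in for OpenCL C vectors such as float4 (no OpenCL runtime involved).
import math


def componentwise(op, a, b):
    """Apply a binary operation to matching components of two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return tuple(op(x, y) for x, y in zip(a, b))


def vec_add(a, b):
    return componentwise(lambda x, y: x + y, a, b)


def vec_min(a, b):
    return componentwise(min, a, b)


def vec_less_than(a, b):
    """Per-component comparison: -1 for true, 0 for false (assumed vector convention)."""
    return componentwise(lambda x, y: -1 if x < y else 0, a, b)


def vec_sqrt(a):
    """Unary math functions are applied to every component as well."""
    return tuple(math.sqrt(x) for x in a)


if __name__ == "__main__":
    a = (1.0, 4.0, 9.0, 16.0)
    b = (4.0, 3.0, 2.0, 1.0)
    print(vec_add(a, b), vec_less_than(a, b), vec_sqrt(a), vec_min(a, b))
```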

Compared to C, OpenCL C is restricted in the following points, among others:

  • There are no function pointers.
  • Recursion is not possible.
  • Arrays must not have a variable length.
Use of OpenGL and DirectX objects

OpenCL can directly access objects from OpenGL or DirectX (only under Windows), such as textures. This means that OpenCL can be used, for example, to change textures without having to copy the data.


Extensions

Like OpenGL, OpenCL can be supplemented with additional functions through manufacturer-specific extensions. Examples of extensions that have already been defined are:

  • Double precision floating point numbers (64-bit floating point numbers, cl_khr_fp64 ).
  • Half-precision floating point vectors (16-bit floating point numbers, cl_khr_fp16 ).
  • Define the type of rounding for floating point operations ( cl_khr_select_fprounding_mode ).
  • Writing in 3D images ( cl_khr_3d_image_writes ).
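
Because extensions are optional, a host program typically checks for them before relying on them. A device reports its extensions as a space-separated string of names (the value of CL_DEVICE_EXTENSIONS). The Python sketch below works on such a string; the example string and the helper names are hypothetical, while the pragma text follows the form OpenCL C uses to enable an extension in kernel source.

```python
# Checking an extension string of the kind reported by an OpenCL device.

# Hypothetical example value; a real string comes from the device query.
EXAMPLE_EXTENSIONS = "cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store"


def parse_extensions(ext_string):
    """Split the space-separated extension string into a set of names."""
    return set(ext_string.split())


def supports(ext_string, name):
    """Exact-name check; a substring test could match a longer, different name."""
    return name in parse_extensions(ext_string)


def enable_pragmas(ext_string, wanted):
    """Build the kernel-source pragmas for those wanted extensions that are available."""
    available = parse_extensions(ext_string)
    return ["#pragma OPENCL EXTENSION %s : enable" % name
            for name in wanted if name in available]


if __name__ == "__main__":
    for line in enable_pragmas(EXAMPLE_EXTENSIONS, ["cl_khr_fp64", "cl_khr_fp16"]):
        print(line)
```

A kernel that needs double precision could then be compiled only when the check succeeds, with a single-precision fallback otherwise.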


Implementations

OpenCL can be implemented for any operating system and hardware platform, just like OpenGL and OpenAL. The specification is designed for CPUs, GPUs, DSPs, and the Cell processor. There is also a specification for embedded systems with reduced requirements.

  • AMD: AMD's OpenCL implementation enables the use of GPUs via its GPGPU interface ATI Stream, and of CPUs with SSE3, on Linux and Windows. Current graphics cards of the GCN 4 (RX 400 / RX 500) and GCN 5 (Vega) generations fully support OpenCL 2.0, as do cards of the older GCN 2 and GCN 3 generations. The outdated GCN 1 and TeraScale 2 architectures support OpenCL 1.2. OpenCL 1.1 has been supported since the R700 series. On Windows, OpenCL 2.1 is also supported from driver 19.1.1 onward, although it is unclear which hardware can benefit from this. Vega chips can currently only operate up to OpenCL 2.0.
  • ARM: Further implementations for GPUs come from ARM, Intel, S3, and VIA.
  • Intel OpenCL SDK: Intel also has a commercial SDK (2.0 for GPU and 2.1 for CPU from Gen7).
  • LLVM: The OpenCL implementations from Nvidia, Intel, and Apple are technically based on LLVM, a technology that Apple has used since macOS version 10.5 in its JIT OpenGL compiler and also in iOS.
  • Nvidia: Nvidia offers an OpenCL implementation for its GPGPU interface CUDA on Linux, Windows, and macOS. The Tesla generation chips G80 and GT200 (Nvidia Tesla) support OpenCL 1.1 with current drivers from version 341.95. Fermi supports OpenCL 1.1. Kepler and Maxwell GPUs support at most OpenCL 1.2. With the earlier 370 series drivers, the newer Pascal GPUs also support only OpenCL 1.2, in the GeForce 10 series as well as in the Quadro and Tesla series. Driver version 378.66 supports important functions of OpenCL 2.0 for the first time, as a beta.

Open Source:

  • Intel Beignet: Intel has set up the open source project "Beignet" for Linux and Android. In November 2016, after long-standing support for OpenCL 1.2 (from Ivy Bridge), support for OpenCL 2.0 was announced. The current version is 1.3.2, with optional OpenCL 2.0 support that still needs optimization.
  • Intel Compute Runtime (NEO): Under the code name NEO, a new open source driver has been developed for hardware from Skylake onward with OpenCL 2.1 support (now also from Broadwell). OpenCL 2.2 is expected to follow soon in "Compute Runtime". OpenCL 3.0 is currently in development for Tiger Lake.
  • ROCm: As part of the OpenCompute initiative, AMD started the ROCm project in collaboration with the open source community. ROCm runs on graphics cards from generation GCN 3 and on CPUs with PCI Express 3.0 (AMD Ryzen or Intel Haswell). The functionality of OpenCL 2.0 is supported. Since version 1.8, processors with PCIe 2.0 combined with AMD GCN 3 graphics cards of lower computing power have also been supported experimentally. ROCm version 2.4 brings some performance improvements and support for TensorFlow 2.0. Version 2.5 supports the Thrust library with rocThrust. Version 3.5 supports OpenCL 2.2. The current version is 3.5.1.
  • POCL: Portable OpenCL (OpenCL 1.2, OpenCL 2.0 for the most part). Version 1.0 added an experimental Nvidia CUDA backend for using Nvidia GPUs, which makes open source OpenCL possible on Nvidia hardware with considerably more options than with Mesa. Due to a lack of optimization and software, performance is sometimes weak, at a ratio of 1:5 to 1:10 compared to the AMD Windows implementation. POCL 1.1 improves performance considerably in some cases and experimentally supports SPIR and SPIR-V. Version 1.2 supports hwloc 2.0 and fully supports OpenCL 1.2. Version 1.3 supports macOS. Version 1.4 extends SPIR and SPIR-V support. The current version is 1.5.
  • GalliumCompute (Clover): Clover, an implementation for Linux under the GPL, has been in development since mid-2011. It is also based on LLVM and is intended to use a CPU or, indirectly via Mesa 3D, a graphics card. Clover was integrated into the Mesa project and is part of GalliumCompute. Many tests for OpenCL 1.0 to 1.2 are not yet passed. With the change from TGSI to NIR, OpenCL in Mesa is being actively developed again so that it can be used with the open source drivers for graphics cards from AMD (with RadeonSI) and Nvidia (with Nouveau).
  • Shamrock: Shamrock is an offshoot of Clover for Android on ARMv7 and later. OpenCL 1.2 is fully supported. The Khronos test was last passed with the "OpenCL 2.0 Samples Suite" for the examples up to 1.2.
  • triSYCL: Free OpenCL 2.2 implementation with SYCL.

A list of certified products is available from Khronos.

Like the implementations, these must pass the tests of the Khronos Conformance Test Suite (CTS).

Application software

Many compute-intensive programs use OpenCL for acceleration:

Graphics programs

  • FAST: Image processing for medical applications
  • KNLMeansCL: Denoiser plug-in for AviSynth
  • UFO: Image processing of synchrotron tracks

3D renderer


  • CUETools: With CUERipper from CUETools, audio can be converted from WAV to FLAC particularly quickly using the FLACCL function, which runs on modern graphics cards via OpenCL. For this part of the ripping process, speedups by a factor of 10 to 100 are possible with fast graphics cards and SSD storage compared to conventional CPUs and hard drives.
  • CN24: semantic analysis tool


  • HandBrake
  • FFmpeg
  • Final Cut Pro X
  • RealFlow Hybrido2
  • Sony Catalyst family
  • MAGIX Vegas (previously Sony Vegas)
  • AlchemistXF
  • vReveal from MotionDSP
  • Total Media Theater (TMT) from ArcSoft
  • C4D


  • Advanced Simulation Library
  • SecondSpace OpenCL program for simulating waves in 2D space.
  • PATRIC Particle-in-Cell Code
  • Bullet: GPU rigid body simulation using OpenCL
  • Monte Carlo simulation on AM57x
  • Intel Demo Real-Time Shallow Water Simulation
  • Intel code samples
  • GROMACS molecular simulations from version 5.1
  • FEM: SIEMENS NX Nastran 9.1+ and Simulia Abaqus 6.11+
  • Neural networks: clgen: Deep Learning Program Generator
  • Neural networks: nengo_ocl Brain simulations with Nengo
  • Decryption: JohnTheRipper



  • ACL: AMD Compute Libraries
    • clBLAS: complete set of BLAS level 1, 2, and 3 routines
    • clSparse: Routines for sparse matrices
    • clFFT: fast Fourier transform
    • clRNG: random number generators MRG31k3p, MRG32k3a, LFSR113 and Philox-4 × 32-10
  • AMGCL: AMG algebraic multi-grid solver
  • ArrayFire: library for parallel computing with an easy-to-use API and a JIT compiler (open source)
  • Bolt: STL-compatible library for creating accelerated data-parallel applications
  • Boost.Compute: GPU/parallel C++ library for OpenCL
  • Chlorine: C++11 library for easy use of OpenCL 1.2+
  • CLBlast: tuned clBLAS
  • clMAGMA: OpenCL port of the MAGMA project, a linear algebra library similar to LAPACK but for Multicore + GPU Systems
  • DeepCL: Neural Training Library
  • GEGL-OpenCL: Gimp GEGL with OpenCL
  • GpyFFT: Python Wrapper for FFT with clFFT
  • MOT: Maastricht Optimization Toolbox
  • Neanderthal: BLAS and LAPACK implementation for Clojure
  • Netlib BLAS
  • OpenCLGA: genetic algorithms with PYOpenCL
  • random123: Collection of counter-based random number generators (CBRNGs)
  • VexCL: vector expression template library (MIT license)
  • ViennaCL: free open source linear algebra library of the Vienna University of Technology
  • HIP: Open source C++ toolkit for OpenCL and CUDA
  • Project Coriander: Conversion of CUDA to OpenCL 1.2 with CUDA-on-CL
  • TF-Coriander project: Tensorflow with OpenCL 1.2

Language coupling

  • ClojureCL: parallel OpenCL 2.0 with Clojure
  • dcompute: run D natively
  • Obtain OpenCL binding
  • OpenCLAda: Binding Ada to OpenCL
  • OpenCL.jl: Julia Bindings
  • PyOpenCL: Python coupling
  • JavaScript: WebCL

Web links


  • OpenCL. University of Erlangen, archived from the original on March 4, 2016; accessed on January 6, 2019.
