CUDA

from Wikipedia, the free encyclopedia
Basic data

Developer: Nvidia
Initial release: June 23, 2007
Current version: 10.1.243 (August 19, 2019)
Operating system: Windows, Linux, Mac OS X
Category: GPGPU
License: proprietary
Website: developer.nvidia.com

CUDA (formerly also called Compute Unified Device Architecture) is a programming technology developed by Nvidia that lets parts of a program run on the graphics processor (GPU). The GPU provides additional computing capacity and, for program sections that can run in parallel (high data parallelism), is generally much faster than the CPU. CUDA is mainly used for scientific and technical computations.

Technical details

Through the CUDA API, the graphics processor, which would otherwise be used only for graphics calculations, serves as a coprocessor. Application examples include solving seismological or geological problems and simulating electromagnetic fields. CUDA is used in the SETI@home project as part of the Berkeley Open Infrastructure for Network Computing (BOINC). In general, CUDA can only be used efficiently where (among other conditions) the calculations can be strongly parallelized.
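Why strong parallelizability matters can be quantified with Amdahl's law (a standard result, not taken from this article): if only a fraction p of a program's runtime is moved to a processor that is s times faster, the remaining serial part limits the overall gain. The following C++ sketch evaluates the formula under idealized assumptions (no transfer or launch overhead); the function names are hypothetical.

```cpp
#include <cassert>

// Amdahl's law: overall speedup of a program when a fraction p of its
// runtime (0 <= p <= 1) is accelerated by a factor s > 0 (e.g. on a GPU)
// while the remaining fraction (1 - p) still runs serially.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

// Limit for an arbitrarily fast accelerator (s -> infinity):
// the serial fraction alone bounds the achievable speedup.
double amdahl_limit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

With p = 0.5, even a hundredfold faster GPU yields only about a 1.98-fold overall speedup, and no accelerator can exceed 2. With p = 0.95, the same GPU gives roughly a 16.8-fold speedup, bounded by 20.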

CUDA can be used with graphics cards from the GeForce 8 series onward and with Quadro cards from the Quadro FX 5600 onward. Nvidia's Tesla cards are optimized for high-performance computing and are mainly programmed with CUDA, but they also support open standards such as OpenCL. Some of them have no monitor outputs at all.

Since acquiring the PhysX technology from Ageia, Nvidia has continued to develop it and has ported it to CUDA. PhysX is used in numerous new games.

In March 2015, Nvidia released CUDA version 7.0.

In September 2015, Nvidia released CUDA version 7.5.

CUDA version 8.0, which fully supports the new Pascal series, has been available since September 2016.

CUDA version 9.0, which fully supports the new Volta series, has been available since September 2017; it was followed by update 9.1 in December and update 9.2 in March. Fermi is no longer supported.

Since autumn 2018, CUDA 10 has supported the new Turing architecture.

Programming

Programmers currently use C for CUDA (C with Nvidia extensions). There are also wrappers for the programming languages Perl, Python, Ruby, Java, Fortran and .NET, as well as interfaces to MATLAB, Mathematica and R. Nvidia based the CUDA compiler on the optimizing C compiler Open64. Since the Fermi architecture, C++ can also be used.
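In C for CUDA, a kernel is a function marked __global__ that is launched as kernel<<<numBlocks, threadsPerBlock>>>(args), and each thread typically computes its global element index as blockIdx.x * blockDim.x + threadIdx.x. So that it runs without a GPU, the sketch below is plain C++ that emulates such a launch of a SAXPY kernel (y = a·x + y) sequentially with two nested loops. The names blocks_for, saxpy_thread and emulate_saxpy_launch are hypothetical; on real hardware the threads run in parallel rather than in loop order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements: ceil(n / threads_per_block),
// as commonly computed on the host before a CUDA kernel launch.
std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Body executed by one (emulated) thread: y[i] = a * x[i] + y[i].
// block_idx, block_dim and thread_idx play the roles of CUDA's built-in
// blockIdx.x, blockDim.x and threadIdx.x.
void saxpy_thread(std::size_t block_idx, std::size_t block_dim,
                  std::size_t thread_idx, float a,
                  const std::vector<float>& x, std::vector<float>& y) {
    std::size_t i = block_idx * block_dim + thread_idx;  // global index
    if (i < y.size()) {  // the last block may contain surplus threads
        y[i] = a * x[i] + y[i];
    }
}

// Sequential stand-in for saxpy<<<blocks, threads_per_block>>>(...):
// every (block, thread) pair runs exactly once.
void emulate_saxpy_launch(float a, const std::vector<float>& x,
                          std::vector<float>& y, std::size_t threads_per_block) {
    assert(x.size() == y.size());
    std::size_t blocks = blocks_for(y.size(), threads_per_block);
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < threads_per_block; ++t) {
            saxpy_thread(b, threads_per_block, t, a, x, y);
        }
    }
}
```

For 1000 elements and 256 threads per block, four blocks are launched; the bounds check keeps the 24 surplus threads of the last block from writing past the end of the arrays.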

CUVID (CUDA Video Decoding API) is a programming interface for decoding video.

Alternatives

Examples of other GPGPU solutions:

  • OpenCL is an open standard maintained by the Khronos Group that works with graphics cards from many vendors and is available for most operating systems.
  • The open Vulkan standard, also from the Khronos Group, supports compute shaders similar to OpenCL kernels.
  • DirectCompute is an interface for GPGPU computing integrated into the DirectX API.

Software

One of the first programs to support CUDA was the Folding@home client, which speeds up biochemical calculations many times over. The SETI@home client followed on December 17, 2008, accelerating the search for extraterrestrial life by a factor of 10. Nvidia released the software Badaboom, a video converter that can convert videos up to 20 times faster than a conversion on the CPU. Other programs that use CUDA include TMPGEnc, Sorenson Squeeze 7, Adobe Photoshop from CS4 onward (accelerating filters), Adobe Premiere Pro from CS5.5 onward, and Mathematica 8 and later.

Simulation software such as MSC Nastran 2013 and later is sometimes accelerated considerably with CUDA, although insufficient GPU memory can be a hindrance for large models. Other leading CFD and FEM software such as OpenFOAM and ANSYS also uses CUDA to accelerate calculations. Because the GPU is more efficient than the CPU for these particular arithmetic operations, the energy consumed by such calculations sometimes decreases.

Criticism and disadvantages

Graphics processors (GPUs) are processors with an application-specific design. They therefore tend to support exotic data types such as 9-bit or 12-bit fixed-point formats, but often lack the register widths of 32, 48, 64 or 80 bits that are common in general-purpose CPUs and FPUs. As a result, calculations at the precisions defined by IEEE 754 (e.g. 64 bits for double precision) are often not provided in the GPU's instruction set and must be emulated in software at considerable cost. GPUs are therefore particularly suited to data types with comparatively small bit widths.
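To illustrate why native double precision matters for scientific calculations, the following sketch (plain C++ on the host, not GPU code) adds 0.1 ten million times, whose exact total is 1,000,000, once in 32-bit float and once in 64-bit double. The function names are hypothetical. Because every addition is rounded to the working precision, the single-precision total drifts far from the exact value, while the double-precision total stays very close to it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Adds `step` to a running total `count` times in 32-bit single precision.
float accumulate_float(float step, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < count; ++k) {
        sum += step;  // each addition is rounded to float
    }
    return sum;
}

// The same loop in 64-bit IEEE 754 double precision.
double accumulate_double(double step, std::size_t count) {
    double sum = 0.0;
    for (std::size_t k = 0; k < count; ++k) {
        sum += step;  // each addition is rounded to double
    }
    return sum;
}

// Relative deviation of a computed result from the exact value.
double relative_error(double computed, double exact) {
    assert(exact != 0.0);
    return std::fabs(computed - exact) / std::fabs(exact);
}
```

On common x86-64 builds without fast-math options, the float total is off by several percent, while the double total deviates by less than one part in a billion.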

As of 2010, the first manufacturers are already producing extended GPUs that, in addition to the data types required for graphics, also support general-purpose data types and operations, e.g. for computing IEEE 754-compliant results directly. Nvidia, one of the leading manufacturers, provides GPUs of the Fermi generation that natively support 32-bit integers as well as single- and double-precision floating-point formats (float/double).

Another disadvantage is the connection to the rest of the computer. Current GPUs are usually attached via PCIe, which, compared to the direct connection of processors, results in higher latencies and lower I/O throughput. Offloading is therefore only worthwhile for functions that require a substantial amount of computation, especially if the GPU's instruction set is better suited to the task (e.g. for large matrices).
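A back-of-the-envelope model makes this trade-off concrete. The sketch below compares the CPU time for multiplying two n × n double matrices (about 2n³ floating-point operations) with the GPU time plus the time to move three n × n matrices (two inputs, one result) over the bus. The Machine type, the function names and all throughput figures are hypothetical inputs supplied by the caller; latency and overlap of transfers with computation are ignored.

```cpp
#include <cassert>

// Hypothetical throughput figures supplied by the caller (not measured values):
//   cpu_flops, gpu_flops : sustained floating-point operations per second
//   link_bytes_per_s     : effective host<->GPU transfer rate (e.g. over PCIe)
struct Machine {
    double cpu_flops;
    double gpu_flops;
    double link_bytes_per_s;
};

// Multiplying two n x n matrices takes about 2 n^3 floating-point operations.
double matmul_flops(double n) { return 2.0 * n * n * n; }

// Data moved over the link: two input matrices in, one result matrix out,
// 8 bytes per double.
double matmul_transfer_bytes(double n) { return 3.0 * n * n * 8.0; }

// True if transfer time plus GPU compute time beats computing on the CPU.
bool offload_pays_off(const Machine& m, double n) {
    assert(m.cpu_flops > 0 && m.gpu_flops > 0 && m.link_bytes_per_s > 0 && n > 0);
    double cpu_time = matmul_flops(n) / m.cpu_flops;
    double gpu_time = matmul_flops(n) / m.gpu_flops
                    + matmul_transfer_bytes(n) / m.link_bytes_per_s;
    return gpu_time < cpu_time;
}
```

With illustrative figures of 100 GFLOP/s for the CPU, 10 TFLOP/s for the GPU and 10 GB/s for the link, offloading pays off only above roughly n ≈ 120, because the computation grows with n³ while the transferred data grows only with n².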

Being tied to a single manufacturer is also criticized. Unlike libraries for CPUs with MMX or SSE extensions (which run on practically all CPUs from the various manufacturers of x86 processors), using CUDA ties a program to the GPU manufacturer Nvidia and thus to the presence of Nvidia hardware. OpenCL, whose newer versions share the SPIR-V intermediate representation with Vulkan, is more universal and is implemented for GPUs from Nvidia, AMD (formerly ATI), VIA, S3 and others. There is also CPU support for x86 processors, implemented via the SSE3 extensions, and IBM offers an OpenCL implementation for the Power architecture and the Cell Broadband Engine. However, OpenCL's broader approach results in a noticeable performance disadvantage when CUDA and OpenCL are compared on identical Nvidia hardware: depending on the problem, losses of between 5 and 50% can be observed with OpenCL.

Supported GPUs

Compute capability levels, supported CUDA Toolkit versions, and the corresponding GPUs and cards.

Compute capability | CUDA Toolkit support | Microarchitecture | GPUs (chips) | GeForce / Tegra / Jetson | Quadro | Tesla
1.0 | 1.0-6.5 | Tesla | G80 | GeForce 8800 Ultra, GeForce 8800 GTX, GeForce 8800 GTS (G80) | Quadro FX 5600, Quadro FX 4600, Quadro Plex 2100 S4 | Tesla C870, Tesla D870, Tesla S870
1.1 | 1.1-6.5 | Tesla | G92, G94, G96, G98, G84, G86 | GeForce GTS 250, GeForce 9800 GX2, GeForce 9800 GTX, GeForce 9800 GT, GeForce 8800 GTS (G92), GeForce 8800 GT, GeForce 9600 GT, GeForce 9500 GT, GeForce 9400 GT, GeForce 8600 GTS, GeForce 8600 GT, GeForce 8500 GT, GeForce G110M, GeForce 9300M GS, GeForce 9200M GS, GeForce 9100M G, GeForce 8400M GT, GeForce G105M | Quadro FX 4700 X2, Quadro FX 3700, Quadro FX 1800, Quadro FX 1700, Quadro FX 580, Quadro FX 570, Quadro FX 470, Quadro FX 380, Quadro FX 370, Quadro FX 370 Low Profile, Quadro NVS 450, Quadro NVS 420, Quadro NVS 290, Quadro NVS 295, Quadro Plex 2100 D4, Quadro FX 3800M, Quadro FX 3700M, Quadro FX 3600M, Quadro FX 2800M, Quadro FX 2700M, Quadro FX 1700M, Quadro FX 1600M, Quadro FX 770M, Quadro FX 570M, Quadro FX 370M, Quadro FX 360M, Quadro NVS 320M, Quadro NVS 160M, Quadro NVS 150M, Quadro NVS 140M, Quadro NVS 135M, Quadro NVS 130M | –
1.2 | 2.3-6.5 | Tesla | GT218, GT216, GT215 | GeForce GT 340*, GeForce GT 330*, GeForce GT 320*, GeForce 315*, GeForce 310*, GeForce GT 240, GeForce GT 220, GeForce 210, GeForce GTS 360M, GeForce GTS 350M, GeForce GT 335M, GeForce GT 330M, GeForce GT 325M, GeForce GT 240M, GeForce G210M, GeForce 310M, GeForce 305M | Quadro FX 380 Low Profile, NVIDIA NVS 300, Quadro FX 1800M, Quadro FX 880M, Quadro FX 380M, NVS 5100M, NVS 3100M, NVS 2100M, ION | –
1.3 | 3.0-6.5 | Tesla | GT200, GT200b | GeForce GTX 295, GeForce GTX 285, GeForce GTX 280, GeForce GTX 275, GeForce GTX 260 | Quadro FX 5800, Quadro FX 4800, Quadro FX 4800 for Mac, Quadro FX 3800, Quadro CX, Quadro Plex 2200 D2 | Tesla C1060, Tesla S1070, Tesla M1060
2.0 | 3.0-8.0 | Fermi | GF100, GF110 | GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 480M | Quadro 6000, Quadro 5000, Quadro 4000, Quadro 4000 for Mac, Quadro Plex 7000, Quadro 5010M, Quadro 5000M | Tesla C2075, Tesla C2050/C2070, Tesla M2050/M2070/M2075/M2090
2.1 | 3.2-8.0 | Fermi | GF104, GF106, GF108, GF114, GF116, GF117, GF119 | GeForce GTX 560 Ti, GeForce GTX 550 Ti, GeForce GTX 460, GeForce GTS 450, GeForce GTS 450*, GeForce GT 640 (GDDR3), GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce GT 520, GeForce GT 440, GeForce GT 440*, GeForce GT 430, GeForce GT 430*, GeForce GT 420*, GeForce GTX 675M, GeForce GTX 670M, GeForce GT 635M, GeForce GT 630M, GeForce GT 625M, GeForce GT 720M, GeForce GT 620M, GeForce 710M, GeForce 610M, GeForce GTX 580M, GeForce GTX 570M, GeForce GTX 560M, GeForce GT 555M, GeForce GT 550M, GeForce GT 540M, GeForce GT 525M, GeForce GT 520MX, GeForce GT 520M, GeForce GTX 485M, GeForce GTX 470M, GeForce GT 445M, GeForce GT 435M, GeForce GT 420M, GeForce GT 415M, GeForce 410M | Quadro 2000, Quadro 2000D, Quadro 600, Quadro 410, Quadro 4000M, Quadro 3000M, Quadro 2000M, Quadro 1000M, NVS 5400M, NVS 5200M, NVS 4200M | –
3.0 | 4.2-10.2 | Kepler | GK104, GK106, GK107 | GeForce GTX 770, GeForce GTX 760, GeForce GT 740, GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GTX 880M, GeForce GTX 780M, GeForce GTX 770M, GeForce GTX 765M, GeForce GTX 760M, GeForce GTX 680MX, GeForce GTX 680M, GeForce GTX 675MX, GeForce GTX 670MX, GeForce GTX 660M, GeForce GT 650M, GeForce GT 645M, GeForce GT 740M, GeForce GT 730M, GeForce GT 640M, GeForce GT 640M LE, GeForce GT 735M | Quadro K5000, Quadro K4200, Quadro K4000, Quadro K2000, Quadro K2000D, Quadro K600, Quadro K420, Quadro K500M, Quadro K510M, Quadro K610M, Quadro K1000M, Quadro K2000M, Quadro K1100M, Quadro K2100M, Quadro K3000M, Quadro K3100M, Quadro K4000M, Quadro K5000M, Quadro K4100M, Quadro K5100M | Tesla K10, GRID K340, GRID K520
3.2 | Tegra TK | Kepler | GK20A | Jetson TK1 (Tegra K1) | – | –
3.5 | 5.0-10.2 | Kepler | GK110, GK208 | GeForce GTX TITAN Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GT 640 (GDDR5), GeForce GT 630 v2, GeForce GT 730, GeForce GT 720, GeForce GT 710, GeForce GT 740M (64-bit, DDR3) | Quadro K6000, Quadro K5200 | Tesla K40, Tesla K20x, Tesla K20
3.7 | 5.5-10.2 | Kepler | GK210 | – | – | Tesla K80
5.0 | 6.0-11.0 | Maxwell | GM107, GM108 | GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 960M, GeForce GTX 950M, GeForce 940M, GeForce 930M, GeForce GTX 860M, GeForce GTX 850M, GeForce 845M, GeForce 840M, GeForce 830M | Quadro K2200, Quadro K1200, Quadro K620, Quadro M2000M, Quadro M1000M, Quadro M600M, Quadro K620M | –
5.2 | 6.5-11.0 | Maxwell | GM200, GM204, GM206 | GeForce GTX TITAN X, GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950, GeForce GTX 750 SE, GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M | Quadro M6000 24GB, Quadro M6000, Quadro M5000, Quadro M4000, Quadro M2000, Quadro M5500, Quadro M5000M, Quadro M4000M, Quadro M3000M | Tesla M4, Tesla M40, Tesla M6, Tesla M60
5.3 | Tegra TK | Maxwell | GM20B | Jetson TX1 (Tegra X1) | – | –
6.0 | 8.0-11.0 | Pascal | GP100 | – | – | Tesla P100
6.1 | 8.0-10.2 | Pascal | GP102 | Titan X, GeForce GTX 1080 Ti | Quadro P6000 | Tesla P40
6.1 | 8.0-10.2 | Pascal | GP104 | GeForce GTX 1070, GeForce GTX 1080 | Quadro P5000 | Tesla P4
6.1 | 8.0-10.2 | Pascal | GP106 | GeForce GTX 1060 | – | –
6.1 | 8.0-10.2 | Pascal | GP107 | GeForce GTX 1050, GeForce GTX 1050 Ti | – | –
6.1 | 8.0-10.2 | Pascal | GP108 | – | – | –
7.0 | 9.0-11.0 | Volta | GV100 | NVIDIA TITAN V | Quadro GV100 | Tesla V100
7.2 | 9.0-11.0 | Volta | GV10B | NVIDIA Jetson AGX Xavier | – | –
7.5 | 10.0-11.0 | Turing | TU102, TU104, TU106 | NVIDIA TITAN RTX, GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650 | Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Quadro T2000, Quadro T1000 | Tesla T4
8.0 | 11.0 | Ampere | GA100 | – | – | A100

The Tesla microarchitecture (compute capability 1.x) was last supported by CUDA SDK version 6.5.
The Fermi microarchitecture (compute capability 2.x) was last supported by CUDA SDK version 8.0.
The Kepler microarchitecture (compute capability 3.x) was last supported by CUDA SDK version 10.2.
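At run time, the CUDA runtime API reports a device's compute capability in the major and minor fields of cudaDeviceProp, which cudaGetDeviceProperties fills in, and nvcc names the matching compilation target in the form sm_XY (for example sm_75 for compute capability 7.5). The plain C++ sketch below, with hypothetical helper names, maps such a pair to the microarchitecture given in the table above and to the last supporting SDK version from the notes above. It only knows the levels listed there and reports anything else as unknown.

```cpp
#include <cassert>
#include <string>

struct CcEntry { int major; int minor; const char* arch; };

// Compute capabilities listed in the table and their microarchitecture.
const CcEntry kTable[] = {
    {1, 0, "Tesla"},   {1, 1, "Tesla"},   {1, 2, "Tesla"},   {1, 3, "Tesla"},
    {2, 0, "Fermi"},   {2, 1, "Fermi"},
    {3, 0, "Kepler"},  {3, 2, "Kepler"},  {3, 5, "Kepler"},  {3, 7, "Kepler"},
    {5, 0, "Maxwell"}, {5, 2, "Maxwell"}, {5, 3, "Maxwell"},
    {6, 0, "Pascal"},  {6, 1, "Pascal"},
    {7, 0, "Volta"},   {7, 2, "Volta"},   {7, 5, "Turing"},
    {8, 0, "Ampere"},
};

// Microarchitecture for a (major, minor) pair, or "unknown" if not listed.
std::string microarchitecture(int major, int minor) {
    for (const CcEntry& e : kTable) {
        if (e.major == major && e.minor == minor) return e.arch;
    }
    return "unknown";
}

// Last CUDA SDK version supporting a compute capability major version,
// per the notes below the table; empty means no end of support is listed.
std::string last_supported_sdk(int major) {
    switch (major) {
        case 1: return "6.5";
        case 2: return "8.0";
        case 3: return "10.2";
        default: return "";
    }
}

// nvcc-style target name, e.g. compute capability 6.1 -> "sm_61".
std::string sm_name(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor <= 9);
    return "sm_" + std::to_string(major) + std::to_string(minor);
}
```

A table-driven lookup keeps the mapping in one place, so extending it for a new compute capability only requires adding one entry.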
