Floating Point Operations Per Second

from Wikipedia, the free encyclopedia
Units of floating-point computing power with SI prefixes:
kFLOPS  KiloFLOPS  = 10^3 FLOPS
MFLOPS  MegaFLOPS  = 10^6 FLOPS
GFLOPS  GigaFLOPS  = 10^9 FLOPS
TFLOPS  TeraFLOPS  = 10^12 FLOPS
PFLOPS  PetaFLOPS  = 10^15 FLOPS
EFLOPS  ExaFLOPS   = 10^18 FLOPS
ZFLOPS  ZettaFLOPS = 10^21 FLOPS
YFLOPS  YottaFLOPS = 10^24 FLOPS

Floating point operations per second (FLOPS for short, also written flops) is a measure of the performance of computers or processors: the number of floating-point operations (additions or multiplications) they can execute per second.

A single floating-point operation is often called a FLOP, which is why the variant FLOP/s occasionally appears as well; both variants are equivalent.

Description

The number of floating-point operations per second is not necessarily directly proportional to the processor's clock speed because, depending on the implementation, floating-point operations require different numbers of clock cycles. Vector processors perform up to several thousand operations per cycle. Current graphics cards, which work as vector processors, achieve a single-precision (SP, 32-bit float) computing power of over 11 TeraFLOPS (Nvidia GeForce GTX 1080 Ti) or over 13 TeraFLOPS (AMD Radeon RX Vega 64 Liquid). This is also the motivation for offloading floating-point operations to the graphics processor (GPGPU).

FLOPS measures the performance of the entire computer architecture, including main memory, bus, and compiler, not just the raw processor speed. As with the IPS unit, a best-case estimate or even a merely theoretical maximum is usually quoted.

Calculation

The theoretical peak performance of a single compute node can be calculated by multiplying the following values:

  • Clock frequency
  • Number of CPU sockets
  • CPU cores per socket
  • Virtual cores per CPU core
  • min(instructions that can be issued per cycle, number of arithmetic units / latency of an instruction)
  • Data words per register
  • Numeric operations per instruction

For example, with

  • 2.5 GHz
  • 2 sockets
  • 24 cores per socket
  • 2 virtual cores per CPU core (hyperthreading)
  • 2 instructions issued per cycle
  • 8 data words per register (256-bit register with single precision or 512-bit register with double precision)
  • 2 numeric operations per instruction (FMA)

the result is 2.5 GHz × 2 × 24 × 2 × 2 × 8 × 2 = 7.68 TFLOPS.
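
The multiplication above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in this section; the `NodeSpec` class, its field names, and the `peak_flops` function are hypothetical names chosen for this example and do not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical container for the factors listed in this section."""
    clock_hz: float               # clock frequency in Hz
    sockets: int                  # number of CPU sockets
    cores_per_socket: int         # CPU cores per socket
    virtual_cores_per_core: int   # e.g. 2 with hyperthreading
    issue_rate: float             # min(issued per cycle, units / latency)
    words_per_register: int       # data words per register
    ops_per_instruction: int      # e.g. 2 for a fused multiply-add (FMA)


def peak_flops(spec: NodeSpec) -> float:
    """Return the theoretical peak in FLOPS as the product of all factors."""
    return (
        spec.clock_hz
        * spec.sockets
        * spec.cores_per_socket
        * spec.virtual_cores_per_core
        * spec.issue_rate
        * spec.words_per_register
        * spec.ops_per_instruction
    )


# The example values from the text above.
example = NodeSpec(
    clock_hz=2.5e9,
    sockets=2,
    cores_per_socket=24,
    virtual_cores_per_core=2,
    issue_rate=2,
    words_per_register=8,
    ops_per_instruction=2,
)

print(f"{peak_flops(example) / 1e12:.2f} TFLOPS")  # prints 7.68 TFLOPS
```

Because the result is a plain product, doubling any single factor (for example, the register width) doubles the theoretical peak; measured performance is usually lower.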

Computing power of computer systems

The first freely programmable computer that was usable in practice, the electromechanical Zuse Z3 of 1941, managed just under two additions per second and thus roughly 2 FLOPS. Other operations, however, sometimes took considerably longer.

A computer's FLOPS are measured with standardized program packages (benchmarks such as LINPACK or the Livermore benchmark).

The TOP500 ranking list shows the 500 fastest computer systems, measured by their FLOPS with the LINPACK benchmark.

The approximately 700,000 active computers of the Berkeley Open Infrastructure for Network Computing (BOINC) achieved an average performance of around 12 PetaFLOPS in December 2015. By comparison, Intel is building Aurora, a supercomputer based on the Knights Hill generation of the Intel Xeon Phi architecture, in the state of Illinois. The $200 million system is expected to deliver over 180 PetaFLOPS when it is completed in 2018. Later expansions of the data center are intended to make over 450 PetaFLOPS possible.

The correlator of the Atacama Large Millimeter/submillimeter Array (ALMA) delivered 17 PetaFLOPS in December 2012, while the computing power of the WIDAR correlator of the Expanded Very Large Array (EVLA) is stated as 40 PetaFLOPS. The planned correlator of the Square Kilometre Array (SKA) is expected to deliver 4 ExaFLOPS (4000 PetaFLOPS).

The ratio of computing power to electrical power demand keeps improving, even as total energy consumption rises. IBM's BlueGene/L, which appeared in the TOP500 list of November 2005, required only 70 m² of floor space and 1770 kW of electrical power for its roughly 280 TeraFLOPS. Compared with the Earth Simulator, three years older, which delivered 35.86 TeraFLOPS on 3000 m² with 6000 kW, this is a significant improvement.
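
The improvement can be made concrete by dividing computing power by electrical power. The following sketch uses only the figures quoted above; the function name `mflops_per_watt` and the variable names are hypothetical and were chosen for this example.

```python
def mflops_per_watt(tflops: float, kilowatts: float) -> float:
    """Convert a (TFLOPS, kW) pair into MFLOPS per watt."""
    flops = tflops * 1e12   # TFLOPS -> FLOPS
    watts = kilowatts * 1e3  # kW -> W
    return flops / watts / 1e6


# Figures as quoted in the paragraph above.
blue_gene_l = mflops_per_watt(tflops=280, kilowatts=1770)
earth_simulator = mflops_per_watt(tflops=35.86, kilowatts=6000)

print(f"BlueGene/L:      {blue_gene_l:.1f} MFLOPS/W")
print(f"Earth Simulator: {earth_simulator:.1f} MFLOPS/W")
print(f"Improvement:     {blue_gene_l / earth_simulator:.1f}x")
```

With these numbers, BlueGene/L comes out at roughly 158 MFLOPS per watt against roughly 6 MFLOPS per watt for the Earth Simulator, an improvement by a factor of about 26.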

Another example: the fastest computer in Germany in July 2005, a 57 million euro NEC system with 576 main processors at the High-Performance Computing Center Stuttgart (HLRS), achieved up to 12.7 TeraFLOPS and was optimistically described as 5000 times faster than a "normal" PC. The operator put the annual operating costs (excluding acquisition) at 1.3 million euros, plus 1.5 million euros in personnel costs. Because of the high acquisition costs, the entire system was rented out at an hourly rate of approximately 4000 euros (members of the University of Stuttgart, however, paid a significantly lower price).

For comparison: in March 2006, the new fastest computer in Germany, JUBL (Jülich Blue Gene/L), went into operation in Jülich. At 45.6 TeraFLOPS, it was the sixth-fastest computer in the world at the time and offered the computing power of 15,000 "normal" contemporary PCs. The assessment of future computing-time demand given by the CEO of the Jülich Research Center in March 2006 is of interest for further developments: "The demand for computing time will increase by a factor of 1000 over the next five years."

To put these figures in perspective: the Intel 8087 coprocessor of 1980, paired with an 8088 as the main processor, managed 50 kFLOPS. At the beginning of the 21st century, a PC with a Pentium 4 processor clocked at 3 GHz achieved around 6 GigaFLOPS, according to IBM. A conventional graphics card that was current as of November 2017 (Nvidia GeForce GTX 1080 Ti) delivers around 11.5 TeraFLOPS.

Examples of GFLOPS values for some processors

LINPACK 1kx1k (DP)                             | Maximum performance (GFLOPS) | Average performance (GFLOPS) | Efficiency (%)
Cell, 1 SPU, 3.2 GHz                           | 1.83                         | 1.45                         | 79.23
Cell, 8 SPUs, 3.2 GHz                          | 14.63                        | 9.46                         | 64.66
Pentium 4, 3.2 GHz                             | 6.4                          | 3.1                          | 48.44
Pentium 4 + SSE3, 3.6 GHz                      | 14.4                         | 7.2                          | 50.00
Core i7, 3.2 GHz, 4 cores                      | 51.2                         | 33.0 (HT enabled)            | 64.45
Core i7, 3.47 GHz, 6 cores                     | 83.2                         | -                            | -
Core i7 Sandy Bridge, 3.4 GHz, 4 cores         | 102.5                        | 92.3                         | 90.05
Itanium, 1.6 GHz                               | 6.4                          | 5.95                         | 92.97
Nvidia Tesla GP100, 1.48 GHz                   | 10600                        | -                            | -
Nvidia Quadro P6000                            | 19553                        | 12901                        | -
Xeon Skylake SP 6148                           | 1536                         | -                            | -
AMD Ryzen 1800X, 8C/16T, not yet optimized     | -                            | 221                          | -
Intel Core i7-7700K, 4C/8T                     | -                            | 241                          | -
Intel i7-5960X, 8C/16T                         | -                            | 375                          | -
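
The efficiency values in the table are consistent with the ratio of average to maximum performance, expressed in percent. A minimal sketch that recomputes a few of them from the table's own figures follows; the function name `efficiency_percent` and the `rows` list are hypothetical names used only for this illustration.

```python
def efficiency_percent(average_gflops: float, peak_gflops: float) -> float:
    """Return average performance as a percentage of maximum performance."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return 100.0 * average_gflops / peak_gflops


# (processor, maximum GFLOPS, average GFLOPS) copied from the table above.
rows = [
    ("Cell, 1 SPU, 3.2 GHz", 1.83, 1.45),
    ("Cell, 8 SPUs, 3.2 GHz", 14.63, 9.46),
    ("Pentium 4, 3.2 GHz", 6.4, 3.1),
    ("Core i7 Sandy Bridge, 3.4 GHz, 4 cores", 102.5, 92.3),
    ("Itanium, 1.6 GHz", 6.4, 5.95),
]

for name, peak, average in rows:
    print(f"{name}: {efficiency_percent(average, peak):.2f} %")
```

Rows for which the table lists only one of the two performance values cannot be evaluated this way.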
