Tensor Processing Unit

from Wikipedia, the free encyclopedia

Tensor Processing Units (TPUs), also known as tensor processors, are application-specific chips designed to accelerate machine-learning applications. TPUs are used primarily to process data in artificial neural networks (see deep learning).

The TPUs developed by Google were designed specifically for the TensorFlow software collection. TPUs form the basis of all Google services that use machine learning and were also used in the AlphaGo machine-versus-human matches against Lee Sedol, one of the world's best Go players.

First generation

The first generation of Google's TPU was presented at Google I/O 2016 and was designed specifically to support or accelerate the use of an already trained artificial neural network. This was achieved, among other things, through lower precision compared with ordinary CPUs or GPUs and through specialization in matrix operations.

The TPU consists of a systolic array with a 256 × 256 8-bit matrix multiplication unit (MMU), which is controlled by a microprocessor with a CISC instruction set. The chip is manufactured in a 28 nm process and clocked at 700 MHz, with a TDP of 28 to 40 W. The TPU has 28 MiB of on-chip RAM. In addition, 4 MiB of 32-bit accumulators are installed, which take over the results of the matrix multiplication unit. The TPU can perform matrix multiplications, convolutions and activation functions, as well as data transfer to the host system via PCIe 3.0 or to the DDR3 DRAM located on the board.
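The division of labor between the 8-bit matrix multiplication unit and the 32-bit accumulators can be sketched as follows. This is a simplified numerical model, not Google's implementation, and the array size is reduced from 256 × 256 for brevity:

```python
import numpy as np

# Simplified model of the first-generation TPU datapath:
# 8-bit integer multiplications whose products are summed in
# 32-bit accumulators, so intermediate results do not overflow.
# (Illustrative sketch only; array size reduced from 256 x 256.)

def mmu_matmul(a_u8: np.ndarray, w_u8: np.ndarray) -> np.ndarray:
    """Multiply 8-bit activation and weight matrices, accumulating in int32."""
    assert a_u8.dtype == np.uint8 and w_u8.dtype == np.uint8
    # Widen before multiplying, as the hardware accumulators do.
    return a_u8.astype(np.int32) @ w_u8.astype(np.int32)

a = np.full((4, 4), 200, dtype=np.uint8)   # activations
w = np.full((4, 4), 200, dtype=np.uint8)   # weights
out = mmu_matmul(a, w)
# Each output element is 4 * 200 * 200 = 160000, which would
# overflow an 8-bit or even 16-bit result register.
print(out[0, 0])  # 160000
```

The point of the wide accumulators is visible in the example: individual operands fit in 8 bits, but their dot products quickly exceed the 8-bit range.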

Second generation

The second generation of Google's TPU (TPUv2) was presented at Google I/O 2017. It is intended to accelerate not only the use of neural networks (inference) but also the training of these networks. These TPUs have two matrix execution units (MXUs) with 8 GiB of RAM. Each MXU delivers a computing power of 22.5 TFLOPS, using the bfloat16 data type, which does not comply with IEEE 754. A TPU board with 4 TPUs thus reaches 180 TFLOPS.
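The bfloat16 format keeps float32's 8 exponent bits (and thus its dynamic range) but shortens the mantissa from 23 to 7 bits, so a bfloat16 value is simply the upper 16 bits of a float32 bit pattern. The following sketch illustrates this relationship; real hardware may round rather than truncate:

```python
import struct

# bfloat16 keeps float32's sign and 8 exponent bits but truncates
# the mantissa from 23 to 7 bits: drop the low 16 bits of the
# float32 bit pattern. (Illustrative sketch; hardware may round.)

def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for a float (by truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

b = float32_to_bfloat16_bits(3.14159)
print(bfloat16_bits_to_float32(b))  # 3.140625 (only 7 mantissa bits survive)
```

Because only the mantissa is shortened, conversion between float32 and bfloat16 is cheap in hardware, which is one reason the format is attractive for training accelerators.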

The TPUs are interconnected into a "pod" with 11.5 PFLOPS, a computer network (cluster system architecture) of 256 TPUs and 128 server CPUs. The TPUs are linked in a toroidal (2D torus) network topology of 8 × 8 TPUs each. PCI Express 3.0 with 32 lanes (8 lanes per TPU) is used to connect the CPUs to the TPUs.
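In a 2D torus, every edge of the grid wraps around to the opposite side, so each TPU has exactly four direct neighbors and no node sits at a boundary. A minimal sketch of this topology (the addressing scheme here is illustrative, not Google's interconnect code):

```python
# Neighbors of a node in an 8 x 8 2D-torus topology: every edge
# wraps around, so each TPU has exactly four direct neighbors.
# (Illustrative sketch of the topology, not Google's interconnect.)

N = 8  # grid size per dimension, as in an 8 x 8 TPU slice

def torus_neighbors(x: int, y: int) -> list[tuple[int, int]]:
    """Return the four wrap-around neighbors of node (x, y)."""
    return [((x + 1) % N, y), ((x - 1) % N, y),
            (x, (y + 1) % N), (x, (y - 1) % N)]

print(torus_neighbors(0, 0))  # [(1, 0), (7, 0), (0, 1), (0, 7)]
```

The wrap-around links keep the maximum hop count between any two nodes low, which benefits the collective communication patterns used during distributed training.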

The second-generation TPUs can be used via the Google Compute Engine, a cloud offering from Google.

HBM memory is used to increase the memory bandwidth of the architecture.

The chip area of the second generation is likely larger than that of the first generation, owing to the more complex memory interface and the two cores per chip.

Third generation

TPUv3 card

The third generation of Google's TPU (TPU 3.0) was presented at Google I/O 2018. These TPUs have 4 MXUs with 8 GiB of working memory each (32 GiB per TPU). The network topology of the TPUs is now a 3D torus. The racks also have water cooling for the TPUs. TPU 3.0 pods consist of 8 racks with a total of 1024 TPUs and 256 server CPUs. The computing power of a pod is just over 100 PFLOPS.


Web links


  • Patent US20160342889: Vector Computation Unit in Neural Network Processor. Filed September 3, 2015, published November 24, 2016, Applicant: Google Inc., Inventors: Gregory Michael Thorson, Christopher Aaron Clark, Dan Luu.
  • Patent WO2016186823: Batch Processing in a Neural Network Processor. Filed March 3, 2016, published November 24, 2016, Applicant: Google Inc., Inventor: Reginald Clifford Young.
  • Patent WO2016186801: Neural Network Processor. Filed April 26, 2016, published November 24, 2016, Applicant: Google Inc., Inventors: Jonathan Ross, Norman Paul Jouppi, Andrew Everett Phelps, Reginald Clifford Young, Thomas Norrie, Gregory Michael Thorson, Dan Luu.
  • Patent WO2014105865: System and method for parallelizing convolutional neural networks. Filed December 23, 2013, published July 3, 2014, Applicant: Google Inc., Inventors: Alexander Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton.
