Tensor Processing Unit
Tensor Processing Units (TPUs), also known as tensor processors, are application-specific chips designed to accelerate machine-learning applications. TPUs are mainly used to process data in artificial neural networks (cf. deep learning).
The TPUs developed by Google were designed specifically for the TensorFlow software collection. TPUs are the basis for all Google services that use machine learning, and they were also used in the AlphaGo machine-versus-human matches against Lee Sedol, one of the world's best Go players.
First generation
The first generation of Google's TPU was presented at Google I/O 2016 and was designed specifically to accelerate the use of an already trained artificial neural network. This was achieved, among other things, by lower precision compared to conventional CPUs or GPUs and by a specialization in matrix operations.
The TPU consists of a systolic array with a 256 × 256 8-bit matrix multiplication unit (MMU), controlled by a microprocessor with a CISC instruction set. The chip is manufactured in a 28 nm process and clocks at 700 MHz with a TDP of 28 to 40 W. The TPU has 28 MiB of on-chip RAM. In addition, it has 4 MiB of 32-bit accumulators, which take over the results of the matrix multiplication unit. The TPU can perform matrix multiplications, convolutions and activation functions, as well as data transfers to the host system via PCIe 3.0 or to the DDR3 DRAM located on the board.
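The split between an 8-bit multiplication unit and wider 32-bit accumulators can be illustrated with a minimal NumPy sketch. This is not Google's actual hardware or microcode, only an arithmetic model of why 32-bit accumulators suffice for a 256-wide 8-bit dot product:

```python
import numpy as np

def quantized_matmul(a_u8: np.ndarray, b_u8: np.ndarray) -> np.ndarray:
    """Multiply two 8-bit matrices, accumulating in 32 bits.

    Illustrative model of the TPU's split between the 8-bit matrix
    multiplication unit and the 32-bit accumulators: a product of two
    8-bit values needs up to 16 bits, and summing 256 such products
    needs at most 24 bits, so a 32-bit accumulator never overflows.
    """
    return a_u8.astype(np.int32) @ b_u8.astype(np.int32)

# A 256 x 256 tile, matching the width of the systolic array.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
b = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
result = quantized_matmul(a, b)  # int32 matrix of shape (256, 256)
```

The worst case confirms the comment: 255 × 255 × 256 = 16,646,400, which fits in 24 bits (2^24 = 16,777,216).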
Second generation
The second generation of Google's TPU (TPUv2) was presented at Google I/O 2017. It accelerates not only the use of neural networks (inference) but also their training. Each of these TPUs has two matrix execution units (MXUs) with 8 GiB of RAM each. Each MXU has a computing power of 22.5 TFLOPS, using the bfloat16 data type, which does not comply with IEEE 754. A TPU board with 4 TPUs thus reaches 180 TFLOPS.
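bfloat16 keeps float32's 8 exponent bits (and thus its dynamic range) but only 7 mantissa bits, so a float32 can be converted by simply truncating its low 16 bits. The following sketch illustrates that truncation and the board-level arithmetic quoted above; it is an illustration, not TPU firmware (truncation here is round-toward-zero for simplicity):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-padding the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# The reduced 7-bit mantissa makes nearby values collapse:
y = bfloat16_bits_to_float32(float32_to_bfloat16_bits(1.001))  # -> 1.0

# The quoted board performance follows from the unit counts:
tpus_per_board, mxus_per_tpu, tflops_per_mxu = 4, 2, 22.5
board_tflops = tpus_per_board * mxus_per_tpu * tflops_per_mxu  # 180.0
```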
The TPUs are interconnected into a "pod" with 11.5 PFLOPS, a computer network (cluster system architecture) of 256 TPUs and 128 server CPUs. The TPUs are interconnected in a toroidal (2D torus) network topology of 8 × 8 TPUs each. PCI Express 3.0 with 32 lanes (8 lanes per TPU) is used to connect the CPUs to the TPUs.
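In a 2D torus, the edges of the grid wrap around, so every node has exactly four neighbors; this distinguishes it from a plain 2D mesh, whose border nodes have fewer. A small sketch of the neighbor relation (illustrative only, not Google's interconnect routing), together with the pod-level arithmetic:

```python
def torus_neighbors(x: int, y: int, n: int = 8):
    """Neighbors of node (x, y) in an n x n 2D torus.

    Edges wrap around modulo n, so every node -- including the
    "corners" -- has exactly four neighbors.
    """
    return [((x - 1) % n, y), ((x + 1) % n, y),
            (x, (y - 1) % n), (x, (y + 1) % n)]

# A corner node still has four neighbors thanks to wrap-around:
corners = torus_neighbors(0, 0)  # [(7, 0), (1, 0), (0, 7), (0, 1)]

# Pod performance follows from the chip count (2 MXUs x 22.5 TFLOPS each):
pod_pflops = 256 * 2 * 22.5 / 1000  # 11.52, quoted as 11.5 PFLOPS
```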
Second-generation TPUs can be used via the Google Compute Engine, Google's cloud offering.
HBM memory is used to increase the memory bandwidth of the architecture.
The chip area of the second generation is likely larger than that of the first generation due to the more complex memory interface and the two cores per chip.
Third generation
The third generation of Google's TPU (TPU 3.0) was presented at Google I/O 2018. These TPUs have 4 MXUs with 8 GiB of working memory each (32 GiB per TPU). The network topology of the TPUs is now a 3D torus. The racks are water-cooled to keep the TPUs cool. TPU 3.0 pods consist of 8 racks with a total of 1024 TPUs and 256 server CPUs. The computing power of a pod is just over 100 PFLOPS.
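Moving from a 2D to a 3D torus doubles each chip's direct links from four to six (two per axis, with wrap-around). The sketch below models that neighbor relation and the per-TPU memory arithmetic from the figures above; the topology function is illustrative, not Google's actual routing:

```python
def torus3d_neighbors(p, dims):
    """Neighbors of node p = (x, y, z) in a 3D torus of size dims.

    Each of the three axes wraps around modulo its length, giving
    every node six neighbors (two per axis), versus four in the
    2D torus of the previous TPU generation.
    """
    out = []
    for axis in range(3):
        for step in (-1, 1):
            q = list(p)
            q[axis] = (q[axis] + step) % dims[axis]
            out.append(tuple(q))
    return out

# Memory figures follow from the MXU counts quoted above:
gib_per_mxu, mxus_per_tpu, tpus_per_pod = 8, 4, 1024
gib_per_tpu = gib_per_mxu * mxus_per_tpu    # 32 GiB per TPU
pod_tib = gib_per_tpu * tpus_per_pod / 1024  # 32 TiB across the pod
```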
Literature
- Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson: A domain-specific architecture for deep neural networks. In: Communications of the ACM. 61, 2018, p. 50, doi:10.1145/3154484.
Web links
- Cloud TPUs (TensorFlow @ O'Reilly AI Conference, San Francisco '18) on YouTube, October 25, 2018
- A simple classification model using Keras with Cloud TPUs. In: colab.research.google.com. Retrieved November 10, 2018.
- Edge TPU devices. In: aiyprojects.withgoogle.com. Retrieved March 22, 2019.
- Sebastian Grüner: Tensor Processing Unit: Google builds its own chips for machine learning. In: golem.de. May 19, 2016. Retrieved November 23, 2016.
- Harald Bögeholz: Artificial Intelligence: Architecture and Performance of Google's AI chip TPU - heise online. In: heise.de. April 6, 2017. Retrieved April 7, 2017.
- David Patterson, Google TPU Team: In-Data Center Performance Analysis of a Tensor Processing Unit. (PDF) April 2, 2017, accessed May 23, 2017.
Patents
- Patent US20160342889: Vector Computation Unit in Neural Network Processor. Filed September 3, 2015, published November 24, 2016, applicant: Google Inc., inventors: Gregory Michael Thorson, Christopher Aaron Clark, Dan Luu.
- Patent WO2016186823: Batch Processing in a Neural Network Processor. Filed March 3, 2016, published November 24, 2016, applicant: Google Inc., inventor: Reginald Clifford Young.
- Patent WO2016186801: Neural Network Processor. Filed April 26, 2016, published November 24, 2016, applicant: Google Inc., inventors: Jonathan Ross, Norman Paul Jouppi, Andrew Everett Phelps, Reginald Clifford Young, Thomas Norrie, Gregory Michael Thorson, Dan Luu.
- Patent WO2014105865: System and method for parallelizing convolutional neural networks. Filed December 23, 2013, published July 3, 2014, applicant: Google Inc., inventors: Alexander Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton.
Individual evidence
- Jeff Dean, Rajat Monga: TensorFlow - Google's latest machine learning system, open sourced for everyone. In: Google Research Blog. Google, November 9, 2015, accessed June 29, 2016.
- Christof Windeck: Google I/O 2016: "Tensor processors" helped win at Go - heise online. In: heise.de. May 19, 2016. Retrieved November 23, 2016.
- Norm Jouppi: Google supercharges machine learning tasks with TPU custom chip. In: Google Cloud Platform Blog. May 18, 2016. Retrieved June 29, 2016.
- Timothy Prickett Morgan: Tearing apart Google's TPU 3.0 AI Coprocessor. In: The Next Platform. May 10, 2018, accessed May 24, 2018.
- System architecture | Cloud TPU. Retrieved January 12, 2020.