HyperTransport

HyperTransport (HT) is a high-speed, bidirectional link between integrated circuits that emerged from an AMD project called Lightning Data Transport (LDT).

HT is an open industry standard with no license fees. Development and standardization are carried out by the vendor-independent HT consortium, whose members include many well-known companies such as AMD, Nvidia, IBM and Apple.

Technology

Structure of a link

An HT connection is referred to as a "link" and consists of two separate unidirectional point-to-point connections. In the era of shared buses, every piece of information and every signal had to be enabled on the bus and then forwarded. HT instead forwards signals directly over point-to-point connections (links) between devices or chips.

With HT, each of the two unidirectional point-to-point connections of a link is 2, 4, 8, 16 or 32 bits wide. The two directions may explicitly use different widths. HT also defines a range of clock frequencies at which a link can operate. A device need not support every clock frequency, but all devices must support the minimum of 200 MHz. Data is transferred on both clock edges (double data rate, DDR), so the effective transfer rate is twice the clock frequency.
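The width and clock rules above can be sketched in a few lines of Python. This is an illustrative model only: the class name `LinkDirection` and its method are hypothetical, and the bandwidth figure is the gross rate derived from width, clock and DDR, not a value taken from the specification text.

```python
from dataclasses import dataclass

VALID_WIDTHS = (2, 4, 8, 16, 32)  # bits per direction, as listed above
MIN_CLOCK_MHZ = 200               # clock every device must support


@dataclass(frozen=True)
class LinkDirection:
    """One unidirectional half of an HT link (hypothetical model)."""
    width_bits: int
    clock_mhz: float

    def __post_init__(self):
        if self.width_bits not in VALID_WIDTHS:
            raise ValueError(f"unsupported link width: {self.width_bits}")
        if self.clock_mhz < MIN_CLOCK_MHZ:
            raise ValueError(f"clock below the {MIN_CLOCK_MHZ} MHz minimum")

    def gross_bandwidth_mb_s(self) -> float:
        # DDR: two transfers per clock cycle, width_bits per transfer.
        mega_transfers_per_s = self.clock_mhz * 2
        return mega_transfers_per_s * self.width_bits / 8


# The two directions of one link may differ in width.
downstream = LinkDirection(width_bits=16, clock_mhz=800)
upstream = LinkDirection(width_bits=8, clock_mhz=800)
```

Modelling each direction separately reflects the fact that asymmetric widths are allowed.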

Electrical details

The electrical interface of HT uses low-voltage differential signal pairs (1.2 volts ± 5%) to transmit data. According to the HT specification, termination must be on-die at the receiver, with an impedance of 100 ohms.

Further electrical lines per direction of a link are:

For every 8 bits of data width, one clock line runs from the transmitter to the receiver; the receiver uses it to sample the data on the 8 data lines (source-synchronous clocking).
A line that indicates whether the current packet is a control packet.
A line that indicates whether power and clock are stable.
A reset line.
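As a small worked example of the first item, the number of clock lines per direction can be derived from the link width. The function name `clock_lines` is hypothetical, and the sketch assumes that links narrower than 8 bits still carry one clock line.

```python
import math


def clock_lines(width_bits: int) -> int:
    """Clock lines per direction: one per started group of 8 data bits.

    Assumption: widths below 8 bits still use a single clock line.
    """
    if width_bits not in (2, 4, 8, 16, 32):
        raise ValueError(f"unsupported link width: {width_bits}")
    return math.ceil(width_bits / 8)
```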

Two additional lines are required for x86 systems:

One for switching the link off and on again for the duration of a clock frequency change (with Cool'n'Quiet or SpeedStep).
One that indicates whether the link is active.

Device classes

Three device classes are distinguished by their function and position within the HT chain: caves, tunnels and bridges.

The HT bridge mediates between the primary side, with CPU and memory, and the secondary side with the HT devices (the chain). A tunnel has two sides, each with a receiving and a sending unit. A tunnel can be, for example, a network card or a bridge to another protocol. A cave marks the end of the chain and has only one communication side. The simplest HT chain consists of one bridge connected to one cave.
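The chain layout described above can be expressed as a simple check. This is a deliberately simplified sketch: `DeviceClass` and `is_valid_chain` are hypothetical names, and the model only covers a single linear chain that starts at a bridge, passes through any number of tunnels, and ends at a cave.

```python
from enum import Enum


class DeviceClass(Enum):
    BRIDGE = "bridge"
    TUNNEL = "tunnel"
    CAVE = "cave"


def is_valid_chain(chain: list[DeviceClass]) -> bool:
    """Check bridge -> tunnel* -> cave for a single linear chain."""
    if len(chain) < 2:
        return False  # at least one bridge and one cave are needed
    head, *middle, tail = chain
    return (
        head is DeviceClass.BRIDGE
        and tail is DeviceClass.CAVE
        and all(device is DeviceClass.TUNNEL for device in middle)
    )
```

For example, `[BRIDGE, TUNNEL, CAVE]` passes the check, while a chain with a cave in the middle does not, because a cave has only one communication side.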

HT packets

HT is packet-based, with a packet size that is always a multiple of 4 bytes. There are two types of packets: data packets and control packets. Control packets are 4 or 8 bytes in size, while data packets range from 4 to 64 bytes (in 4-byte steps). If fewer than 4 bytes of data have to be transmitted, the rest of the packet is filled with arbitrary bits.
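The size rule for data packets amounts to rounding the payload up to the next multiple of 4 bytes. A minimal sketch, with the hypothetical helper name `data_packet_size`:

```python
def data_packet_size(payload_bytes: int) -> int:
    """Size in bytes of the data packet needed for a given payload.

    Data packets are 4 to 64 bytes long in 4-byte steps; any bytes
    beyond the payload are padding with arbitrary content.
    """
    if not 1 <= payload_bytes <= 64:
        raise ValueError("a data packet carries 1 to 64 payload bytes")
    return -(-payload_bytes // 4) * 4  # round up to a multiple of 4


padding = data_packet_size(3) - 3  # one padding byte for a 3-byte payload
```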

Control packets come in three types: info, request and response packets. Info packets are always 4 bytes long. They are used only for communication with the device at the other end of a link and are therefore neither buffered nor routed. Info packets form the lowest level of the HT protocol and carry information for flow control, link synchronization and error handling.

Request packets with an address are 8 bytes long, otherwise 4. The address is 40 bits long. An optional feature is the transmission of 64-bit addresses in request packets. This is negotiated separately between the two participants for each link. In this case, the size of a request packet increases to 12 bytes.
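The three request packet sizes can be summarised in a short function. The name `request_packet_size` and its parameters are hypothetical; the sizes are the ones stated above.

```python
def request_packet_size(has_address: bool, extended_64bit: bool = False) -> int:
    """Size in bytes of a request packet.

    4 bytes without an address, 8 bytes with a 40-bit address, and
    12 bytes when 64-bit addressing was negotiated for the link.
    """
    if not has_address:
        if extended_64bit:
            raise ValueError("64-bit addressing requires an address")
        return 4
    return 12 if extended_64bit else 8
```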

Various types of requests are possible, such as ordinary reads and writes, atomic read-modify-write, broadcast and many more. Response packets are always 4 bytes in size and serve only to signal that a previously sent request has been completed. Any data (e.g. for a read request) is not carried in the response packet itself but follows immediately afterwards in the form of one or more data packets.

Performance data

HT links can have different widths and clock frequencies, and the amount of data transmitted varies accordingly. With a 32-bit link width and a 1.4 GHz clock frequency, a gross rate of 11.2 GB/s is possible in each direction. Because the packets carry addresses and control information in addition to the user data, some overhead must be deducted, so the usable bandwidth is somewhat lower.
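The effect of this overhead can be estimated with a rough model. The sketch below assumes a write in which each data packet is preceded by one 8-byte request packet, and it ignores info packets, flow control and any other link-level traffic, so real figures will differ. The function names are hypothetical.

```python
def payload_efficiency(data_bytes: int, control_bytes: int = 8) -> float:
    """Fraction of transmitted bytes that is user data (simplified model)."""
    return data_bytes / (data_bytes + control_bytes)


def usable_bandwidth_gb_s(gross_gb_s: float, data_bytes: int,
                          control_bytes: int = 8) -> float:
    """Gross bandwidth scaled by the payload efficiency."""
    return gross_gb_s * payload_efficiency(data_bytes, control_bytes)


# 64-byte data packets on a link with 11.2 GB/s gross per direction
best_case = usable_bandwidth_gb_s(11.2, data_bytes=64)
```

Under these assumptions, full 64-byte data packets leave roughly 9.96 GB/s of the 11.2 GB/s for user data, whereas 4-byte data packets would leave only about a third.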

HT compares well with other technical standards such as PCI Express or RapidIO. One capability stands out in particular: urgent control packets (e.g. requests) can be inserted between individual data packets at any time, even if the bandwidth of the link is already fully used (Priority Request Interleaving).

This ability, together with the high clock rate, makes HT an interconnect with very low latency, which is particularly important for applications in high-performance computing (HPC).

Specifications

HyperTransport version	Date	Max. HT clock frequency	Max. HT link width per direction	Max. bandwidth per direction	Max. total bandwidth
1.0	February 2001	800 MHz	32 bits	6.4 GB/s	12.8 GB/s
1.1	2002	800 MHz	32 bits	6.4 GB/s	12.8 GB/s
2.0	February 2004	1400 MHz	32 bits	11.2 GB/s	22.4 GB/s
3.0	April 2006	2600 MHz	32 bits	20.8 GB/s	41.6 GB/s
3.1	August 2008	3200 MHz	32 bits	25.6 GB/s	51.2 GB/s

Minimum HT clock frequency: 200 MHz
Minimum HT link width per direction: 2 bits

Application areas

Backplane replacement

The main area of application for HyperTransport is the replacement of backplane buses, which currently differ for almost every processor generation.

To make a traditional system expandable, the backplane must provide bridges to other standard interfaces such as AGP or PCI. These are usually integrated into a controller called the Northbridge. A computer with HyperTransport is more flexible: a single HyperTransport↔PCI adapter chip works in every machine and lets PCI cards communicate with any CPU. AMD uses HT in place of the processor backplane in the AMD64 family.

HyperTransport could also replace the backplane in routers and switches. These devices have multiple ports, and incoming data must be forwarded between the corresponding ports as quickly as possible. An Ethernet switch with 4 ports, for example, requires a backplane with 800 Mbit/s (100 Mbit/s × 4 ports × 2 directions). HyperTransport could be used to build switches that consist of four HyperTransport↔Ethernet converters and use HT as the backplane.

Multiprocessor connections

Another area of application for HyperTransport is the connection of NUMA processors in multiprocessor systems, as is currently practiced in AMD's Opteron series.

Under the name "Torrenza", AMD runs an initiative to integrate special-purpose processors, such as physics accelerators or SIMD CPUs, into the computer architecture. Although Torrenza is not strictly bound to HyperTransport, HT is a preferred technique for connecting such processors. There are also plans to develop coprocessors that are socket-compatible with the AMD Opteron.

HTX

Figure: HTX slot with two PCIe slots underneath.

As of version 2.0, the specification also defines a plug-in card standard, HTX, so that cards can be built with a native HT interface. Compared with a solution that places a different bus standard (e.g. PCI-X) between the card and the HT link, this further reduces the effective latency. PathScale, for example, offers an InfiniBand adapter with an HTX connector and advertises very low latencies (1.29 µs MPI latency).

Web links

hypertransport.org