Barrel shifter
A barrel shifter is a digital technology circuit that shifts or rotates bit vectors (data words) quickly and in constant time by any number of bits. This property fundamentally distinguishes them from classic shift registers, the execution time of which depends on the shift distance and can take considerable times, especially for large shift distances.
In contrast to shift registers , which consist of clocked flip-flops (e.g. D flip-flops), a barrel shifter primarily has no storage elements. If necessary, these can be connected upstream and / or downstream of a barrel shifter. If the bits within the barrel shifter are connected cyclically with one another, one sometimes speaks of a barrel rotator.
General
A barrel shifter shifts several bits asynchronously, the time required being determined by the gate delay time of the switching network . Depending on the (synchronous) environment in which the barrel shifter is embedded, sometimes no additional clock cycle is required for the shift operation.
This is the fundamental difference to a normal shift register, which carries out a shift by n bits sequentially (and mostly clocked) by shifting n by 1 bit.
A 4-bit wide barrel rotator as the simplest form has four data inputs, two control inputs and four data outputs. If the sequence "ABCD" is present at the inputs, the following output sequences can be generated at the two control inputs as a function of the four possible states: ABCD , DABC , CDAB or BCDA . In the graphic opposite, these four possible switching states are shown in four colors in the control logic: The decoder only ever activates one of the four colored lines and thus generates one of the four output patterns.
The barrel shifter is often part of microprocessors . This logic function can also be implemented in a programmable logic module (PLD), an FPGA or an ASIC as part of an overall circuit.
realization
Barrel shifters can be implemented in different ways. Shifting an N-bit word by any shift distance between 0 and N − 1 can be implemented with a 1-out-of-N decoder and N × N-out of 1 multiplexers with one multiplexer pass. However, N can also be broken down into factors (preferably powers of two and identical), e.g. B. N = √N · √N or N = 2 n and then
- with two 1-out-√N decoders and N2 × √N-out-1 multiplexers in two steps,
- with n 1-out-of-2 decoders and N · n 2-out-of-1 multiplexers in n steps
or more generally with N = m a · n b and
- with a 1-of-m decoders, b 1-of-n decoders, N · a m-of-1 multiplexers and N · b n-of-1 multiplexers
carry out.
The effort for these different implementations is different, there are also differences in the runtime if NANDs with 4 or 8 inputs are available for implementation. But all circuits have constant and short throughput times in the range of fewer gate delay times, usually below one clock cycle.
oben mitte unten NAND 1 Eingang 4 4 4 NAND 2 Eingänge - - 192 NAND 3 Eingänge 320 128 - NAND 4 Eingänge 80 32 - FETs 2568 1032 776 Laufzeit Input 4 4 8 Gatterlaufzeiten Laufzeit Input 80ps 80ps 160ps FO4=20ps Laufzeit Shift 5 5 9 Gatterlaufzeiten Laufzeit Shift 120ps 120ps 200ps FO4=20ps
Realization with N × N-out-of-1 multiplexers
The N different shift operations of the N-bit long input bit vector are mapped as an N × N matrix. The shift distance, which is present as a log 2 N bit value, is decoded with a 1-out-of-N decoder and selects a specific input of all N-out-of-1 multiplexers.
See graphic on the right, upper example:
A shift by 13 activates column n 13 , which rotates the data word by 13 to the left in one step.
Division into N * n * 2-of-1 multiplexers
Here one uses the property that is already used in shift registers: Shift operations can be separated. The following applies:
Therefore, the large N × N matrix can be broken down into n 2 × N sub-matrices. At first glance, this implementation looks slower than the first, but this only applies if the N-out-of-1 multiplexers from the first implementation do not have to be implemented as cascaded 2-out-of-1 multiplexers anyway.
See the graphic on the right, lower example:
A shift by 13 activates the columns s 3 , s 2 , ¬s 1 and s 0 , which in four steps increase the data word by 1 · 2 3 + 1 · 2 2 + 0 · 2 1 + 1 Rotate 2 0 = 13 to the left.
Division into N * n / 2 × 4-out-of-1 multiplexers
If 4-out-of-1 multiplexers can be built efficiently, this implementation is more efficient than the second.
See graphic on the right, middle example:
A shift by 13 activates the columns m 3 and n 1 , which rotate the data word to the left by 3 · 4 1 + 1 · 4 0 = 13 in two steps .
Realization with hardware multipliers
Another implementation option for shifting to the left is multiplication by a power of two. In particular, if dedicated hardware multipliers in an FPGA would otherwise be idle, barrel shifters can be efficiently implemented without the need for universally usable FPGA resources.
There are also barrel shifters as individual integrated circuits , such as the SN74AS897 component , which offers an 8-bit wide barrel shifter.
Individual evidence
- ↑ Implementing Barrel Shifters Using Multipliers (PDF; 50 kB), Xilinx Application Note, Paul Gigliotti, 2004 (Engl.)