Duff's device

from Wikipedia, the free encyclopedia

Duff's device is an after its inventor Tom Duff called programming method for skillful unrolling loops ( English loop unrolling ) in the programming language C . In a code-saving manner, it solves the problem that the number of loop runs may not be a multiple of the n-fold unrolling of the loop or the number of runs is variable and only results at runtime. Duff's Device can be used in any programming language that allows entries into the loop body.

functionality

Duff worked for the film production company Lucasfilm in 1983 and discovered that the following function for outputting image data to special hardware was too slow for his application:

send(to, from, count)
register short *to, *from;
register count;
{
    do
        *to = *from++;
    while (--count>0);
}

Duff was annoyed that this implementation was spending a lot of time checking the loop condition. The standard way to speed up loops is to scroll them. However, a partial pass is left over. From assembler programming, Duff knew the possibility of jumping directly into the loop. In the C programming language, this can be solved using the switch statement:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count+7)/8;
    switch (count%8) {
        case 0:	do {    *to = *from++;
        case 7:         *to = *from++;
        case 6:         *to = *from++;
        case 5:         *to = *from++;
        case 4:         *to = *from++;
        case 3:         *to = *from++;
        case 2:         *to = *from++;
        case 1:         *to = *from++;
        } while (--n>0);
    }
}

Explanation

When called, the function receives the addresses of an output register ( *to) and an array in the memory ( *from) and the number of data to be transferred ( shortusually stands for two bytes). It uses a loop unrolling in which eight elements are rolled out at a time.

The counter variable ncontains the number of loop passes ( count/8, rounded up). If it is count not a multiple of eight, part of the loop is skipped over the first time through the switch instruction and only Rest(count div 8)elements are copied. From the second pass , the number of elements still to be copied is then a multiple of eight. The entire loop body is then passed through and eight elements are processed in one go.

disadvantage

On modern processors, this method does not necessarily lead to an increase in efficiency, since it can have a negative effect on cache behavior. Modern compilers themselves use methods for running programming loops, taking into account the respective target platform. In addition, frequent use of the method can make the code extremely large.

literature

Individual evidence

  1. a b Tom Duff on Duff's Device on lysator.liu.se (English)
  2. Entry Duff's Device in the Jargon File on catb.org (English)
  3. Theodore Ts'o on XFree86 and performance in the Linux Kernel Archive on lkml.indiana.edu (English)