Schönhage-Strassen algorithm

The Schönhage-Strassen algorithm is an algorithm for multiplying two n-digit whole numbers. It was developed in 1971 by Arnold Schönhage and Volker Strassen. The algorithm is based on a very fast variant of the discrete Fourier transform as well as a clever switch between residue class arithmetic and cyclic arithmetic in finite number rings.

The Schönhage-Strassen algorithm runs in O(n · log n · log log n) (see Landau notation) if the bit complexity on multi-tape Turing machines, i.e. the maximum running time of the algorithm measured in bit operations as a function of the bit length n of the inputs, is chosen as the efficiency measure. This complexity is an improvement both over the naive O(n²) algorithm known from school and over the Karatsuba algorithm developed in 1962 with a running time of O(n^(log₂ 3)) ≈ O(n^1.585), as well as over its refinement, the Toom-Cook algorithm (for example O(n^(log₃ 5)) ≈ O(n^1.465) for the three-way split).

The Schönhage-Strassen algorithm was the most efficient known algorithm for the multiplication of large numbers from 1971 to 2007. In 2007 Martin Fürer published a further development of the algorithm with the even lower asymptotic complexity O(n · log n · 2^(O(log* n))), where log* n denotes the iterated logarithm of n. By optimizing Fürer's algorithm, David Harvey, Joris van der Hoeven and Grégoire Lecerf achieved a further improvement of the asymptotic running time to O(n · log n · 8^(log* n)).

Significance

Until 2007, no more efficient algorithm could be found. In the general case, only the (trivial) linear running time is known as a lower bound, which the algorithm approaches with increasing number length. However, researchers have found indications that a running time of order n · log n can never be undercut. Even on modern computers, this method of calculation is only more efficient than the Karatsuba algorithm for numbers with several thousand digits. This is probably due less to the overhead of the Schönhage-Strassen algorithm than to the processor design that has been typical for decades, which favors fast floating-point operations over arithmetic in finite residue class rings of whole numbers.

The Schönhage-Strassen algorithm is of central importance in computer algebra for the search for algorithms with the best possible (time) complexity.

Algorithm

Basic idea and terminology

The Schönhage-Strassen algorithm is based on the fast computation of the discrete Fourier transform (DFT). As an illustrative example (originally the caption of a figure), consider the calculation 1234 × 5678 = 7006652; it is carried out modulo 337, and for better clarity base 10 is used instead of base 2.

To multiply two whole numbers a and b, roughly the following scheme is used:

  1. Splitting the numbers a and b (in binary representation) into pieces of suitable length
  2. Fast discrete Fourier transformation (DFT) of the two piece sequences
  3. Component-wise multiplication of the transformed pieces
  4. Inverse transformation (inverse Fourier transformation) of the results
  5. Combining the result pieces to form the result number

The small multiplications to be carried out in the middle step are in turn performed recursively by the Schönhage-Strassen algorithm itself.
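As a concrete illustration, the following minimal Python sketch carries out the five steps for the example above (1234 × 5678 computed modulo 337 in base 10). It is not the algorithm itself: a naive quadratic transform is used instead of the fast one, and the root of unity 85 (a primitive 8th root of unity modulo 337) as well as all names are our own choices for this illustration.

# Hypothetical illustration of the five-step scheme (not the real algorithm):
# a naive DFT over Z_337 in base 10, as in the example above.
# Requires Python 3.8+ for pow(x, -1, m).

P = 337        # working modulus (prime, so every nonzero element is invertible)
N = 8          # transform length: a power of two, at least the number of result digits
OMEGA = 85     # a primitive N-th root of unity modulo P (85**8 % 337 == 1)

def dft(values, root):
    # naive O(N^2) discrete Fourier transform over Z_P
    return [sum(v * pow(root, j * k, P) for j, v in enumerate(values)) % P
            for k in range(N)]

def multiply(a, b, base=10):
    # 1. split both numbers into digits (lowest digit first), padded to length N
    da = [(a // base**i) % base for i in range(N)]
    db = [(b // base**i) % base for i in range(N)]
    # 2. transform the two digit sequences
    fa, fb = dft(da, OMEGA), dft(db, OMEGA)
    # 3. component-wise multiplication of the transformed values
    fc = [x * y % P for x, y in zip(fa, fb)]
    # 4. inverse transform: same DFT with the inverse root, then divide by N
    dc = [x * pow(N, -1, P) % P for x in dft(fc, pow(OMEGA, -1, P))]
    # 5. combine the result digits; the integer arithmetic resolves the carries
    return sum(c * base**i for i, c in enumerate(dc))

print(multiply(1234, 5678))    # prints 7006652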

To understand why the result is the product of the numbers a and b, consider the polynomials

A(x) = a_0 + a_1·x + a_2·x² + … and B(x) = b_0 + b_1·x + b_2·x² + …,

whose coefficients a_i and b_i are the pieces of a and b. Substituting the power of two that corresponds to the piece length for x recovers the binary representations of a and b. What has to be calculated is the product polynomial

C(x) = A(x) · B(x), whose coefficients are the convolution sums c_k = Σ_{i+j=k} a_i b_j.

We determine the Fourier transform of the coefficient tuples of A and B:

Â_k = Σ_j a_j · ω^(j·k) for k = 0, …, N−1,
B̂_k = Σ_j b_j · ω^(j·k) for k = 0, …, N−1,

where N denotes the (zero-padded) length of the coefficient tuples and ω an N-th root of unity. In other words, one evaluates the two polynomials at the points ω^k. If these function values are now multiplied, the corresponding function values of the product polynomial result:

Ĉ_k = Â_k · B̂_k for k = 0, …, N−1.

To get the polynomial itself we need to undo the transformation:

c_j = N^(−1) · Σ_{k=0}^{N−1} Ĉ_k · ω^(−j·k) for j = 0, …, N−1.

By the definition of the root of unity, ω^N = 1 holds. It satisfies the following identity of geometric sums of roots of unity:

Σ_{k=0}^{N−1} ω^(k·m) = N for m ≡ 0 (mod N),

because

Σ_{k=0}^{N−1} ω^(k·m) = (ω^(N·m) − 1) / (ω^m − 1) = 0 for m ≢ 0 (mod N).

Thus:

N^(−1) · Σ_k Ĉ_k ω^(−j·k) = N^(−1) · Σ_k Σ_{i,i′} a_i b_{i′} ω^(k·(i+i′−j)) = Σ_{i+i′=j} a_i b_{i′} = c_j for j = 0, …, N−1.

In the article Discrete Fourier transform, the mathematical basics of this transformation are presented in detail. Since the transformation consists of N sums with N terms each, a classic calculation of the terms (e.g. using the Horner scheme) still yields quadratic running time. These values can be calculated faster using the fast Fourier transform (FFT). The calculation is based on the following divide-and-conquer principle:

A transform of length N is reduced to two transforms of half the length; the partial solutions are then put together using simple operations (additions and simple multiplications). This allows the transforms to be calculated in time O(N · log N). However, rounding the complex roots of unity to a fixed length leads to calculation errors. To compensate for this, a number of bits proportional to the logarithm of the input length must be carried for each resulting bit, which worsens the total running time accordingly. In the Schönhage-Strassen variant, the calculation is instead carried out in a residue class ring and the rounding errors of the complex numbers are thus avoided.
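The divide-and-conquer principle can be sketched in Python as a recursive radix-2 FFT over a residue class ring Z_q, in line with the idea of avoiding complex numbers; the function name and parameters are our own, and the sketch assumes that the input length is a power of two and that root is a primitive root of unity of that length modulo q.

def fft_mod(a, root, q):
    # recursive radix-2 FFT over the residue class ring Z_q (a sketch);
    # assumes len(a) is a power of two and root is a primitive len(a)-th
    # root of unity modulo q
    n = len(a)
    if n == 1:
        return list(a)
    # two half-length subproblems: even- and odd-indexed entries
    even = fft_mod(a[0::2], root * root % q, q)
    odd = fft_mod(a[1::2], root * root % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        # combine the partial solutions: one multiplication by a power of the
        # root, one addition and one subtraction (a "butterfly")
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * root % q
    return out

For example, fft_mod([4, 3, 2, 1, 0, 0, 0, 0], 85, 337) transforms the padded digit sequence of 1234 from the example above.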

Furthermore, the multiplication is not a pure convolution: carries can also arise between the pieces, and after performing the FFT and the inverse FFT these must be handled appropriately.

The task of multiplying two whole numbers is now concretized as follows:

Let the two numbers to be multiplied be given in binary representation, and let n be the maximum length (i.e. the number of binary digits) of the two numbers.

After appropriate treatment of the signs of the two numbers as well as of the trivial special cases in which one factor is zero (all of which can be done with linear effort), one can assume that both numbers are natural numbers. The Schönhage-Strassen algorithm solves this task in O(n · log n · log log n).

Theoretical preparation

Super fast DFT

The super-fast DFT mentioned above, which forms the core of the algorithm, needs to be explained in more detail, since it is used here in a very specific form.

Let R be a commutative ring with unity. In R, let the transform length N be a unit; furthermore, let ω be an N-th root of unity (thus ω^N = 1) which fulfills the equality Σ_{k=0}^{N−1} ω^(k·m) = 0 for every m that is not divisible by N. Then the computation of the discrete Fourier transform (DFT) in the product space R^N (this is a short notation for the N-fold product R × … × R; the term vector space is only used here for the case that R is a field) can be carried out in a fast variant (as FFT) as follows:

Calculate for a tuple (a_0, …, a_{N−1}) ∈ R^N the transform (â_0, …, â_{N−1}) with

â_k = Σ_{j=0}^{N−1} a_j · ω^(j·k)

for k = 0, …, N−1.

By writing the indices j and k in binary representation and reversing the order of the binary digits for one of them, the transform can be calculated optimally as follows: starting from the input values, intermediate terms are defined level by level, and each of the log₂ N levels combines pairs of intermediate terms of the previous level using one multiplication by a power of ω together with an addition and a subtraction. A closed representation of these intermediate terms can be given and verified by a short calculation using ω^N = 1. After the last level, this recursion provides the desired Fourier coefficients â_k.

Due to the property ω^(N/2) = −1, the recursion step can be transformed into a somewhat more computation-friendly form in which the two new terms of a pair are obtained as a sum and a difference with the same exponent of ω:

u′ = u + ω^e · v and v′ = u − ω^e · v.

The inverse transformation, i.e. the inverse FFT, succeeds because N was assumed to be invertible in the ring R:

a_j = N^(−1) · Σ_{k=0}^{N−1} â_k · ω^(−j·k)

for j = 0, …, N−1, where ω^(−1) = ω^(N−1) is again an N-th root of unity with the same properties, so that the same fast scheme can be used.
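The inverse transform can be sketched by reusing fft_mod from above with the inverse root of unity, followed by the division by the length required here; again the names are ours, and the length must be a unit modulo q.

def inverse_fft_mod(A, root, q):
    # inverse of fft_mod (sketch): transform with the inverse root, then
    # multiply by the inverse of the length, which must exist in Z_q
    n = len(A)
    values = fft_mod(A, pow(root, -1, q), q)
    inv_n = pow(n, -1, q)
    return [x * inv_n % q for x in values]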

When used in the Schönhage-Strassen algorithm, only a halved FFT is actually required. This means the following: if the calculation in the first step of the recursion is carried out for only one of the two halves, and the further steps of the recursion are restricted accordingly, then exactly the â_k for odd indices k are obtained. Conversely, if one only wants to recover the differences a_j − a_{j+N/2} of the original values for j = 0, …, N/2 − 1 (these are N/2 pieces) from the â_k with odd k, then the correspondingly halved recursion in the reverse direction is sufficient.

In the Schönhage-Strassen algorithm, the fast Fourier transform described here is required for finite number rings Z_{F_m} with Fermat numbers F_m = 2^(2^m) + 1.

Note on the notation: for the residue class ring Z/kZ we use the shorter notation Z_k, which could only lead to confusion in the context of the p-adic numbers.

The number 2 (or, depending on the context, a suitable power of 2) is used as the root of unity in the ring Z_{F_m}. The multiplications to be carried out within the FFT algorithm are then multiplications by powers of 2; however, they cannot be carried out as pure shift operations, since the reduction of a larger intermediate result modulo F_m still has to follow. This is where one of the brilliant ideas of Schönhage and Strassen comes into play: they embed the ring Z_{F_m} (equipped with residue class arithmetic) in a larger ring equipped with cyclic arithmetic. This overring has a power of 2 as its order, so that the corresponding multiplication can actually be carried out in it as a pure shift operation. This trick can be summarized in a nice structural theorem about residue class and cyclic arithmetic in finite number rings.
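The remark about shift operations can be illustrated with a small sketch (the helper name is ours): modulo 2^n + 1 we have 2^n ≡ −1, so multiplying by 2^k amounts to a shift in which the bits pushed beyond position n re-enter with a minus sign, followed by one cheap correction.

def shift_mod_fermat(x, k, n):
    # computes (x * 2**k) mod (2**n + 1) with shifts and one subtraction;
    # assumes 0 <= x <= 2**n and 0 <= k < n
    m = (1 << n) + 1
    y = x << k
    low = y & ((1 << n) - 1)    # bits that stay below position n
    high = y >> n               # bits shifted to position n and above
    r = low - high              # 2**n counts as -1 in this ring, so high bits enter negatively
    return r + m if r < 0 else r

# sanity check of the sketch:
# shift_mod_fermat(x, k, n) == (x << k) % ((1 << n) + 1) for valid x and k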

Structural theorem about cyclic arithmetic

The structural theorem about cyclic arithmetic can be expressed as follows: for a suitable power of two 2^K (with K a natural number), the residue class ring Z_{2^n+1} is isomorphic to the structure that is obtained by equipping the numbers 0, …, 2^K − 1 with cyclic arithmetic and then reducing modulo 2^n + 1.

Here Z_{2^n+1} denotes the residue classes modulo 2^n + 1, representable by the representatives 0, …, 2^n and equipped with residue class arithmetic, i.e. with addition and multiplication modulo 2^n + 1. The numbers occurring in this residue class ring can be represented with n + 1 binary digits.

The structure on the right-hand side denotes the numbers 0, …, 2^K − 1, which, however, are not equipped with residue class arithmetic but with cyclic arithmetic: if an intermediate result becomes too large, the carry-overs are cut off and added back onto the result. In the binary representation this corresponds to shifting the excess binary digits down to the lowest digit positions (right-aligned) and then adding them. From the number structure obtained in this way with cyclic arithmetic, the factor ring modulo 2^n + 1 is then formed; the end results are thus reduced modulo 2^n + 1.
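The cyclic addition just described can be written down as a minimal sketch (function name ours):

def cyclic_add(x, y, K):
    # "cyclic" addition of two K-bit values: the overflowing bits are cut off,
    # shifted down to the lowest positions and added back on
    s = x + y
    return (s & ((1 << K) - 1)) + (s >> K)

# with K = 4: 12 + 7 = 19 = 0b10011, so cyclic_add(12, 7, 4) gives 0b0011 + 0b1 = 4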

This structural theorem thus means the following: the calculation modulo 2^n + 1 in Z_{2^n+1} can also be replaced by the cyclic calculation in the larger number space, followed by a reduction modulo 2^n + 1.

The decisive factor for the success of the embedding presented in this structural theorem is the property that the largest representable number of the cyclic number space (here this is the number 2^K − 1) represents the corresponding element of the residue class ring Z_{2^n+1}; a compatibility condition between K and n is necessary for this. On the other hand, for the cyclic arithmetic to be meaningfully defined, the size of the number space must be a power of two. Together, these two requirements determine the optimal choice for the size of the cyclic embedding space.

The classic residue class ring Z_{2^K}, on the other hand, would not be suitable for the embedding, because in this ring 2 · 2^(K−1) = 0 holds, i.e. the number 2 is a zero divisor there.

Execution

If the numbers to be multiplied have n = 2^m binary digits, we carry out different recursion steps depending on whether m is even or odd, each of which drastically reduces the number of digits handled in a single step:

Recursion step for odd m

We carry out this step of tracing back from F_m to a smaller Fermat number with complexity O(n · log n).

Two factors a and b are to be multiplied modulo the Fermat number F_m = 2^n + 1, where n = 2^m. In this step we carry out the reduction to a smaller Fermat number.

We introduce abbreviations for the powers of two belonging to the two Fermat numbers, the given one and the smaller one used in the recursion. The halved number of digits becomes our piece size l; i.e. we expand the two factors a and b in powers of 2^l:

a = Σ_j a_j · 2^(j·l) and b = Σ_j b_j · 2^(j·l),

where 0 ≤ a_j, b_j < 2^l holds for the individual pieces. In binary representation, this decomposition corresponds to a simple grouping of the bit sequences into pieces of l bits each.

A small weakness of the algorithm (which, however, does not detract from the complexity bound reached) now becomes apparent: in order to apply the super-fast DFT to the piece sequences (a_j) and (b_j), these must be padded with zeros to the next power-of-two length; the number representation of a and b is thus artificially extended by pieces that are zero.
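The splitting and zero padding can be sketched as follows (names and parameters are ours):

def split_into_pieces(x, piece_bits, num_pieces):
    # group the binary representation of x into num_pieces fields of piece_bits
    # bits each, lowest piece first; fields beyond the length of x are the zero padding
    mask = (1 << piece_bits) - 1
    return [(x >> (i * piece_bits)) & mask for i in range(num_pieces)]

def recombine(pieces, piece_bits):
    # inverse operation: shift every piece back into place and add
    return sum(p << (i * piece_bits) for i, p in enumerate(pieces))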

By virtue of the structural theorem on cyclic arithmetic mentioned above, we now switch from the residue class ring Z_{2^n+1} to the larger number space with cyclic arithmetic. In this space the multiplication problem is computed as

a · b = Σ_i Σ_j a_i b_j · 2^((i+j)·l),

where the terms are collected according to the exponent k = i + j, using the wrap-around property of the cyclic number space for exponents that grow too large.

In summary, the multiplication takes the form

a · b = Σ_k c_k · 2^(k·l)

with the result coefficients

c_k = Σ_{i+j=k} a_i b_j.

Each c_k is a sum of at most as many terms as there are pieces, each term being a product of two l-bit pieces; this yields an upper bound on the size of the c_k.

The sum formula is now rewritten so that the FFT to be applied can be restricted to a halved FFT.

Since 2^n ≡ −1 holds modulo F_m, coefficients whose indices differ by half the sequence length enter the result modulo F_m only through their difference. With a suitable additive shift these differences can be moved into the positive range, and for all non-trivial indices an upper bound on the shifted values holds. Since the two moduli used below are relatively prime, it is sufficient to calculate the remainders of these values with respect to both moduli in order to determine them.

If both remainders of such a value have been determined, the value itself can then be calculated with little effort: first combine the two remainders (by the Chinese remainder theorem for two coprime moduli), then reduce the result into the required range.
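The reconstruction from the two remainders is an application of the Chinese remainder theorem for two coprime moduli; a generic sketch (names ours, the concrete moduli are the ones of the following subsections):

def crt_pair(r1, m1, r2, m2):
    # returns the unique c with 0 <= c < m1*m2, c % m1 == r1 and c % m2 == r2;
    # assumes gcd(m1, m2) == 1 (Python 3.8+ for the modular inverse via pow)
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t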

Determination of the remainders modulo 2n + 2

Here we use a trick that is very typical of computer algebra: we take the piece sequences and insert sufficiently long zero blocks as safety margins, so that after the product has been formed the individual partial results line up in separate fields without overlapping. Concretely, the pieces a_j and b_j are packed into two large integers in which every piece occupies its own field of a fixed bit length; forming the product of these two integers then yields the sums

Σ_{i+j=k} a_i b_j

in disjoint fields of that bit length, because each such sum is small enough to fit into its field. These are exactly the terms of our original multiplication problem, and the remainders to be determined are obtained from the extracted fields by the corresponding reduction.

The effort for forming the two packed numbers and for extracting the sums is linear in their length; adding the cost of the one large multiplication gives the total effort of this substep.
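The safety-margin trick can be sketched as follows (names ours): every piece is placed in its own field of stride bits, so that a single large integer multiplication delivers all the convolution sums at once, provided stride exceeds the bit length of every such sum.

def pack(pieces, stride):
    # place each piece in its own field of stride bits
    x = 0
    for i, p in enumerate(pieces):
        x |= p << (i * stride)
    return x

def convolution_sums(a, b, stride):
    # one big multiplication, then the sums are read off from disjoint fields
    prod = pack(a, stride) * pack(b, stride)
    mask = (1 << stride) - 1
    return [(prod >> (i * stride)) & mask for i in range(len(a) + len(b) - 1)]

# convolution_sums([4, 3, 2, 1], [8, 7, 6, 5], 32) gives [32, 52, 61, 60, 34, 16, 5],
# the convolution of the digit sequences of 1234 and 5678 from the example above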

Determination of the remainders modulo (D + 1)

This is where the DFT comes in. We subject the two piece vectors to the DFT in Z_{D+1}, using a suitable power of 2 as the root of unity. Since we only need the differences described above, the halved DFT is sufficient:

  • DFT to determine the transformed values of the two vectors, only for the odd indices
  • component-wise multiplications of the transformed values for all odd indices
  • inverse DFT to recover all the required differences from the values at the odd indices

The complexity of this consists of the steps of the DFT, each with its individual cost; added to this are the additions and the reductions modulo D + 1 needed to prepare the operands, which can likewise be handled within the bound of the recursion step.

Recursion step for even m

The complexity O(n · log n) is also achieved for this step of tracing back from F_m to a smaller Fermat number.

Again, two factors a and b are to be multiplied modulo the Fermat number F_m = 2^n + 1. In this step, too, we carry out the reduction to a smaller Fermat number.

We use the same abbreviations for the powers of two belonging to the two Fermat numbers. Again, the halved number of digits becomes our piece size l; i.e. we expand the two factors a and b in powers of 2^l:

a = Σ_j a_j · 2^(j·l) and b = Σ_j b_j · 2^(j·l),

where 0 ≤ a_j, b_j < 2^l holds for the individual pieces.

As above, we extend the piece sequence of a with zeros to a power-of-two length, and analogously for b.

With the help of the structural theorem on cyclic arithmetic, we now switch from the residue class ring Z_{2^n+1} to the larger number space with cyclic arithmetic.

We can again represent the product as

a · b = Σ_k c_k · 2^(k·l)

with the result coefficients

c_k = Σ_{i+j=k} a_i b_j.

Each c_k can again be bounded from above in the same way as before.

For the product modulo F_m we can again infer that only the differences of coefficients whose indices differ by half the sequence length matter, and with a suitable additive shift these values can be moved into the positive range; the corresponding upper bound holds for all non-trivial indices. Because the two moduli used below are coprime, it is again sufficient to calculate the remainders of these values with respect to both moduli.

Determination of the remainders modulo 2n + 1

We again use the trick of inserting safety margins: the pieces a_j and b_j are packed into two large integers in which every piece occupies its own field of a fixed bit length. The product of these two integers then contains the sums

Σ_{i+j=k} a_i b_j

in disjoint fields of that bit length. These are exactly the terms sought for our original multiplication problem, and the remainders to be determined are obtained from the extracted fields by the corresponding reduction.

Determination of the remainders modulo (D + 1)

We again subject the two piece vectors to the DFT in Z_{D+1}, this time choosing a suitable other element as the root of unity. Since we only need the differences, the halved DFT is sufficient here as well:

  • DFT to determine the transformed values of the two vectors, only for the odd indices
  • component-wise multiplications of the transformed values for all odd indices
  • inverse DFT to recover all the required differences from the values at the odd indices

Summary

Starting with two factors of digit length n, the recursion shown achieves a total complexity of O(n · log n · log log n).

Modified form

Zimmermann and Brent describe a variant of the algorithm in which the running time (as a function of the input length) does not make jumps but grows more smoothly. This is achieved by forming the DFT vectors not from binary numbers of a fixed power-of-two digit count, but from numbers of whatever length is appropriate, so that the length of the vectors to be transformed does not have to be a power of two.

References

  1. Arnold Schönhage, Volker Strassen: Fast multiplication of large numbers. In: Computing, 7, 1971, pp. 281–292, Springer Verlag.
  2. Martin Fürer: Faster integer multiplication. STOC 2007 Proceedings, pp. 57–66. (Memento of the original from April 25, 2013 in the Internet Archive; PDF; 295 kB)
  3. David Harvey, Joris van der Hoeven, Grégoire Lecerf: Even faster integer multiplication. 2014, arXiv:1407.3360.
  4. loria.fr (PDF) p. 56
  5. loria.fr (PDF)