Quadratic sieve

The quadratic sieve is an algorithm from number theory and one of the fastest known methods for factoring large natural numbers. It is a general-purpose factoring method, i.e. its running time depends only on the size of the number to be factored and not on special properties of the number (or of its divisors). It is the fastest general-purpose factoring method for numbers of up to approximately 100 decimal digits; for larger numbers the number field sieve is faster. The running time to factor a number $n$ with the quadratic sieve is on the order of

$$\exp\left(\sqrt{\ln n \cdot \ln \ln n}\right).$$

History

Building on the continued fraction method of John Brillhart and Michael Morrison, and inspired by the linear sieve of Richard Schroeppel, Carl Pomerance devised the quadratic sieve in 1981 through theoretical considerations. It was faster than all previously known factoring methods.

Shortly thereafter, James Davis and Diane Holdridge, and independently Peter Montgomery, found a variant of the quadratic sieve with multiple polynomials (called MPQS). Another improvement, called the special quadratic sieve, was made by Mingzhi Zhang; it can, however, only be applied to numbers of a specific form.

In 1994 the 129-digit number RSA-129 was factored with the help of the quadratic sieve.

More about the history of the quadratic sieve can be found in the article History of Factoring Methods.

Functionality

The quadratic sieve is a further development of Dixon's factoring method. Like most modern factoring techniques, it represents the number to be factored as a difference of two squares, based on the third binomial formula:

$$x^2 - y^2 = (x + y) \cdot (x - y) = n$$

Instead of testing candidate divisors of a number, one looks for a representation of the number as a difference of squares. From a representation $x^2 - n = y^2$ one obtains the divisors $x + y$ and $x - y$ of $n$.

In Fermat's factoring method, one calculates the value $q(x) = x^2 - n$ for different numbers $x$ until one obtains a $q(x)$ that is a square number. One starts with the smallest number $x$ that is greater than the square root of $n$ and then increases $x$ by one in each step. Applying this procedure to the number 1649 gives the values in the following table.

$x$	$q(x) = x^2 - n$	Prime factorization of $q(x)$
41	32	2⁵
42	115	5 · 23
43	200	2³ · 5²
...	...	...
57	1600	40²

Fermat's factorization method reaches its goal at $x = 57$, 16 steps after the starting value $x = 41$, and can then use the number $x = 57$ and the square $q(x) = 40^2$ to calculate the factorization of $1649$:

$$1649 = 57^2 - 40^2 = (57 + 40) \cdot (57 - 40) = 97 \cdot 17$$
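Fermat's search can be sketched in a few lines of Python. This is only an illustration of the procedure described above; the function name `fermat_factor` is chosen here and does not come from any library.

```python
from math import isqrt

def fermat_factor(n):
    """Return (x - y, x + y) with x*x - y*y == n, for odd n > 1."""
    x = isqrt(n)
    if x * x < n:
        x += 1                 # smallest x with x*x >= n
    while True:
        q = x * x - n          # q(x) = x^2 - n
        y = isqrt(q)
        if y * y == q:         # q(x) is a perfect square
            return x - y, x + y
        x += 1

print(fermat_factor(1649))     # (17, 97)
```

The loop stops at x = 57, where q(x) = 1600 = 40².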

If one multiplies the function values for $x = 41$ and $x = 43$, one obtains the square $2^5 \cdot 2^3 \cdot 5^2 = 2^8 \cdot 5^2 = (2^4 \cdot 5)^2$. One would like to use this square for a factorization. However, the two equations cannot simply be multiplied, because the product of two expressions of the form $x^2 - n$ is not again of this form. Maurice Kraitchik generalized Fermat's representation. He considered equations in which $x^2 - y^2$ is a multiple of $n$:

$$n \mid x^2 - y^2 \Leftrightarrow n \mid (x + y)(x - y)$$

If $n \mid (x + y)(x - y)$, but $n$ divides neither $x + y$ nor $x - y$, then $\gcd(x + y, n) > 1$ (gcd denotes the greatest common divisor) and $\gcd(x + y, n)$ is a nontrivial factor of $n$. Instead of the equation $q(x) = x^2 - n$, one considers the following congruences:

$$n \mid x^2 - q(x) \Leftrightarrow x^2 \equiv q(x) \pmod{n}$$

These congruences can now be multiplied. Multiplying the congruence for $x = 41$,

$$41^2 \equiv 2^5 \pmod{1649},$$

by the congruence for $x = 43$,

$$43^2 \equiv 2^3 \cdot 5^2 \pmod{1649},$$

gives the congruence

$$41^2 \cdot 43^2 \equiv 2^5 \cdot 2^3 \cdot 5^2 \pmod{1649}.$$

A congruence of squares has thus been found:

$$(41 \cdot 43)^2 \equiv (2^4 \cdot 5)^2 \pmod{1649}.$$

With $\gcd(41 \cdot 43 - 80, 1649) = 17$ and $\gcd(41 \cdot 43 + 80, 1649) = 97$, the number $1649 = 17 \cdot 97$ has been split into its factors. Not every congruence of squares yields nontrivial factors, but on average every second one does. Far fewer function values $q(x)$ have to be considered to obtain a square, and thus a factorization, than in Fermat's method. How can one efficiently determine which function values $q(x)$ multiply to a square? If the prime factorizations of the numbers are known, multiplying these numbers amounts to adding the exponents of their prime factorizations. Therefore, one only considers smooth numbers $q(x)$, i.e. numbers whose prime factorization consists only of primes from a previously fixed set. A product is a square if and only if all exponents of its prime factorization are even. Under these restrictions, the congruences that multiply to a square can be determined using methods from linear algebra.
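The gcd step for the example $n = 1649$ can be checked with a short Python sketch. The helper name `split_with_squares` is made up for this illustration.

```python
from math import gcd

def split_with_squares(x, y, n):
    """Given x*x ≡ y*y (mod n), return gcd(x - y, n) and gcd(x + y, n)."""
    assert (x * x - y * y) % n == 0, "not a congruence of squares"
    return gcd(x - y, n), gcd(x + y, n)

n = 1649
x = 41 * 43        # product of the two x values
y = 2**4 * 5       # square root of 2^5 * 2^3 * 5^2
print(split_with_squares(x, y, n))   # (17, 97)
```

If one of the two gcd values is 1 or n, the congruence is one of the unlucky cases and another combination has to be tried.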

In general, one collects congruences in a first phase. Once a sufficient number has been found, one looks for a subset of congruences whose product is a square on both sides.

As an example, find the prime factorization of $n = 87463$. One only looks at numbers $x^2 - n$ that consist of small prime factors. Let the greatest admissible prime factor be $29$. Since $x^2 \equiv n \pmod{p}$ has no solution for $p = 5, 7, 11$ and $23$, these primes never appear as divisors of $x^2 - n$. The so-called factor base, which contains all primes that may occur in the prime factorization, therefore consists of the prime numbers $2, 3, 13, 17, 19, 29$. The matrix of the exponents of the prime factorization, together with its reduction $\bmod\ 2$, looks like this:

$x$	$q(x) = x^2 - n$	Exponents of the prime factorization							Exponents mod 2
		-1	2	3	13	17	19	29	-1	2	3	13	17	19	29
265	-1 · 2 · 3 · 13² · 17	1	1	1	2	1	0	0	1	1	1	0	1	0	0
278	-1 · 3³ · 13 · 29	1	0	3	1	0	0	1	1	0	1	1	0	0	1
296	3² · 17	0	0	2	0	1	0	0	0	0	0	0	1	0	0
299	2 · 3 · 17 · 19	0	1	1	0	1	1	0	0	1	1	0	1	1	0
307	2 · 3² · 13 · 29	0	1	2	1	0	0	1	0	1	0	1	0	0	1
316	3⁶ · 17	0	0	6	0	1	0	0	0	0	0	0	1	0	0

We are looking for a linear combination of the rows that gives the zero vector modulo 2. One solution consists of rows 3 and 6:

$$x = 296 \cdot 316 = 93536 \equiv 6073 \pmod{n}$$

$$y = \sqrt{3^2 \cdot 17 \cdot 3^6 \cdot 17} = 3^4 \cdot 17 = 1377 \equiv 1377 \pmod{n}$$

Then $\gcd(x - y, n) = 587$ and $\gcd(x + y, n) = 149$. This gives the factorization $87463 = 587 \cdot 149$.
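The worked example for $n = 87463$ can be reproduced with the following Python sketch. It rebuilds the exponent vectors by plain trial division over the factor base (a simplification; no sieving takes place here) and combines rows 3 and 6. The helper `exponent_vector` is a name chosen for this illustration.

```python
from math import gcd

n = 87463
factor_base = [2, 3, 13, 17, 19, 29]

def exponent_vector(q):
    """Exponents of -1 and of the factor-base primes in q, or None if q is not smooth."""
    vec = [1 if q < 0 else 0]
    q = abs(q)
    for p in factor_base:
        e = 0
        while q % p == 0:
            q //= p
            e += 1
        vec.append(e)
    return vec if q == 1 else None

xs = [265, 278, 296, 299, 307, 316]
rows = [exponent_vector(x * x - n) for x in xs]

# Rows 3 and 6 (x = 296 and x = 316) add up to an all-even exponent vector.
combined = [a + b for a, b in zip(rows[2], rows[5])]
assert all(e % 2 == 0 for e in combined)

x = (296 * 316) % n
y = 1
for p, e in zip(factor_base, combined[1:]):
    y *= p ** (e // 2)           # halve every exponent to take the square root

print(x, y, gcd(x - y, n), gcd(x + y, n))   # 6073 1377 587 149
```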

This basic idea is also used in Dixon's method, which uses random values for $x$. The quadratic sieve instead considers consecutive values of $x$, for which the prime factorizations can be determined quickly. Finding such useful congruences is called sieving. The algorithm can be divided into two steps:

the sieving step, in which congruences of the form $x^2 \equiv q \pmod{n}$ are sought, and
the selection step, in which suitable congruences are selected whose product gives a congruence of squares.

Sieving step

In the sieving step, congruences of the form

$$x^2 \equiv q \pmod{n}$$

are sought for which the prime factorization of $q$ is known and consists only of small prime numbers (in other words: $q$ should be smooth with respect to a fixed bound). One chooses the numbers $x$ near the square root of $n$, so that the values $q(x) := x^2 - n$ are small. The probability that $q(x)$ is smooth is therefore high. However, no algorithm is known that computes the prime factorization of a number in polynomial time. In order to check efficiently whether a number is smooth, the following property is used:

$$\begin{aligned} q(x) &= x^2 - n \\ q(x + kp) &= (x + kp)^2 - n \\ &= x^2 + 2xkp + (kp)^2 - n \\ &= q(x) + 2xkp + (kp)^2 \\ &\equiv q(x) \pmod{p} \end{aligned}$$

So once a position $x$ has been found at which $q(x)$ is divisible by $p$, one obtains a whole sequence of values $q(x + kp)$ that are divisible by $p$. The prime $p$ divides $q(x) = x^2 - n$ if and only if $x^2 \equiv n \pmod{p}$. Square roots modulo a prime number can be computed efficiently (e.g. using the Shanks-Tonelli algorithm). The sequence of further values divisible by $p$ is then determined using a sieving method similar to the sieve of Eratosthenes. The quadratic sieve derives its name from solving this 'quadratic' congruence and from 'sieving' out the divisors.

Figure: The images of the elements of $\{1 \ldots 15\}$ under the function $f(x) = x^2 - n$ with $n = 221$ are tested for divisibility by $5$. The first two numbers found can be used to infer the following ones.
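The periodicity used by the sieve can be checked on the small case $n = 221$, $p = 5$ and $x \in \{1, \ldots, 15\}$. The sketch below finds the square roots of $n$ modulo $p$ by brute force, which is an assumption made for brevity; for large primes an algorithm such as Shanks-Tonelli would be used instead.

```python
n, p = 221, 5

def q(x):
    return x * x - n

# Brute-force square roots of n modulo p (acceptable only for tiny p).
roots = [r for r in range(p) if (r * r - n) % p == 0]

# Every x congruent to one of the roots modulo p gives q(x) divisible by p.
predicted = [x for x in range(1, 16) if x % p in roots]
actual = [x for x in range(1, 16) if q(x) % p == 0]

print(roots, predicted)   # [1, 4] [1, 4, 6, 9, 11, 14]
assert predicted == actual
```

Starting from the first two hits, 1 and 4, all further hits follow by repeatedly adding 5.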

In principle one proceeds as follows:

Step 1: Choosing a Factor Base.

Take all prime numbers $p$ up to a bound $S$ for which $n$ is a quadratic residue, i.e. for which the congruence $x^2 \equiv n \pmod{p}$ is solvable. Primes for which $n$ is a quadratic non-residue can be excluded because they never occur as a divisor of $x^2 - n$. The larger the bound, the greater the probability that $x^2 - n$ has only prime factors up to this bound. The disadvantage is that more relations are needed to solve the resulting system of equations. If the bound is chosen too small, only very few numbers factor as desired and many numbers have to be examined.
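A factor base can be selected with Euler's criterion, as in the following Python sketch. The function names are chosen for this illustration, and the code assumes that $n$ is odd and not divisible by any prime up to the bound.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes for limit >= 2."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def factor_base(n, bound):
    """Primes p <= bound for which x^2 ≡ n (mod p) is solvable (n odd)."""
    base = []
    for p in primes_up_to(bound):
        # Euler's criterion: n is a quadratic residue modulo an odd prime p
        # iff n^((p-1)/2) ≡ 1 (mod p). For odd n the prime 2 always qualifies.
        if p == 2 or pow(n, (p - 1) // 2, p) == 1:
            base.append(p)
    return base

print(factor_base(87463, 29))   # [2, 3, 13, 17, 19, 29]
```

The result matches the factor base of the example $n = 87463$; the primes 5, 7, 11 and 23 are rejected.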

Therefore one chooses the bound $S$ on the order of

$$S := \sqrt{\exp\left(\sqrt{\ln n \ln \ln n}\right)}$$

Step 2: Sieve with the factors of the factor base.

Choose a sieving interval on the order of

$$L := S^2 = \exp\left(\sqrt{\ln n \ln \ln n}\right).$$
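To get a feeling for these orders of magnitude, the two bounds can be evaluated numerically. The sketch below only evaluates the formulas for $S$ and $L$; practical implementations tune both parameters empirically, and the function name `qs_bounds` is an invention of this example.

```python
import math

def qs_bounds(n):
    """Orders of magnitude of the factor-base bound S and the sieving interval L."""
    ln_n = math.log(n)
    L = math.exp(math.sqrt(ln_n * math.log(ln_n)))
    S = math.sqrt(L)
    return S, L

S, L = qs_bounds(10**100)
print(f"S ~ {S:.2e}, L ~ {L:.2e}")
```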

For the numbers $x$ with $|x - \sqrt{n}| < L$, make a list of the values $q(x) := x^2 - n$. For every prime $p$ of the factor base, find the (two) solutions $t$ of $q(t) \equiv 0 \pmod{p}$. Divide all numbers $q(t + k \cdot p)$ within the chosen sieving interval by $p$ (and also by $p^2$, $p^3$, ...). The entries that end up as 1 are smooth over the factor base and are thus the values sought.
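Step 2 in its basic form (with real divisions) can be sketched as follows. The interval radius is passed in explicitly rather than derived from the formula for $L$, and the roots modulo $p$ are found by brute force; both are simplifications for this illustration, and `sieve_smooth` is a made-up name.

```python
from math import isqrt

def sieve_smooth(n, factor_base, radius):
    """Return {x: q(x)} for all smooth q(x) = x*x - n with |x - isqrt(n)| <= radius."""
    start = isqrt(n) - radius
    xs = range(start, isqrt(n) + radius + 1)
    rest = [abs(x * x - n) for x in xs]          # cofactors still to be divided
    for p in factor_base:
        # Brute-force solutions of q(t) ≡ 0 (mod p); fine for small p.
        for t in (t for t in range(p) if (t * t - n) % p == 0):
            # Visit only the positions x ≡ t (mod p).
            for i in range((t - start) % p, len(rest), p):
                while rest[i] and rest[i] % p == 0:   # removes p, p^2, p^3, ...
                    rest[i] //= p
    return {x: x * x - n for x, r in zip(xs, rest) if r == 1}

found = sieve_smooth(87463, [2, 3, 13, 17, 19, 29], 31)
print(sorted(found))
```

For $n = 87463$ the result contains, among possibly other values, the six $x$ values of the example table.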

The numbers $q(x)$ to be examined are on the order of magnitude of $n$. Divisions of such numbers are expensive (for typical $n$ they no longer fit into native machine data types). Since sieving dominates the running time, step 2 is modified. One does not store the numbers $q(x)$ themselves but their logarithms rounded to integers (or the logarithm of $n$ as an upper bound for $q(x)$). These small numbers can be handled with primitive data types. Division by a divisor $p$ becomes subtraction of the logarithm of $p$. In practice, for reasons of speed, sieving with powers of the factors is omitted.

The rounding errors (and the ignored multiple factors) are accounted for by a bound $T$ on the order of the logarithm of the largest factor of the factor base. The numbers in the list that are less than $T$ after sieving are very likely smooth and are recorded as candidates. Not all recorded candidates are necessarily smooth. In an additional step, the prime factorization of these numbers is determined and it is noted whether each number is actually smooth.
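The logarithmic variant can be sketched as follows. Base-2 logarithms, the rounding scheme and the extra `slack` parameter added to the threshold are assumptions of this example, not prescriptions from the text; each prime is subtracted only once per position, so prime powers are ignored.

```python
from math import isqrt, log2

def log_sieve_candidates(n, factor_base, radius, slack=0):
    """Return the x whose rounded log-remainder after sieving is below the threshold T."""
    start = isqrt(n) - radius
    xs = range(start, isqrt(n) + radius + 1)
    # Store rounded logarithms instead of the (large) values q(x).
    logs = [round(log2(abs(x * x - n))) for x in xs]
    for p in factor_base:
        logp = round(log2(p))
        for t in range(p):
            if (t * t - n) % p == 0:
                for i in range((t - start) % p, len(logs), p):
                    logs[i] -= logp            # "division" by p, once per prime
    threshold = round(log2(factor_base[-1])) + slack
    return [x for x, rem in zip(xs, logs) if rem < threshold]

cands = log_sieve_candidates(87463, [2, 3, 13, 17, 19, 29], 31)
print(cands)
```

With these settings the candidates include 265, 278, 296, 299 and 307, while 316 (where $q(x) = 3^6 \cdot 17$) is missed because the repeated factor 3 is subtracted only once; raising `slack` recovers it at the cost of more false candidates. Every candidate still has to be confirmed by an actual factorization over the factor base.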

The prime factorization of the probably smooth numbers over the factor base can be determined with any factoring method suited to factors of this size. Pollard's rho method is suitable for small factors. Another method is to sieve a second time: whenever one comes across a probably smooth number, one divides it by the prime currently being sieved with, and by its powers. Since the hit rate for large primes is low, this is worthwhile for the larger primes of the factor base. Sieving can be accelerated further by computing the gcd of the number to be tested with the product of the primes of the factor base.

Selection step

For a (smooth) congruence $x^2 \equiv q \pmod{n}$, $q$ consists only of factors from the factor base, so $q$ can be described completely by the vector of the exponents of its known prime factorization. From the exponent vectors of the congruences, a linear system of equations is set up over the finite field $\mathbb{F}_2$, in which each row consists of the exponent vector of a congruence modulo 2. A number is a square if and only if all exponents of its prime factorization are even. So if one finds a nontrivial linear combination of rows that gives the zero vector, one has also found a congruence of squares $u^2 \equiv v^2 \pmod{n}$. Calculating the greatest common divisor $\gcd(u - v, n)$ then yields, in at least half of all cases, a factor of $n$ that is neither 1 nor $n$.

This step is solved with methods of linear algebra, such as Gaussian elimination, the conjugate gradient method or the Lanczos method. The block Lanczos method, an extension of the Lanczos method, can solve such large but very sparse matrices in a fraction of the time of the sieving step while saving memory (linear in the number of rows).
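For small matrices the dependency search can be sketched with plain Gaussian elimination over $\mathbb{F}_2$, storing each row as a Python integer bit mask. This is a toy stand-in for the sparse methods named above, not an implementation of block Lanczos; `find_dependency` is a name chosen for this illustration.

```python
def find_dependency(rows):
    """Return indices of a nonempty subset of 0/1 rows summing to zero mod 2, or None."""
    pivots = {}                      # lowest set bit -> (reduced vector, combination mask)
    for i, row in enumerate(rows):
        vec = sum(bit << j for j, bit in enumerate(row))
        combo = 1 << i               # remembers which original rows were added
        while vec:
            pivot = vec & -vec       # lowest set bit of vec
            if pivot not in pivots:
                pivots[pivot] = (vec, combo)
                break
            pvec, pcombo = pivots[pivot]
            vec ^= pvec              # addition over F_2 is XOR
            combo ^= pcombo
        if vec == 0:
            return [k for k in range(len(rows)) if combo >> k & 1]
    return None

# Exponent vectors mod 2 from the example n = 87463 (columns: -1, 2, 3, 13, 17, 19, 29).
rows = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
]
print(find_dependency(rows))
```

The subset returned need not be rows 3 and 6 from the example; any subset whose rows sum to the zero vector modulo 2 gives a congruence of squares.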

Application area

The quadratic sieve is suitable for large numbers of up to about 110 decimal digits that are not prime powers. For larger numbers the number field sieve is more suitable.

In 1994 the number RSA-129 was factored with the multiple polynomial quadratic sieve using partial relations. This number with 129 decimal digits was split into its two factors (one with 64, the other with 65 decimal digits). The sieving step was distributed among 600 volunteers. They collected congruences for 8 months, which were transmitted to the central computer via email (or FTP). The selection step on the 298 GB of data was completed in 45 hours on a supercomputer. The factor base comprised 524338 prime numbers; the matrix had 569466 rows and 524338 columns.

All subsequent factoring records were set with the number field sieve.

If the running time of the algorithm is measured with respect to the length $N = \log(n)$ of the input $n$, the running time of the quadratic sieve can be written as follows:

$$e^{c \cdot N^{\alpha} (\log N)^{1 - \alpha}}, \quad \alpha = 1/2, \quad c = 1$$

For $\alpha = 1$ the growth is exponential; trial division shows this runtime behavior with $c = 1/2$. With $\alpha = 0$ one would obtain a polynomial algorithm with runtime $\mathcal{O}(N^c)$. The quadratic sieve is therefore an algorithm with superpolynomial but subexponential running time. With the number field sieve, the constant $\alpha$ could be reduced to 1/3. However, $c$, which asymptotically influences the running time less than $\alpha$, is much larger there. There are improvements to the basic idea of the quadratic sieve that further reduce the running time:
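The three cases can be written out explicitly. The following lines are a worked substitution into the formula above, assuming that $\log$ denotes the natural logarithm, so that $N = \ln n$:

```latex
\begin{aligned}
\alpha = 1:\quad & e^{c N} = e^{c \ln n} = n^{c},
   \qquad \text{e.g. } c = 1/2 \text{ gives } \sqrt{n} \text{ (trial division)},\\
\alpha = 0:\quad & e^{c \log N} = N^{c} \qquad \text{(polynomial in the input length } N\text{)},\\
\alpha = 1/2,\ c = 1:\quad & e^{\sqrt{N \log N}} = \exp\left(\sqrt{\ln n \cdot \ln \ln n}\right)
   \qquad \text{(quadratic sieve)}.
\end{aligned}
```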

Partial relations

Relations that are not smooth can also be combined into (smooth) relations that are useful for the selection step. If one has two partial relations whose prime factorizations contain the same factor $P$ (outside the factor base),

$$x^2 \equiv p_1^{k_1} \cdot p_2^{k_2} \cdots p_s^{k_s} \cdot P \pmod{n}$$

$$\bar{x}^2 \equiv p_1^{l_1} \cdot p_2^{l_2} \cdots p_s^{l_s} \cdot P \pmod{n}$$

then these give the congruence

$$(x \cdot \bar{x})^2 \equiv p_1^{k_1 + l_1} \cdot p_2^{k_2 + l_2} \cdots p_s^{k_s + l_s} \cdot P^2 \pmod{n}$$

Multiplying by $P^{-2}$ even gives the following smooth relation:

$$(x \cdot \bar{x} \cdot P^{-1})^2 \equiv p_1^{k_1 + l_1} \cdot p_2^{k_2 + l_2} \cdots p_s^{k_s + l_s} \pmod{n}$$

In the penultimate relation all factors with odd exponents come from the factor base, so this relation can be used for the selection step. If the size of the factor $P$ is limited, it can be found with little extra effort: one raises the bound $T$ for the interesting numbers in the sieving step by $\log(P)$. The factor $P$ is then what remains when the prime factorization is determined by dividing out the primes of the factor base. Partial relations can be used to increase the number of relations available for the selection step. The running time can thus be roughly halved.
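A concrete pair of partial relations for $n = 87463$ and the factor base $2, 3, 13, 17, 19, 29$ illustrates the combination. The values $x = 305$ and $\bar{x} = 313$ with the shared large prime $P = 103$ were picked by hand for this sketch and do not appear in the original example.

```python
n = 87463
P = 103                                   # shared large prime outside the factor base

x1, x2 = 305, 313
assert x1 * x1 - n == 2 * 3**3 * P        # partial relation 1: q(305) = 5562
assert x2 * x2 - n == 2 * 3 * 17 * P      # partial relation 2: q(313) = 10506

P_inv = pow(P, -1, n)                     # modular inverse (Python 3.8+), needs gcd(P, n) == 1
x = (x1 * x2 * P_inv) % n
smooth_part = 2**2 * 3**4 * 17            # exponents added, P^2 cancelled

assert (x * x - smooth_part) % n == 0     # a full relation over the factor base
print(x, smooth_part)
```

The combined relation has the exponent vector (2, 4, 0, 1, 0, 0) over the factor base and can enter the selection step like any other smooth relation.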

Multiple polynomials

The size of the numbers generated with the quadratic sieve increases linearly with the distance from the zero of $q$. The Multiple Polynomial Quadratic Sieve (MPQS) defines several different (if possible disjoint) functions, each of which contains a fixed factor and shows the same growth. The search interval can then be divided among several polynomials. This means that the numbers to be examined for divisors are smaller and the probability of producing a smooth number increases.

The function $q(x) = x^2 - n$ is modified by using a polynomial of the first degree in place of $x$.

The Multiple Polynomial Quadratic Sieve looks at a set of polynomials

$$q_a(x) = (2ax + b)^2 - n$$

Here $b$ is chosen such that $b^2 - n$ is divisible by $4a$, i.e. $b^2 - n = 4ac$. Then

$$q_a(x) = (2ax + b)^2 - n = (2ax)^2 + 4abx + b^2 - n = 4a \cdot (ax^2 + bx + c)$$

and the value $q_a(x)$ thus generated contains $4a$ as a factor. Choosing the even factor $2a$ produces an additional factor $2$ in the generated numbers, besides the factor $2a$.
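The factor $4a$ can be checked numerically. The sketch below uses $n = 1649$ from the earlier example with the hand-picked coefficients $a = 25$ and $b = 7$ (so that $b^2 = 49 \equiv 1649 \pmod{100}$); these particular values are an assumption of this illustration.

```python
n = 1649
a, b = 25, 7                       # a = 5^2; b^2 = 49 ≡ 1649 (mod 4a = 100)
assert (b * b - n) % (4 * a) == 0
c = (b * b - n) // (4 * a)         # here c = -16

def q_a(x):
    return (2 * a * x + b) ** 2 - n

# The identity q_a(x) = 4a * (a x^2 + b x + c) holds for every x.
for x in range(-5, 6):
    assert q_a(x) == 4 * a * (a * x * x + b * x + c)

print([q_a(x) // (4 * a) for x in range(-2, 3)])   # [70, 2, -16, 16, 98]
```

After dividing out the known factor $4a = 100$, the remaining values are much smaller than $q_a(x)$ itself; for $x = 1$ one recovers $57^2 - 1649 = 1600$ from Fermat's example.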

With the Multiple Polynomial Quadratic Sieve (MPQS) one chooses $a$ as the square of a prime number such that $n$ is a quadratic residue mod $4a$. The congruence $b^2 \equiv n \pmod{a}$ then has exactly two solutions, and $b$ can be determined efficiently. The method was developed in 1983 by J. A. Davis and D. B. Holdridge. The sieving process itself works in a similar way to the ordinary quadratic sieve, but for each prime $p$ of the factor base the inverse element of $a \bmod p$ must be calculated, which takes up a large proportion of the total computing time.

With the Self Initializing Quadratic Sieve (SIQS), $a$ is chosen as a product of primes from the factor base. This means that for a single $a$ there are more values for $b$ than with MPQS, which reduces the computing time when changing the polynomial. This method was discovered by René Peralta as well as William Robert Alford and Carl Pomerance in 1995.

Prefactors

One can apply the whole procedure to a multiple $kn$ instead of to the number $n$ itself. Changing the number to be factored usually also changes the factor base. By cleverly choosing the prefactor $k$, additional small primes can be brought into the factor base. For the length of the numbers that remain after the primes of the factor base have been removed by sieving, the following applies: a small prime in the factor base reduces the length of the numbers more than a large one. By varying $k$ one can increase the number of small primes in the factor base and thus the probability of obtaining a smooth number. However, multiplying by $k$ also makes the generated numbers larger. The so-called Knuth-Schroeppel function takes both effects into account, so that a good prefactor can be determined efficiently. Having the prime 2 in the factor base has additional advantages: to divide this factor out of the candidates for smooth numbers, only shifts instead of costly divisions are needed. If

$$k \cdot n \equiv 1 \pmod{8}$$

then all generated numbers satisfy

$$q(x) = (2x + 1)^2 - kn \equiv 0 \pmod{8},$$

that is, one examines only odd numbers $2x + 1$, and all generated numbers contain a factor 8. For a multiple polynomial with $a = 1$ one would expect a factor 4; the numbers generated in this way thus contain an additional factor of 2.
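The divisibility by 8 is easy to verify. The sketch below picks the smallest odd prefactor $k$ with $kn \equiv 1 \pmod{8}$ for the example $n = 87463$; it does not model the Knuth-Schroeppel function, which would additionally weigh the effect of $k$ on the factor base.

```python
n = 87463                          # n ≡ 7 (mod 8)

# Smallest odd prefactor k with k*n ≡ 1 (mod 8).
k = next(k for k in range(1, 8, 2) if (k * n) % 8 == 1)
print(k)                           # 7

# Every odd square is ≡ 1 (mod 8), so (2x+1)^2 - k*n is divisible by 8.
assert all(((2 * x + 1) ** 2 - k * n) % 8 == 0 for x in range(-50, 50))
```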

Implementations

msieve, an implementation of the Multiple Polynomial Quadratic Sieve with support for partial relations. Written by Jason Papadopoulos.
YAFU, by Ben Buhrow, is arguably the fastest implementation of the Self Initializing Quadratic Sieve.
Factoring applet by Dario Alpern. A JavaScript implementation of the SIQS.
Tilman Neumann's open source java-math-library contains PSIQS, probably the fastest quadratic sieve written in Java.

Literature

Carl Pomerance: A Tale of Two Sieves. Notices of the AMS, 43 (1996) pp. 1473-1485. (Web version: http://www.ams.org/notices/199612/pomerance.pdf )
Richard Crandall, Carl Pomerance: Prime Numbers, A Computational Perspective. Springer, 2001, ISBN 0-387-94777-9 .
Arjen K. Lenstra, Mark S. Manasse: Factoring With Two Large Primes. EUROCRYPT 1990, pp. 72-82.
James A. Davis, Diane B. Holdridge: Factorization Using the Quadratic Sieve Algorithm. CRYPTO 1983, pp. 103-113.
Joseph Gerver: Factoring Large Numbers with a Quadratic Sieve. Math. Comp. 41 (1983), No. 163, pp. 287-294.

Web links

schule.de (PDF) - very good introduction (German)
https://pdfs.semanticscholar.org (PDF) - Detailed description of the (Self Initializing) Quadratic Sieve by Scott Patrick Contini (English).
http://www.karlin.mff.cuni.cz (PDF; 392 kB) - Description of the (Self Initializing) Quadratic Sieve by Marian Kechlibar (English).
alpertron.com.ar - Java applet for factoring numbers (also uses the Self Initializing Quadratic Sieve)
http://www.math.uiuc.edu/~landquis/quadsieve.pdf (Memento from December 3, 2008 in the Internet Archive) (PDF; 123 kB) - Overview of the quadratic sieve by Eric Landquist (English).
math.colostate.edu (PDF; 50 kB) - Demonstration of the quadratic sieve using the number 87463 as an example (English).
cdc.informatik.tu-darmstadt.de - The block Lanczos algorithm over GF (2) by Olaf Gross.

References

Carl Pomerance: A Tale of Two Sieves. In: Notices of the AMS. Vol. 43, no. 12, 1996, pp. 1473–1485 (ams.org [PDF]), p. 1478.
Olaf Gross: The block Lanczos algorithm over GF(2). http://www.cdc.informatik.tu-darmstadt.de/reports/reports/gross.diplom.ps.gz
Arjen K. Lenstra, Mark S. Manasse: Factoring With Two Large Primes. EUROCRYPT 1990, pp. 72-82.