Pollard-Rho method

Figure: Graphical representation of the partial results

The Pollard-Rho methods are algorithms for determining the period length of a number sequence that is generated by iterating a mathematical function. Several hard mathematical problems, such as the discrete logarithm and integer factorization, can be attacked with these methods. John M. Pollard developed an optimized variant of the Pollard-Rho method for prime factorization in 1975. Such methods can also be used to find collisions in hash functions.

With the Pollard-Rho methods, sequences of partial results are computed. From a certain point on, these partial results only repeat. The partial results can be arranged graphically so that the shape of the letter ρ (rho) becomes recognizable, which is where the methods get their name.

How it works

The goal is to find a prime factor $p$ of the number $n$. In general, however, the divisor that is found need not be prime. The method is based on generating a sequence of pseudo-random numbers. Any function $f\colon \mathbb{N} \to \mathbb{N}$ can be used to generate the sequence. The only requirement is that $x \equiv y \pmod{p}$ implies $f(x) \equiv f(y) \pmod{p}$, and this already holds, for example, whenever $f$ is given by a polynomial with integer coefficients.

The sequence starts with a largely freely selectable start value $x_0$. The further values are computed iteratively according to

$$x_i = f(x_{i-1})$$

The function values modulo $p$ can take on at most the $p$ distinct values $0, 1, 2, \ldots, p-1$. As soon as one of these values occurs again, the values repeat modulo $p$ from then on. This happens after $p$ iterations at the latest and, on average, after about $\sqrt{p}$ iterations. For the same reasons, the values can be expected to repeat modulo $n$ after about $\sqrt{n}$ iterations. If $n$ is known to have a small prime factor $p$, then $\sqrt{p}$ is considerably smaller than $\sqrt{n}$, so one can hope that the repetition modulo $p$ sets in considerably earlier than the repetition modulo $n$.

For a sequence computed in this way with finitely many possible function values, a few values are first taken on in a pre-period

$$x_0, x_1, \ldots, x_{k-1}$$

As soon as a value occurs for the second time, the values repeat cyclically from then on:

$$x_k, x_{k+1}, \ldots, \left(x_{k+l} = x_k\right), x_{k+1}, \ldots$$

This behavior of the sequence gave the method its name: the period can be pictured as a circle, and the initial elements of the sequence as a stem that leads into the circle. Drawn this way, the sequence looks like the Greek letter ρ.
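The stem-and-circle shape can be made concrete with a small sketch. It assumes a quadratic map of the form x² + c and records the index at which each value first appears; the helper name `rho_shape` and its parameters are illustrative and not part of the original description.

```python
def rho_shape(x0, c, m):
    """Return (k, l): length of the stem (pre-period) and of the cycle
    of the sequence x -> (x*x + c) % m started at x0 % m."""
    first_seen = {}          # value -> index of its first occurrence
    x, i = x0 % m, 0
    while x not in first_seen:
        first_seen[x] = i
        x = (x * x + c) % m
        i += 1
    k = first_seen[x]        # index at which the cycle is entered
    return k, i - k          # i - k is the cycle length


print(rho_shape(431, 23, 703))  # shape modulo n = 703
print(rho_shape(431, 23, 19))   # shape modulo the factor p = 19
```

With the values n = 703, c = 23 and x₀ = 431 from the first numerical example, the sequence has a stem of length 3 and a cycle of length 9 modulo 703, but a cycle of length 1 modulo the factor 19.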

If two values $x$ and $y$ from the sequence have the same value modulo $p$, so that $x \equiv y \pmod{p}$, then the greatest common divisor $\gcd(x - y, n)$ is a multiple of $p$ and often a proper divisor of $n$.
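A minimal sketch of this idea, comparing all pairs of sequence values by brute force. The function name `first_factor_pairwise` is hypothetical, and the sample values are the first elements of the sequence for n = 703, f(x) = x² + 23 mod n and x₀ = 431.

```python
from math import gcd

def first_factor_pairwise(values, n):
    """Compare all pairs of sequence values; return (i, j, d) for the first
    pair whose difference shares a factor with n, or None if there is none."""
    for j in range(len(values)):
        for i in range(j):
            d = gcd(abs(values[i] - values[j]), n)
            if d > 1:
                return i, j, d
    return None

# x_0 .. x_4 for n = 703, f(x) = x^2 + 23 mod n, x_0 = 431
xs = [431, 192, 331, 619, 49]
print(first_factor_pairwise(xs, 703))  # (3, 4, 19): x_3 and x_4 agree modulo 19
```

The number of pairs grows quadratically with the length of the sequence, which is what makes this direct approach expensive.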

However, comparing all values pairwise in this way is very time-consuming. An optimized variant of the Pollard-Rho method therefore computes two sequences to detect the period. The first sequence is

$$x = (x_1, x_2, x_3, x_4, \ldots)$$

and the second sequence is

$$y = (y_1, y_2, y_3, y_4, \ldots) = (x_2, x_4, x_6, x_8, \ldots)$$

This trick avoids comparing a large number of function values. The greatest common divisor $\gcd(x_i - x_j, n)$ no longer has to be computed for all pairs $(x_i, x_j)$. It suffices to compute $\gcd(x_i - y_i, n)$, that is, $\gcd(x_i - x_{2i}, n)$.
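Why comparing only $x_i$ with $x_{2i}$ is enough can be sketched as follows. Here $k$ and $l$ are taken to be the pre-period and the cycle length of the sequence modulo $p$, in the notation of the ρ shape described earlier. For every index $i \ge 1$:

$$x_i \equiv x_{2i} \pmod{p} \iff i \ge k \ \text{ and } \ l \mid (2i - i) = i$$

The two sequences therefore first meet modulo $p$ at the smallest positive multiple of $l$ that is at least $k$, and this index satisfies

$$i \le k + l \le p$$

so no repetition modulo $p$ is missed, although it may be detected slightly later than its first occurrence.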

Since $p$, being the sought divisor of $n$, is unknown, the remainders on division by $p$ cannot be computed directly. Therefore, two values $x$ and $y$ are not tested for equality; instead, $\gcd(x - y, n)$ is computed. If $x$ and $y$ differ only by a multiple of $p$, then $\gcd(x - y, n)$ is a multiple of the divisor $p$ of $n$. Integer multiples of $n$ are also integer multiples of $p$ and therefore need not be taken into account in the computation. Consequently, it suffices to compute the function values modulo $n$.

A function of the form $f(x) = x^2 + \text{const}$ can be used to compute the sequence. With this choice, only a part, about half, of the values $0$ to $p-1$ can occur as remainders, which somewhat favors an earlier occurrence of the sought repetition.
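The "about half" claim can be checked directly for small primes. The following sketch counts the distinct values of x² + c modulo an odd prime p; the helper name `image_size` is illustrative, and the constant 23 is an arbitrary choice.

```python
def image_size(c, p):
    """Number of distinct values of (x*x + c) % p for x in 0 .. p-1."""
    return len({(x * x + c) % p for x in range(p)})

for p in (19, 37):
    print(p, image_size(23, p), (p + 1) // 2)
```

For an odd prime p the count is (p + 1)/2, because x and −x have the same square and adding a constant only shifts the set of squares.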

Formal definition

Let $n$ be the number of which a prime factor $p$ is to be computed. Let $(x_k)_{k \in \mathbb{N}}$ denote a sequence of pseudo-random numbers such as

$$\begin{aligned} x_0 &= 2 \\ x_{k+1} &= x_k^2 + c \pmod{n} \quad \text{with} \quad c \not\equiv 0 \pmod{n} \text{ and } c \not\equiv -2 \pmod{n}. \end{aligned}$$

If $n$ has a proper prime factor $p$, then the following usually holds: there is an index $i < p$ such that $n > k_i > 1$ and $k_i \mid n$, where

$$k_i = \gcd(|x_i - x_{2i}|, n)$$

Algorithm

Input: $n$, the number to be factored, and $f(x)$, the pseudo-random function modulo $n$
Output: a non-trivial factor of $n$, or an error message

x ← 2, y ← x, d ← 1
While d = 1:
    1. x ← f(x)
    2. y ← f(f(y))
    3. d ← gcd(|x - y|, n)
If 1 < d < n, then return d.
If d = n, then output "error".

Note: This algorithm returns an error message for every $n$ that is divisible only by 1 and itself, that is, for every prime. However, an error message can also be returned for other $n$. In that case, choose a different function $f(x)$ and try again.

If the result is a number, it really is a divisor of $n$ and hence a correct result, although it need not be a prime number.

For $f$ one chooses a polynomial with integer coefficients. A common choice of $f(x)$ for this algorithm is:

$$f(x) = (x^2 + c) \bmod n, \quad c \neq 0, -2.$$
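The steps of the algorithm translate almost line by line into code. The following is a minimal sketch, assuming the quadratic function above; the names `pollard_rho` and `find_factor`, the default values and the retry limit `max_c` are illustrative choices, not part of the original description.

```python
from math import gcd

def pollard_rho(n, c=1, x0=2):
    """One attempt with f(x) = (x*x + c) % n.
    Returns a non-trivial divisor of n, or None if the attempt fails (d == n)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    x = y = x0
    d = 1
    while d == 1:
        x = (x * x + c) % n          # x advances one step
        y = (y * y + c) % n
        y = (y * y + c) % n          # y advances two steps
        d = gcd(abs(x - y), n)
    return d if d < n else None

def find_factor(n, max_c=20):
    """Retry with c = 1, 2, ... (skipping c = 0 and c = -2 mod n)."""
    for c in range(1, max_c + 1):
        if c % n in (0, n - 2):
            continue
        d = pollard_rho(n, c)
        if d is not None:
            return d
    return None                      # n is prime or every attempt failed

print(pollard_rho(703, c=23, x0=431))   # 19
print(pollard_rho(2717, c=4, x0=2))     # 209
print(find_factor(13))                  # None, since 13 is prime
```

Returning `None` instead of raising an error makes it easy to retry with a different constant, which is the recovery strategy the note above suggests. A returned divisor is not necessarily prime, as the result 209 = 11 · 19 shows.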

Estimation of the running time

The number sequences $x_i \bmod p$ and $x_{2i} \bmod p$ can be viewed as pseudo-random sequences. If a value occurs again, the subsequent values inevitably repeat. Up to $p$ values can be taken on (with a quadratic $f$ as above: up to $\tfrac{p+1}{2}$ values). The expected length of a cycle is of the order of $\sqrt{p}$. The fact that far fewer than $p$ computations are required is sometimes called the birthday paradox.

The worst case occurs when $n$ is a product of two prime numbers of equal length. The algorithm then terminates after $O(n^{1/4}\,\operatorname{polylog}(n))$ steps with probability $\tfrac{1}{2}$. The method works well for factoring numbers with several smaller factors. In the same time, the algorithm can (with high probability) factor a number with twice as many digits as trial division can. The running time is exponential in the length of the input, so the algorithm is asymptotically slower than the quadratic sieve and the number field sieve.
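The comparison with trial division follows from a rough estimate that ignores logarithmic factors and assumes the heuristic step count of about $\sqrt{p}$ for the smallest prime factor $p \le \sqrt{n}$:

$$T_{\text{trial}}(n) \approx \sqrt{n}, \qquad T_{\rho}(n) \approx \sqrt{p} \le n^{1/4}$$

$$T_{\rho}(n^2) \approx (n^2)^{1/4} = \sqrt{n} \approx T_{\text{trial}}(n)$$

Since $n^2$ has about twice as many digits as $n$, the same budget of steps covers numbers of twice the length.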

Numerical example

1st example

We are looking for the factors of the number $n = 703$. We use the function $f(x) = x^2 + 23 \bmod n$ and the starting value $x_0 = 431$:

Table: Rho method for n = 703
n = 703, f(x) = x² + c mod n with c = 23, x₀ = 431
i	x_i = f(x_{i-1})	y_i = x_{2i} = f(f(y_{i-1}))	d = gcd(|x_i - y_i|, n)
1	192	331	1
2	331	49	1
3	619	125	19
4	49	106	19
5	315	144	19
6	125	619	19
7	182	315	19
8	106	182	19
9	11	11	703
10	144	372	19
11	372	49	19
12	619	125	19

With that, the prime factorization $703 = 19 \cdot 37$ is found.
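The rows of the table can be regenerated with a few lines of code. This sketch assumes the function and start value given for the example; unlike the algorithm itself, it does not stop at the first d > 1 but keeps going for a fixed number of rows. The name `rho_table` is illustrative.

```python
from math import gcd

def rho_table(n, c, x0, rows):
    """Return [(i, x_i, y_i, d_i)] with y_i = x_{2i} and d_i = gcd(|x_i - y_i|, n)."""
    f = lambda x: (x * x + c) % n
    x = y = x0
    table = []
    for i in range(1, rows + 1):
        x = f(x)             # one step for x
        y = f(f(y))          # two steps for y
        table.append((i, x, y, gcd(abs(x - y), n)))
    return table

for row in rho_table(703, 23, 431, 12):
    print(*row, sep="\t")
```

The algorithm would already stop in row 3 and return 19. Row 9 shows the failure case d = n, where both sequences coincide modulo 703.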

2nd example

Table: Rho method for n = 2717
n = 2717, f(x) = x² + c mod n with c = 4, x₀ = 2
i	x_i = f(x_{i-1})	y_i = x_{2i} = f(f(y_{i-1}))	d = gcd(|x_i - y_i|, n)
1	8	68	1
2	68	277	209
3	1911	2367	19
4	277	68	209
5	657	277	19
6	2367	2367	2717
7	239	68	19
8	68	277	209

This example shows that the factor found need not be a prime number. The factor found here is $209 = 11 \cdot 19$.

Factoring

Using the method described, the Fermat number

$$F_8 = 2^{2^8} + 1 = 2^{256} + 1 = 1238926361552897 \cdot p_{62}$$

could be factored in 1980. Here $p_{62}$ denotes a number with 62 digits, which was only later proven to be a prime number.
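The stated factorization can be checked with exact integer arithmetic. The sketch below only verifies divisibility and the digit count of the cofactor; it does not prove that the cofactor is prime.

```python
F8 = 2 ** (2 ** 8) + 1
q = 1238926361552897          # the 16-digit factor quoted above
cofactor, remainder = divmod(F8, q)
print(remainder == 0)          # whether q divides F8 exactly
print(len(str(cofactor)))      # number of decimal digits of the cofactor
```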

Implementations

The Rho method is available as the function rho_factorize() in the function library of the ARIBAS program by Otto Forster.

Literature

J. M. Pollard: A Monte Carlo Method for Factorization. BIT 15 (1975), 331-334
R. P. Brent: An Improved Monte Carlo Factorization Algorithm. BIT 20 (1980), 176-184
Otto Forster: Algorithmische Zahlentheorie. Vieweg, 1996, ISBN 3-528-06580-X