Collision resistance

A function (almost always a one-way function in this context ) is said to be collision-resistant when it is “difficult” to find different inputs that map to the same value. In the case of cryptographic hash functions in particular , this is a common requirement, the fraction of which is generally regarded as a fraction of the complete hash function.

background

Cryptographic hash functions are an important primitive in numerous practical applications, especially in connection with digital signatures in the context of the “hash-then-sign” paradigm. Here it is obviously desirable that no one is able to find a valid signature for a message and another message for which the signature would also be valid; Since the message is usually hashed before it is signed, the hash function must ensure that it is not possible to find another message for a given message that leads to the same hash. This property is called resistance to finding further archetypes or weak collision resistance . ${\ textstyle m_ {0}}$ ${\ textstyle m_ {1}}$ ${\ textstyle m_ {0}}$ ${\ textstyle m_ {1}}$

Somewhat less obvious is the stronger requirement for (strong) collision resistance, which prevents any collisions from being found: in this case, in order to forge a signature, it is generally no longer sufficient to find an existing one; instead, the owner of the secret key must be persuaded to create a Sign the message chosen by the attacker. At first glance, this may seem implausible, especially since most of the collisions that can be found in practice initially seem to have no meaningful content. By using different properties of common file formats (e.g. PDF ) and typical constructions of most hash functions, however, two almost freely selectable documents can be specifically created that only appear suspicious when examined by experts. A conceivable scenario in this case would be a politician who signs a specially prepared document with supposedly harmless content (digitally), whereby the attacker receives a signature on another document with (on the surface) almost any other content.

notation

In the following, a family of one-way functions denotes a single one-way function, its archetype space, the probability that an event occurs, and a potentially propabilistic algorithm with polynomial runtime in the security parameter and a probability that is negligible in the security parameter . ${\ textstyle \ mathbb {H}}$ ${\ textstyle H}$ ${\ textstyle \ mathbb {M}}$ ${\ textstyle \ mathrm {Pr} [X]}$ ${\ textstyle X}$ ${\ textstyle {\ mathcal {A}}}$ ${\ textstyle \ mathrm {negl.}}$

Weak collision resistance

A one-way function is said to be weak (in the sense of “easier to achieve”) collision-resistant if no attacker is able to find a second for a given archetype that is mapped to the same value. The English term “second-preimage-resistance” (for example “resistance to second archetypes”) is also widespread here.

In more formal terms, this means that there is no attacker running in propabilistic polynomial time ( ) who has a more than negligible chance of finding another archetype for a randomly selected archetype , so that the following are mapped to the same value: ${\ textstyle PPT}$ ${\ textstyle \ mathbb {M}}$ ${\ textstyle m}$ ${\ textstyle m '\ in \ mathbb {M}}$ ${\ textstyle m \ neq m '}$

{\ displaystyle \ forall {\ mathcal {A}} \ in PPT: \ mathrm {Pr} \ left [m \ leftarrow \ mathbb {M}; m ': = {\ mathcal {A}} (m): \ H (m) = H (m ') \ land m \ neq m' \ right] <\ mathrm {negl.}}

Practically relevant attacks against this property with common hash functions are comparatively rare.

Strong collision resistance

As a rule, strong collision resistance is clearly understood to mean that it is practically impossible to find two different archetypes and , so that . ${\ textstyle m_ {0}}$ ${\ textstyle m_ {1}}$ ${\ textstyle H (m_ {0}) = H (m_ {1})}$

Strictly speaking, this is an over-simplification that cannot be fulfilled: for every non- injective function there is at least one value pair with and . There is thus also an attacker who reports this collision. This would mean that with probability 1 (i.e. always) a collision would be found in polynomial time (namely the time it takes to write down the collision). While it may be practically impossible to “find” this attacker (since a collision would have to be found first), there is no known way how this objection could be formalized. ${\ textstyle m_ {0}, m_ {1}}$ ${\ textstyle H (m_ {0}) = H (m_ {1})}$ ${\ textstyle m_ {0} \ neq m_ {1}}$

In order to avoid this problem, in contexts in which strict formalization is necessary, families of one-way functions and functions randomly drawn from them are discussed: is then a family of collision-resistant one-way functions if it is true that there is no efficient attacker who is responsible for a randomly extracted function with more than negligible probability can output two different archetypes and , so that : ${\ textstyle \ mathbb {H}}$ ${\ textstyle \ mathbb {H}}$ ${\ textstyle H}$ ${\ textstyle m_ {0}}$ ${\ textstyle m_ {1}}$ ${\ textstyle H (m_ {0}) = H (m_ {1})}$

{\ displaystyle \ forall {\ mathcal {A}} \ in PPT: \ mathrm {Pr} \ left [H \ leftarrow \ mathbb {H}; m_ {0}, m_ {1}: = {\ mathcal {A} } (H): \ H (m_ {0}) = H (m_ {1}) \ land m_ {0} \ neq m_ {1} \ right] <\ mathrm {negl.}}

Due to the birthday paradox , it is usually much easier to find any collisions than second archetypes, which is why the output length of most hash functions corresponds to twice the length of the desired security level: Should a hash function offer around 128 bits of security against collisions (this is how a collision is found by Brute Force need about arithmetic operations), it needs an output of 256 bits because . This is in direct contrast to the weak collision resistance, for which 128 bits of output would be sufficient. As a result, most of the classic hash functions offer (intentionally) twice as much security against finding first or second archetypes than against finding any collisions. ${\ textstyle 2 ^ {128}}$ ${\ textstyle {\ sqrt {2 ^ {256}}} = 2 ^ {128}}$

However, with the development of SHA-3 and the associated “sponge” construction, this has changed so that it is now possible to reduce the resistance to finding first and second archetypes in a defined way so that it corresponds to that of the strong collision resistance which enables higher performance. However, this does not change anything in the required double length of the output, but only reduces the security elsewhere.

In contrast to the relatively unspectacular security history of archetype resistances, the collision resistance of many established and practically used hash functions such as MD5 or SHA-1 was practically broken. Since these fractions were partially attributed to the Merkle-Damgård construction, which was mostly used in those functions and which was also the basis of SHA-2 , the NIST launched the SHA3 competition, the aim of which was to develop a new hash function with an ideally different structure was in order to have a ready-made alternative in the event of a breakage of SHA2 (which has not yet occurred and is now considered to be rather unlikely).

Perfect collision resistance

If injective , there are no collisions, which means that from an information theory point of view, it is collision-resistant. ${\ textstyle H}$ ${\ textstyle H}$

The big disadvantage here is that injectivity is in direct contradiction to the usual requirements for general hash functions: In order to be injective, the output must on average be at least as long as the input, which in the case of functions, the character strings of any length to a fixed length map, obviously cannot be the case. Furthermore, perfect collision resistance does not imply that it is difficult to find archetypes, which is an extremely important property in many applications of hash functions. ${\ textstyle H}$

The simplest example of such a function, which at the same time reveals the limitations of this concept, is identity, i.e. the function that returns its argument unchanged: since every image is exactly its own archetype, there cannot be a second archetype for any image; at the same time it is trivial to find the corresponding archetype for an image.

Another example, which, depending on the approach, either offers perfect collision resistance or is trivially insecure, is the exponentiation of elements in finite, cyclic groups of primary order . If a fixed generator is exponentiated with the archetype in a fixed group, this operation is free of collisions, if the exponent is regarded as an element of the (finite) remainder class group , which is only represented by the integer used in the actual calculation. Since in this case applies to everyone in a certain way , strictly speaking it is not a collision. On the other hand, if the exponent is interpreted as a regular integer, then it goes without saying that what represents a trivial collision. ${\ textstyle p}$ ${\ textstyle \ mathbb {Z} _ {p}}$ ${\ textstyle x = x + p}$ ${\ textstyle x}$ ${\ textstyle x, x + p}$ ${\ textstyle x \ neq x + p}$ ${\ textstyle x, x + p}$

Relationships to other properties of hash functions

As already shown in the previous section, collision resistance does not necessarily imply that a function is a one-way function. Against this background, it should come as no surprise that the images do not necessarily look pseudo-random either: if a function is collision-resistant, then the function that the character string (where the concatenation denotes) is also collision-resistant for an input , but always ends with four zeros and therefore works no longer necessarily pseudo-random depending on the comparison quantity. ${\ textstyle H}$ ${\ textstyle H '}$ ${\ textstyle m}$ ${\ textstyle H (m) || 0000}$ ${\ textstyle ||}$

literature

Phillip Rogaway and Thomas Shrimpton: Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance . 2004, p. 371–388 , doi : 10.1007 / 978-3-540-25937-4_24 (English).