Multiplicative method

from Wikipedia, the free encyclopedia

The multiplicative method is a scheme according to which hash functions can be developed. The product of the key is formed with a number and the integer part is cut off so that the key is mapped into the interval . The result is multiplied by the number of hash table addresses and rounded down. In formula notation, a hash function that implements the multiplicative method is defined as:

algorithm

With this method, the choice of , which is independent of the choice of, is decisive for the distribution of the keys to the address range of the hash table . In the literature, the golden ratio is suggested for a good choice of , with which in practice with a lot of typical input data the distribution of the hash values ​​of the keys is almost evenly distributed . If this number is used, the resulting hash function is also called the Fibonacci hash .

The multiplicative method can be seen as a generalization of the residual division method . If you put it so is

In the theory of hashing, by distributing the keys equally to the addresses in the hash table, an attempt is usually made to distribute the keys evenly across the table and thereby minimize collisions.

The equal distribution is a random experiment with the probability when there is an elementary event and the number of elementary events , e.g. B. the coin toss or the dice .

The golden ratio is an irrational number ; H. it is off , and it achieves a good distribution of the keys over the address range of the hash table through the following property of irrational numbers:

Be an irrational number. If you place the points in the interval , then the interval parts have at most three different lengths. In addition, the next point,, falls in one of the largest parts of the interval.

Every periodic decimal fraction or dual fraction represents a rational number. Natural or rational numbers with a finite number of digits, e.g. B. are 0-periodic, i.e. H. the remaining infinite digits are set equal to 0.

Every irrational number can only be represented by a non-periodic infinite decimal fraction or dual fraction. If it is off , it can be represented as the limit value of a - -dic fraction.

The equal distribution and the golden ratio can therefore only be calculated approximately using algorithms.

rating

The equal distribution can only be achieved insufficiently, and as a result the keys, especially of well-sorted sets (e.g. 1, 2, 3, ...), are distributed unevenly over the hash table, which can result in long chains and empty address spaces and thus slower access to the data.

The golden ratio can be calculated precisely to any number of places and thus approximated well, but has the problem that evenly distributed quantities are not optimally distributed over the hash table, since, according to Vera Turan Sós' theorem, only well-sorted quantities are optimally distributed between .

Since one generally finds neither uniformly distributed nor sorted quantities in practice, the golden ratio is a good choice for the multiplicative method.

Lum, Yuen and Dodd compared the behavior of different hash functions and found that the residual division method does not produce any worse results than other hash functions.

implementation

For example, be with . Then is .

Then the associated multiplicative hash function is

where with denotes the hash table size and with denotes a key value.

A general implementation in the C programming language :

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <inttypes.h>

static const double s = 2654435769.0,  w = 4294967296.0;

uint32_t mult_hash(uint32_t k, uint32_t m)
{
  return (uint32_t) floor( ( (double) m * fmod( s * (double) k, w) ) / w );
}

int main(int argc, char **argv)
{
  uint32_t l = mult_hash(atoi(argv[1]), 1024);
  printf("%" PRIu32 "\n", l);
  return 0;
}

If you choose a power of two, the algorithm can be implemented much more efficiently:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <inttypes.h>

static const uint32_t s = 2654435769;

uint32_t mult_hash(uint32_t kk, uint32_t i)
{
  uint64_t k = kk;
  uint64_t t = s * k & 0x00000000FFFFFFFF;
  return t >> (32 - i);
}

int main(int argc, char **argv)
{
  uint32_t l = mult_hash(atoi(argv[1]), 10);
  printf("%" PRIu32 "\n", l);
  return 0;
}

literature

Individual evidence

  1. ^ Donald E. Knuth: The Art of Computer Programming / Sorting and Searching . Volume 3. Addison-Wesley, Massachusetts, 2nd edition, 1997; Page 516
  2. Thomas Ottmann, Peter Widmayer: Algorithms and Data Structures . second edition. BI Wissenschaftsverlag, Mannheim / Leipzig / Vienna / Zurich 1993
  3. a b Vera Turan Sós (Acta Math. Acad. Sci. Hung. 8 (1957), 461–471)
  4. VY Lum, PST Yuen, M. Dodd: Key-to-address transform techniques: Performance study on large existing formatted files . In: Communications of the ACM , 14 (4), April 1971, pp. 228-239.