Hamming weight

from Wikipedia, the free encyclopedia

The Hamming weight of a character string is defined as the number of characters other than the null character of the alphabet used . It is equivalent to the Hamming distance to the zero vector (a character string of the same length that only consists of zero characters). For the classic case of a binary string, this is the number of bits set in this string and is also called population count or popcount and can be understood as the sum of the digits of the binary representation of a given number or as the ℓ₁ norm of a bit vector.

history

The Hamming weight is named after Richard Hamming , although he did not coined the term. In the context of binary numbers, it was used by JWL Glaisher as early as 1899 to give a formula for the number of odd binomial coefficients in a single series of Pascal's triangle . And Irving S. Reed introduced a concept in 1954 that is similar to the Hamming weight for binary numbers.

Application examples

The Hamming weight is used in information theory , coding theory and cryptography , among others . Examples of hamming weight applications are:

  • In binary modulo exponentiation , the number of modular multiplications required for an exponent e is log2 e + weight (e). This is why the public key value e used in RSA is typically chosen to be a number with a low Hamming weight.
  • The Hamming weight determines the path lengths between the nodes in hash tables distributed in chords .
  • The search in biometric databases for iris recognition is typically implemented in such a way that the Hamming distance is calculated for each stored data record.
  • In computer chess programs that use a bitboard representation, the hamming weight of a bitboard represents the number of pieces of a given type that remain in the game, or the number of squares on the board that are controlled by a player's pieces and is an important factor in calculating the value of a position.
  • The Hamming weight can be used to identity by means of the ffs (x) = pop (x ^ (~ (-x))) to find the first bit set in an efficient manner ( findFirst set ). This is useful on platforms like SPARC which have hardware instructions for the hamming weight but no hardware instructions to find the first bit set.
  • The calculation of the Hamming weight can be interpreted as the conversion from the unary system to binary numbers .

Efficient implementation

The Hamming weight of a bit string is often required in cryptography and other applications. The Hamming distance of two words A and B can be calculated as the Hamming weight of A xor B.

The problem of how to efficiently implement the calculation of the Hamming weight has been thoroughly investigated. Some processors have a single instruction to compute it and others have parallel operations on bit vectors. For processors that lack these characteristics, the best solutions are based on adding counts in a tree pattern. For example, to calculate the number of 1's bits in this 16-bitting binary number a = 0110 1100 1011 1010, the following calculations can be performed:

Expression Binary Decimal comment
a 01 10 11 00 10 11 10 10 the original number a
b0 = (a >> 0) & 01 01 01 01 01 01 01 01 01 00 01 00 00 01 00 00 1,0,1,0,0,1,0,0 every other bit from a
b1 = (a >> 1) & 01 01 01 01 01 01 01 01 00 01 01 00 01 01 01 01 0,1,1,0,1,1,1,1 the remaining bits from a
c = b0 + b1 01 01 10 00 01 10 01 01 1,1,2,0,1,2,1,1 List with the number of 1 in each 2-bit segment from a
d0 = (c >> 0) & 0011 0011 0011 0011 0001 0000 0010 0001 1,0,2,1 every other number from c
d2 = (c >> 2) & 0011 0011 0011 0011 0001 0010 0001 0001 1,2,1,1 the remaining numbers from c
e = d0 + d2 0010 0010 0011 0010 2.2.3.2 List with the number of 1 in each 4-bit segment from a
f0 = (e >> 0) & 00001111 00001111 00000010 00000010 2.2 every other number from e
f4 = (e >> 4) & 00001111 00001111 00000010 00000011 2.3 the remaining numbers from e
g = f0 + f4 00000100 00000101 4.5 List with the number of 1 in each 8-bit segment from a
h0 = (g >> 0) & 0000000011111111 0000000000000101 5 every other number from g
h8 = (g >> 8) & 0000000011111111 0000000000000100 4th the remaining numbers from g
i = h0 + h8 0000000000001001 9 the result for the 16-bit word

Here the operations were used as in the C programming language, X >> Y means to shift X by Y bit to the right, X & Y means the bitwise AND of X and Y and + is the usual addition.

Language support

  • Some C compilers provide intrinsic functions that offer bit counting functions. For example, the GCC (as of version 3.4 April 2004) contains a built-in function __builtin_popcountthat uses a processor instruction, if available, or an efficient library implementation otherwise. The LLVM-GCC offers this function from version 1.5 in June 2005.
  • In the C ++ Standard Template Library ( C ++ STL ), the bit array data structure has bitseta count()method that counts the number of bits set.
  • In Java, the bit array data structure has BitSeta BitSet.cardinality()method that counts the number of bits set. In addition, there are the functions Integer.bitCount(int)and Long.bitCount(long)to count bits in 32-bit and 64-bit primitive integers. The BigIntegerhigh precision class also has a BigInteger.bitCount()method that counts bits.
  • In Common Lisp , the logcount function returns the number of 1 bits for a non-negative integer, and the number of 0 bits in 2's complement notation for negative integers. In both cases the integer can be a BIGNUM.
  • Starting with GHC 7.4, the Haskell basic package has a popCountfunction that is available on all types that are instances of the Bitsclass (available from the Data.Bitsmodule).
  • The MySQL version of the SQL language offers BIT_COUNT()as a standard feature .

Processor support

  • Cray supercomputers early offered a popcount machine instruction , and there are rumors that this was specifically requested by the US government's National Security Agency for cryptanalysis applications .
  • Some of the Control Data Corporation CYBER 70/170 series machines included a popcount instruction; in COMPASS this instruction was coded as CXi.
  • AMD's Barcelona architecture introduced the abm (advanced bit manipulation) instruction set and brought the POPCNT instruction as part of the SSE4a extension.
  • Intel Core processors introduced a POPCNT instruction with the SSE4.2 instruction set extension that was first available in the Nehalem -based Core i7 processor released in November 2008 .
  • Compaq's Alpha 21264A, released in 1999, was the first Alpha-series CPU design to feature the Counting Extension (CIX).
  • Donald Knuth's model computer MMIX , which will replace MIX in his book The Art of Computer Programming , has an SADDinstruction. SADD a,b,ccounts all bits that are 1 in b and 0 in c and writes the result to a.
  • The ARM architecture introduced the VCNT instruction as part of the Advanced SIMD (NEON) extension.

Individual evidence

  1. Donald E. Knuth: Bitwise tricks & techniques; Binary Decision Diagrams . In: The Art of Computer Programming . tape 4 , Fascicle 1. Addison – Wesley Professional, 2009, ISBN 0-321-58050-8 ( draft [ PostScript ; 1,2 MB ; accessed on February 23, 2017]).
  2. ^ Thomas M. Thompson: From Error-Correcting Codes through Sphere Packings to Simple Groups . In: The Carus Mathematical Monographs . No. 21 . The Mathematical Association of America, 1983, pp. 33 .
  3. James Whitbread Lee Glaisher: On the residue of a binomial-theorem coefficient with respect to a prime modulus . In: The Quarterly Journal of Pure and Applied Mathematics . No. 30 , 1899, pp. 150–156 ( gdz.sub.uni-goettingen.de [accessed on February 23, 2017]).
  4. ^ Irving S. Reed: A Class of Multiple-Error-Correcting Codes and the Decoding Scheme . In: IRE (IEEE) . PGIT, no. 4 , 1954, pp. 38 .
  5. ^ Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan: Chord. A scalable peer-to-peer lookup service for internet applications . In: Proceedings of the 2001 conference on Applications, Technologies, Architectures, and Protocols for Computer Communications . ACM, New York 2001, pp. 149-160 , doi : 10.1145 / 383059.383071 .
  6. ^ SPARC International, Inc. (Ed.): The SPARC architecture manual . Prentice Hall, Englewood Cliffs, NJ 1992, ISBN 0-13-825001-4 , pp. 231 ( gaisler.com [PDF] version 8).
  7. ^ David Blaxell: Computer Science and Statistics: Tenth Annual Symposium on the Interface . In: David Hogben, Dennis W. Fife (Eds.): NBS Special Publication . No. 503 . US Department of Commerce / National Bureau of Standards, February 1978, p. 146-156 ( google.books.com ).
  8. GCC 3.4 Release Notes GNU Project.
  9. LLVM 1.5 Release Notes LLVM Project.
  10. GHC 7.4.1 release notes .
  11. 12.11. Bit Functions - MySQL 5.0 Reference Manual .