Bit string

from Wikipedia, the free encyclopedia

In computer science , a bit string (also bit array , bit vector and bit string ) is a (finite) sequence of characters from the two-character alphabet Σ: = {0,1}, the smallest interesting one. While the bit field (see article) is completely embedded in the data type of a binary number and the individual bit can only be addressed via its programming language symbol, an integer bit index (or several) is added to the bit array . Bit string and ( 1-dimensional ) bit vector are conceptually similar when accessing a single bit.

Although the sequence of binary digits of a bit vector can be used to form a binary number (of any size), a bit vector does not initially have anything to do with a binary number - it has to be formed by a specially designed function.

Examples of meanings that can be assigned to the bits or bit groups are as with the bit field: 1/0, true / false or true / false ( Boolean variable ), present / not present, light / dark etc. Any subset of the finite set {0, 1, 2,…, n −1} can be described by a bit vector (or a bit string) of length n and the set operations intersection and union by logical operations on bit vectors. The BMP file format for black and white graphics contains (in addition to additional information) a 2-dimensional bit array in which each bit is assigned to a black and white pixel .

The bit chains include typical chain operations as they also occur with character strings , e.g. B.

  • Form (extract) partial chain
  • Put partial chain in chain
  • Find substring in chain
  • Concatenate two chains
  • Length of the chain (in bits)
  • Lexicographical comparison

Join in

The extraction of a partial chain can also be done by shifting it to the left and one to the right. The concatenation of two chains can also be achieved by shifting the first chain to the left and a logical OR operation with the second chain on the drawn-in part.

The sieve of Eratosthenes for determining prime numbers is a classic application of a bit vector.

Programming language support

Programming Language Support for real bit arrays are available in PL / I . With the exception of converting to an integer of any length, all functions mentioned are available.

In the programming language C ++ there is the data type bool, which uses not only 1 bit, but at least one byte and can not be aggregated into the chains or fields of this article.

The programming languages C and C ++, especially C, present a transparent memory model and allow very machine-oriented programming and full control of type conversions in the area of ​​arithmetic / logic.

A single bit is not directly addressable, so it can only be made tangible with binary arithmetic means - the easiest way is with the help of shift commands . With these commands, the positional significance of a bit within the binary number represented in the dual system is specifically changed, so that each of the 8 bits can be assigned a unique bit offset within the byte address in such a way that the shift commands deal with this offset linearly ( detailed description in the article bit value # addressing of bits ).

For real bit arrays there are, for example, the following classes in C ++:

  • std::bitset with limits to be defined for the indices at compile time
  • dynamic_bitsetthe boost library with index limits, the definition of which can be delayed until runtime, and with some more complex bit chain functions

Other programming languages:

Bit chains have a lot in common with integers of variable length , provided they are based on the dual system. Examples:

The implementations of many of these packages are little-endian , which can affect initialization, input / output, and conversion.

Fill bits

Since the smallest addressable unit is the byte that contains several, i. d. As a rule, eight bits comprise, bytes that are not completely filled also contain further bits, which are called padding or slack bits.

When working with C, it is advisable to explicitly set the filler bits to a defined value, e.g. 0. In addition, when chaining 2 bit strings B1and B2, if the length of len1the first string is B1not divisible by 8, the filler bits must be overwritten by the first bits of B2, so the string B2must be shifted by len1 mod 8 bits into the higher indices.

See also

literature