Bit string

In computer science , a bit string (also bit array , bit vector and bit string ) is a (finite) sequence of characters from the two-character alphabet Σ: = {0,1}, the smallest interesting one. While the bit field (see article) is completely embedded in the data type of a binary number and the individual bit can only be addressed via its programming language symbol, an integer bit index (or several) is added to the bit array . Bit string and ( 1-dimensional ) bit vector are conceptually similar when accessing a single bit.

Although the sequence of binary digits of a bit vector can be used to form a binary number (of any size), a bit vector does not initially have anything to do with a binary number - it has to be formed by a specially designed function.

Examples of meanings that can be assigned to the bits or bit groups are as with the bit field: 1/0, true / false or true / false ( Boolean variable ), present / not present, light / dark etc. Any subset of the finite set {0, 1, 2,…, n −1} can be described by a bit vector (or a bit string) of length n and the set operations intersection and union by logical operations on bit vectors. The BMP file format for black and white graphics contains (in addition to additional information) a 2-dimensional bit array in which each bit is assigned to a black and white pixel .

The bit chains include typical chain operations as they also occur with character strings , e.g. B.

Form (extract) partial chain
Put partial chain in chain
Find substring in chain
Concatenate two chains
Length of the chain (in bits)
Lexicographical comparison

Join in

(component-wise) logical operations (AND, OR, NOT)
Convert to binary number

The extraction of a partial chain can also be done by shifting it to the left and one to the right. The concatenation of two chains can also be achieved by shifting the first chain to the left and a logical OR operation with the second chain on the drawn-in part.

The sieve of Eratosthenes for determining prime numbers is a classic application of a bit vector.

Programming language support

Programming Language Support for real bit arrays are available in PL / I . With the exception of converting to an integer of any length, all functions mentioned are available.

In the programming language C ++ there is the data type bool, which uses not only 1 bit, but at least one byte and can not be aggregated into the chains or fields of this article.

The programming languages C and C ++, especially C, present a transparent memory model and allow very machine-oriented programming and full control of type conversions in the area of arithmetic / logic.

A single bit is not directly addressable, so it can only be made tangible with binary arithmetic means - the easiest way is with the help of shift commands . With these commands, the positional significance of a bit within the binary number represented in the dual system is specifically changed, so that each of the 8 bits can be assigned a unique bit offset within the byte address in such a way that the shift commands deal with this offset linearly ( detailed description in the article bit value # addressing of bits ).

For real bit arrays there are, for example, the following classes in C ++:

std::bitset with limits to be defined for the indices at compile time
dynamic_bitsetthe boost library with index limits, the definition of which can be delayed until runtime, and with some more complex bit chain functions

Other programming languages:

The Common Lisp HyperSpec of Common Lisp : The ANSI Common Lisp standard supports bit vectors (. Engl bit-vector ) (s. Functions on arrays of bits ).

Bit chains have a lot in common with integers of variable length , provided they are based on the dual system. Examples:

The implementations of many of these packages are little-endian , which can affect initialization, input / output, and conversion.

Fill bits

Since the smallest addressable unit is the byte that contains several, i. d. As a rule, eight bits comprise, bytes that are not completely filled also contain further bits, which are called padding or slack bits.

When working with C, it is advisable to explicitly set the filler bits to a defined value, e.g. 0. In addition, when chaining 2 bit strings B1and B2, if the length of len1the first string is B1not divisible by 8, the filler bits must be overwritten by the first bits of B2, so the string B2must be shifted by len1 mod 8 bits into the higher indices.

literature

OS PL / I Checkout and Optimizing Compilers: Language Reference Manual. IBM Form GC33-0009 (1970)
Paul Abrahams: The PL / I Programming Language . Courant Mathematics and Computing Laboratory, New York University, 1979.
IBM PL / I Compilers for z / OS , AIX , MVS and VSE
Stephen Prata: C primer plus , 5th ed. Edition, Sams, Indianapolis, Ind 2007, ISBN 0-672-32696-5 .

Bit string

contents

Programming language support

Fill bits

See also

literature