# Amount of data

The amount of data (data volume) is a measure of how much data is present. Its basic unit is the bit.

Data are used to store and transmit information. Note, however, that the information content of a message is not the same as its amount of data, even though the word "information" is often used in this context when data is meant. Unlike the amount of data, the information content cannot be read off directly, and there are various approaches to determining it. The amount of data stored in a file is known as the file size. For data carriers, the amount of data is used to specify the free and the maximum storable data volume (storage capacity).

The amount of data required to store a given piece of information depends on the complexity of the information on the one hand and on the coding method on the other. For large amounts of data there are compression methods that reduce the amount of data while storing the same information. A suitable coding method increases the information content of the individual characters, that is, it reduces the redundancy of the message (see also entropy coding).
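The effect of lossless compression on the amount of data can be illustrated with a small sketch. It assumes Python's standard `zlib` module as the compression method and a made-up, highly repetitive sample message; neither is prescribed by the text above.

```python
import zlib

# Hypothetical sample message: very repetitive, so it compresses well.
message = ("the quick brown fox jumps over the lazy dog. " * 200).encode("ascii")

# Compress with the highest compression level, then restore the original.
compressed = zlib.compress(message, 9)
restored = zlib.decompress(compressed)

print(len(message), "bytes before compression")
print(len(compressed), "bytes after compression")
print("same information recovered:", restored == message)
```

The compressed form occupies fewer bytes, yet decompression yields exactly the original message: the amount of data changed, the information did not.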

Data does not have to be explicitly encoded as bits or stored in a computer. Data is everywhere in nature and in our everyday world. The largest amounts of data are found in our brains, in our libraries, books, films, pictures and computers, in the genome and the molecular structures of animate nature, in the laws of inanimate and animate nature, in the structure of the entire universe, and in the maximum possible information in the history of the entire universe.

## Basic unit

The smallest data unit that can be represented is the bit. Bit is short for binary digit. A data memory with one bit of storage capacity has only one memory location with two possible states: for example "occupied or empty", "on or off", "notch or no notch". The amount of data contained in a single yes/no decision is therefore exactly 1 bit. For four possible values (for example red, yellow, green, blue), two bits are required, which can be combined in four different ways (00, 01, 10, 11).

Formally, this means that the required amount of data $D$ (number of bits) is the base-2 logarithm of the number of possible values $Z$, rounded up:

$$D = \left\lceil \log_2 Z \right\rceil$$

or vice versa: the number of possible values is 2 to the power of the number of bits:

$$Z = 2^{D}$$

So for example:

• 0 bits $\Rightarrow$ Z = 1, if D = 0, since $2^{0} = 1$
• 1 bit $\Rightarrow$ Z = 2, if D = 1, since $2^{1} = 2$
• 2 bits $\Rightarrow$ Z = 4, if D = 2, since $2^{2} = 4$
$\vdots$
• 7 bits $\Rightarrow$ Z = 128, if D = 7, since $2^{7} = 128$

The eight bits numbered 0 to 7 (together 1 byte) can therefore cover the decimal value range from 0 to $2^{8} - 1 = 255$.

• 8 bits $\Rightarrow$ Z = 256, if D = 8, since $2^{8} = 256$

...

• 63 bits $\Rightarrow$ Z = 9223372036854775808, if D = 63, since $2^{63} = 9223372036854775808$

For D = 1024 bits (1 Kibit, i.e. 128 bytes), the number Z of possible values is already very large: $2^{1024} \approx 1.8 \cdot 10^{308}$.
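The relationship between the number of possible values and the number of bits can be checked with a short sketch. The function names `bits_needed` and `values_representable` are chosen here for illustration only. Integer arithmetic (`int.bit_length`) is used instead of a floating-point logarithm, an implementation choice made to avoid rounding errors for large values.

```python
def bits_needed(z: int) -> int:
    """Return the smallest D with 2**D >= z, i.e. ceil(log2(z))."""
    if z < 1:
        raise ValueError("z must be at least 1")
    # (z - 1) needs exactly ceil(log2(z)) binary digits; 0 needs none.
    return (z - 1).bit_length()


def values_representable(d: int) -> int:
    """Return Z = 2**D, the number of values that D bits can distinguish."""
    if d < 0:
        raise ValueError("d must not be negative")
    return 2 ** d


for z in (1, 2, 4, 5, 128, 256):
    print(f"Z = {z:3d} needs D = {bits_needed(z)} bit(s)")

print("D = 63 gives Z =", values_representable(63))
```

A value count that is not a power of two, such as Z = 5, is rounded up to the next whole bit (here D = 3).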

## More units

Besides the bit, the most common unit for the amount of data is the byte (or octet), which consists of 8 bits. There are historical reasons for this: many devices were designed to process 8 bits at a time (today usually 32 or 64 bits, see data word), so the processing unit treated 8 bits as one number. Furthermore, the characters of most character sets, in particular ISO 8859, are each represented as one byte.

In the history of computers, there have been systems that combined only 5 bits into one byte, and others that combined 13 bits into one byte.

To designate larger amounts of data, the unit symbols bit for bit and B for byte are combined with the common prefixes for units of measurement, i.e. kilo (kbit / kB), mega (Mbit / MB), giga (Gbit / GB), tera (Tbit / TB) and so on. For amounts of data based on powers of two, such as those that occur in semiconductor memories, there are special binary prefixes.

## Examples of amounts of data

• 1 bit - ($2^{1}$ = 2 possible states), e.g. 0 or 1, or false or true
• 5 bits - ($2^{5}$ = 32 possible states), e.g. enough to represent the capital letters of the Latin alphabet
• 7 bits - ($2^{7}$ = 128 possible states), e.g. a character from the ASCII character set

Nibble or half byte

• 1 nibble - ($2^{4}$ = 16 possible states), e.g. the values 0 to 15
• 2 nibbles - ($2^{8}$ = 256 possible states), i.e. 1 octet

Byte or octet (8 bits)

• 1 octet - ($2^{8}$ = 256 possible states), e.g. a character in ANSI coding (extended Latin alphabet)
• 2 octets - ($2^{16}$ = 65,536 possible states)
• 4 octets - ($2^{32}$ = about 4.3 billion possible states), e.g. a character in UTF-32 format
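How many octets a single character occupies under different encodings can be tried out directly. The sketch below uses Python's built-in codecs; `latin-1` (ISO 8859-1) stands in for an 8-bit "ANSI" encoding, which is an assumption, and the helper name `encoded_size` is hypothetical. The `-be` codec variants are used so that no byte order mark is added to the count.

```python
def encoded_size(text: str, codec: str) -> int:
    """Return the number of bytes (octets) needed to store text in codec."""
    return len(text.encode(codec))


# ASCII defines 7-bit characters, but each one is stored in a full octet.
for codec in ("ascii", "latin-1", "utf-8", "utf-16-be", "utf-32-be"):
    size = encoded_size("A", codec)
    print(f"{codec:10s} {size} octet(s) = {size * 8} bits")
```

The same letter therefore takes between 1 and 4 octets depending on the coding method, although the information conveyed is identical.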

### Relevant prefixes

The units bit and byte are usually preceded by prefixes for units of measurement, which are of Greek and Italian origin. In the following, the SI prefixes (k, M, G, T, ...) are used in their decimal meaning. In IT practice, however, the SI prefixes are mostly used as binary prefixes for data volumes (1 kB = 1024 bytes, ...). Acceptance of the IEC binary prefixes (Ki, Mi, Gi, ...) provided for this purpose is low in the IT industry; even when the ordinary names are used, the conversion factor of 1024 is usually implied.
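The difference between the decimal (SI) and binary (IEC) interpretation can be made concrete with a small formatting helper. This is a minimal sketch; the function name `format_bytes` and its `binary` parameter are invented for illustration.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def format_bytes(n: int, binary: bool = False) -> str:
    """Format a byte count with SI (base 1000) or IEC (base 1024) prefixes."""
    base, units = (1024, IEC_UNITS) if binary else (1000, SI_UNITS)
    value = float(n)
    index = 0
    # Divide by the base until the value fits under the current prefix.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


dvd = 4_700_000_000  # nominal DVD capacity in bytes
print(format_bytes(dvd))               # decimal prefixes
print(format_bytes(dvd, binary=True))  # binary prefixes
```

For the nominal DVD capacity this prints 4.70 GB and 4.38 GiB: the same amount of data, expressed with the two prefix systems.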

#### Kilo and Kibi

Kilobyte (kB) ($10^{3}$ bytes = 1000 bytes),
Kibibyte (KiB) ($2^{10}$ bytes = 1024 bytes); in practice KB is often written instead of KiB to distinguish it from kB, because that spelling is more common.

• approx. 1–2 kB: one standard page as text (ANSI / ASCII-coded)
• approx. 5 kB: a typewriter page with 63 lines of 80 characters each (ANSI / ASCII coded) in A4 format
• approx. 79 kB: Total storage space of the navigation computer of the Apollo 11 lunar module in 1969
• 1440 KiB: a high-density 3.5-inch floppy disk

#### Mega and Mebi

Megabyte (MB) ($10^{6}$ bytes = 1,000,000 bytes),
Mebibyte (MiB) ($2^{20}$ bytes = 1,048,576 bytes)

• approx. 4 MB: the Bible as text (ANSI / ASCII-coded)
• approx. 703.1 MiB or approx. 737.25 MB: a conventional 700 MB data CD-ROM

#### Giga and Gibi

Gigabyte (GB) ($10^{9}$ bytes = 1,000,000,000 bytes),
Gibibyte (GiB) ($2^{30}$ bytes = 1,073,741,824 bytes)

• approx. 4.38 GiB, i.e. approx. 4.7 GB: a DVD±R
• approx. 5 GB: a compressed movie in DVD quality (with MPEG-2 compression)

#### Tera and Tebi

Terabyte (TB) ($10^{12}$ bytes = 1000 GB),
Tebibyte (TiB) ($2^{40}$ bytes = 1,099,511,627,776 bytes)

• A database that holds records of 1 kB each for 10 billion people requires 10 TB of storage ($10^{10} \cdot 10^{3}$ bytes = $10^{13}$ bytes).

#### Peta and Pebi

Petabyte (PB) ($10^{15}$ bytes = 1,000,000 GB),
Pebibyte (PiB) ($2^{50}$ bytes = 1,125,899,906,842,624 bytes)

• At the end of 2002, the storage capacities of the world's largest data centers were between 1 PB and 10 PB.
• In 1986, the world's effective capacity to exchange information through bidirectional telecommunications networks was 281 petabytes (optimally compressed).

#### Exa and Exbi

Exabyte (EB) ($10^{18}$ bytes),
Exbibyte (EiB) ($2^{60}$ bytes = 1,152,921,504,606,846,976 bytes)

• The total of all printed works is estimated at 0.2 EB.
• In 2007, the world's effective capacity to exchange information through bidirectional telecommunications networks was 65 exabytes (optimally compressed), and the global technological capacity to store information was an estimated 295 exabytes (optimally compressed).

#### Zetta and Zebi

Zettabyte (ZB) ($10^{21}$ bytes),
Zebibyte (ZiB) ($2^{70}$ bytes = 1,180,591,620,717,411,303,424 bytes)

#### Yotta and Yobi

Yottabyte (YB) ($10^{24}$ bytes),
Yobibyte (YiB) ($2^{80}$ bytes = 1,208,925,819,614,629,174,706,176 bytes)

• 1 YB is about as many bytes as there are atoms in 1.67 grams of hydrogen, corresponding to 0.83 mol of H$_2$.
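The hydrogen comparison can be reproduced with a few lines of arithmetic. The sketch assumes the Avogadro constant as fixed in the SI and a rounded molar mass of 1.008 g/mol for atomic hydrogen; the variable names are chosen for illustration.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI definition)
MOLAR_MASS_H = 1.008      # g/mol of atomic hydrogen (rounded, assumed value)

yottabyte = 10 ** 24      # bytes in 1 YB, read here as a number of atoms

mol_atoms = yottabyte / AVOGADRO  # moles of hydrogen atoms
grams = mol_atoms * MOLAR_MASS_H  # mass of that many atoms
mol_h2 = mol_atoms / 2            # each H2 molecule contains two atoms

print(f"{mol_atoms:.2f} mol of H atoms")
print(f"{grams:.2f} g of hydrogen")
print(f"{mol_h2:.2f} mol of H2")
```

With these inputs the result is roughly 1.66 mol of hydrogen atoms, which matches the 1.67 g and 0.83 mol of H$_2$ quoted above.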