Byte

from Wikipedia, the free encyclopedia

The byte ([baɪt]; probably formed from "bit") is a unit of measurement used in digital technology and computer science that usually stands for a sequence of 8 binary digits (bits). Historically, a byte was the number of bits used to encode a single character in a particular computer system, and it was therefore the smallest addressable element in many computer architectures. To refer explicitly to a group of 8 bits, the term octet is also used (in France: octet); the term octade, which was once also common for this purpose, is no longer in use.

Demarcation

What exactly a byte denotes is defined slightly differently depending on the field of application. The term can stand for:

  • a unit of measurement for a data volume of 8 bits, with the unit symbol "B"; the order of the individual bits is irrelevant.
    This unit symbol should not be confused with the symbol "B" of the unit bel.
  • an ordered collection (n-tuple) of 8 bits, whose formal ISO-compliant designation is octet (1 byte = 8 bits). An octet is sometimes divided into two halves (nibbles) of 4 bits each, and each nibble can be represented by one hexadecimal digit. An octet can therefore be represented by two hexadecimal digits.
  • the smallest amount of data of a particular technical system, usually the unit addressable via an address bus. The number of bits per character is almost always a natural number. Examples:
    • Telex: 1 character = 5 bits
    • computers of the PDP family: 1 character = log2(50) bits ≈ 5.644 bits (Radix 50 code). Compared with 6 bits per character, this saves a few bits per character string, which can be used for control purposes, for example. However, the byte boundaries then fall in the middle of bits, which can make analyzing the content difficult.
    • IBM 1401: 1 character = 6 bits
    • ASCII: 1 character = 7 bits
    • IBM PC: 1 character = 8 bits = 1 octet
    • Nixdorf 820: 1 character = 12 bits
    • computer systems of the UNIVAC 1100/2200 and OS2200 series: 1 character = 9 bits (ASCII code) or 6 bits (FIELDATA code)
    • computers of the PDP-10 family: 1 character = 1 to 36 bits, byte length freely selectable
  • a data type in programming languages. The number of bits per byte can vary depending on the programming language and platform (it is mostly 8 bits).
  • ISO C99 defines 1 byte as a contiguous sequence of at least 8 bits.
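
The division of an octet into two nibbles, each written as one hexadecimal digit, can be sketched in Python as follows. The function names split_nibbles and octet_to_hex are chosen here purely for illustration and do not come from any standard.

```python
def split_nibbles(octet: int) -> tuple[int, int]:
    """Return the (high, low) 4-bit halves of an 8-bit value."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0..255")
    high = octet >> 4      # upper four bits
    low = octet & 0x0F     # lower four bits
    return high, low


def octet_to_hex(octet: int) -> str:
    """Represent an octet with exactly two hexadecimal digits."""
    high, low = split_nibbles(octet)
    digits = "0123456789ABCDEF"
    return digits[high] + digits[low]


print(octet_to_hex(0b10100111))  # A7
```

Each nibble takes one of 16 values, which is why one hexadecimal digit per nibble is always sufficient.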

In most computers today, these definitions (smallest addressable unit, data type in programming languages, C data type) coincide, so that all of them describe a unit of the same size.

Because systems based on eight bits (or on power-of-two multiples of eight bits) are so widespread, the term "byte" is used to denote a quantity of 8 bits, which in formal language (according to ISO standards) is correctly called an octet. In German, the term "byte" (in the sense of 8 bits) is used as the unit of measurement for size specifications. A byte can be transmitted in parallel (all bits simultaneously) or serially (all bits one after the other). Check bits are often added to safeguard correctness. When larger quantities are transmitted, additional communication protocols may be involved. 32-bit computers often transfer 32 bits (four bytes) together in one step, even if only a single 8-bit tuple has to be transferred. This simplifies the algorithms required for computation and allows a smaller instruction set for the computer.
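
As a minimal sketch of serial transmission with a check bit, the following Python code sends the 8 bits of a byte one after the other and appends a single parity bit. Even parity and least-significant-bit-first order are assumptions made for this illustration; the text above does not prescribe a particular scheme, and the function names are hypothetical.

```python
def even_parity_bit(byte: int) -> int:
    """Return the check bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def serialize(byte: int) -> list[int]:
    """Return the 8 data bits, least significant first, then the parity bit."""
    bits = [(byte >> i) & 1 for i in range(8)]
    return bits + [even_parity_bit(byte)]


def check(frame: list[int]) -> bool:
    """Accept a received 9-bit frame if it holds an even number of 1 bits."""
    return sum(frame) % 2 == 0


frame = serialize(0x41)   # 0x41 = 0b01000001: two 1 bits, so parity bit 0
print(frame)              # [1, 0, 0, 0, 0, 0, 1, 0, 0]
print(check(frame))       # True
```

A single parity bit detects any one flipped bit in the frame, but it cannot locate the error or detect two flipped bits.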

As with other units of measurement, there is a unit symbol alongside the written-out name of each unit. For bit and byte these are:

Abbreviation         Full name
bit (rarely "b")     bit
B (rarely "byte")    byte

The written-out name is in principle subject to normal declension. Because the unit symbols closely resemble the written-out unit names, and because of the corresponding plural forms in English, the unit symbols "bit" and "byte" are occasionally given a plural s.

History of the term

Bit is a portmanteau of the English words binary and digit, so it means "two-valued digit": zero or one. Its components can be traced back to the Latin word digitus (finger), since fingers have been used for counting since antiquity (see Plautus: computare digitis), and to the Latin (more precisely Neo-Latin) binarius (twofold); compare Latin bis (twice).

Byte is also a coined word and was probably formed from the English bit (a small piece or morsel) and bite (a bite or mouthful). It was used to denote an amount of memory or data sufficient to represent one character. The term was coined in June 1956 by Werner Buchholz in an early design phase of the IBM 7030 Stretch computer, where the spelling was changed from bite to byte to avoid accidental mutation to bit. Originally it described a selectable width of one to six bits (allowing 2^6 = 64 states, e.g. characters, to be represented) and was the smallest directly addressable memory unit of such a computer. In August 1956 the definition was expanded to one to eight bits, so that 2^8 = 256 different characters could be represented. This made it possible to store letters and common special characters, for example in the source text of programs or in other texts.

In the 1960s, ASCII was defined and spread rapidly; it uses seven bits to encode a character (that is, 2^7 = 128 characters). Later, by using the eighth (most significant) bit, which was usually present anyway, extended ASCII-based character sets were developed that can also represent the most common international diacritics, such as code page 437. In these extended character sets, each character corresponds to exactly one byte of eight bits, and the first 128 characters correspond exactly to ASCII.
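
The relationship between 7-bit ASCII and an 8-bit extension can be tried out with Python's built-in "ascii" and "cp437" codecs. This is only a small sketch; the byte values are arbitrary examples.

```python
ascii_text = bytes([0x42, 0x79, 0x74, 0x65])   # every value is below 128
print(ascii_text.decode("ascii"))    # Byte
print(ascii_text.decode("cp437"))    # Byte (identical in the extended set)

extended = bytes([0x84])             # eighth bit set: outside 7-bit ASCII
print(extended.decode("cp437"))      # ä

try:
    extended.decode("ascii")
except UnicodeDecodeError:
    print("0x84 is not valid 7-bit ASCII")
```

Values from 0 to 127 decode to the same characters under both codecs, while values from 128 to 255 are defined only by the extended character set.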

In the 1960s and 1970s, the term octade was also common in Western Europe when referring specifically to 8 bits. This designation probably goes back to the Dutch manufacturer Philips, in whose mainframe documentation the term Oktade (or, in English, oktad[s]) appears regularly.

4-bit microprocessors have existed since the early 1970s; their 4-bit data words (also called nibbles) can each be represented with one hexadecimal digit. 8-bit processors were introduced shortly after the invention of the programming languages C and Pascal, i.e. at the beginning of the 1970s, and were used in home computers until the 1980s (and in embedded systems to this day); their 8-bit data words (or bytes) can be represented with exactly two hexadecimal digits. Since then, the width of hardware data words has doubled repeatedly, from 4 to 8, 16 and 32 bits and on to today's 64 and 128 bits.

To distinguish the original meaning (the smallest addressable unit of information) from the meaning as an 8-bit tuple, the technical literature (depending on the subject area) correctly uses the term octet for the latter.

Practical use

In electronic data processing, the smallest possible storage unit is called a bit. A bit can have two possible states, usually referred to as "zero" and "one". In many programming languages, the data type "boolean" (or "Boolean" or "BOOLEAN") is used for a single bit. For technical reasons, however, a Boolean is usually actually stored as a data word ("WORD").

Eight such bits are combined into a unit (a data packet, so to speak) and are generally called a byte. The official ISO-compliant designation, however, is octet: 1 octet = 1 byte = 8 bits. Many programming languages support a data type named "byte" (or "Byte" or "BYTE"). Depending on its definition, such a type may be treated as an integer, as a bit set, or as an element of a character set; in programming languages that are not type-safe it may even be used as several of these at once, so that assignment compatibility is no longer guaranteed.
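
The different readings of one and the same octet can be illustrated with Python's built-in bytes type, whose elements are integers from 0 to 255. The variable names are hypothetical, and the choice of ASCII for the character reading is an assumption of this sketch.

```python
data = bytes([0x41])      # a single octet

as_integer = data[0]                                        # numeric value
as_bit_set = {i for i in range(8) if (data[0] >> i) & 1}    # positions of 1 bits
as_character = data.decode("ascii")                         # element of a character set

print(as_integer, sorted(as_bit_set), as_character)   # 65 [0, 6] A
```

Here each interpretation requires an explicit conversion, which is the kind of separation a type-safe language enforces.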

The byte is the standard unit for designating storage capacities or amounts of data. This includes file sizes, the capacity of permanent storage media ( hard disk drives , CDs , DVDs , Blu-ray discs , floppy disks , USB mass storage devices , etc.) and the capacity of many volatile memories (for example, RAM ). Transmission rates (for example the maximum speed of an Internet connection), on the other hand, are usually given on the basis of bits.

Meanings of decimal and binary prefixes for large numbers of bytes

SI prefixes

For data memories with binary addressing, the technical storage capacities are based on powers of two (2^n bytes). Since there were no special unit prefixes for powers of two until 1996, it was common to use the decimal SI prefixes for storage capacities to denote powers of two (with a factor of 2^10 = 1024 instead of 1000). Nowadays the SI prefixes should be used only in their decimal meaning when specifying storage sizes. An example:

  • 1 kilobyte (kB) = 1000 bytes, 1 megabyte (MB) = 1000 kilobytes = 1000 × 1000 bytes = 1,000,000 bytes and so on

This usage is widespread for hard disk drives, SSDs and other storage media, whereas the size of main memory (RAM), graphics memory and processor caches can only be specified in binary terms, because these systems are technically binary. Microsoft Windows still displays SI prefixes today, even though it calculates sizes with powers of two.

Occasionally there are also mixed forms, for example for the storage capacity of a 3.5-inch diskette (1984):

  • Stated: 1.44 MB, but the actual capacity is 1440 KiB = 1440 × 1024 bytes = 1,474,560 bytes.
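
The arithmetic behind this mixed form can be checked with a few lines of Python. This is only a worked calculation; the variable name is chosen for illustration.

```python
capacity_bytes = 1440 * 1024            # 1440 KiB

print(capacity_bytes)                   # 1474560
print(capacity_bytes / 1000**2)         # 1.47456  (decimal megabytes)
print(capacity_bytes / 1024**2)         # 1.40625  (mebibytes)
print(capacity_bytes / (1000 * 1024))   # 1.44     (the mixed "MB" of the label)
```

The figure 1.44 results only when one factor of 1000 and one factor of 1024 are combined, so it is neither a decimal nor a binary megabyte.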

For the prefixes for binary sizes that are recommended today, but rarely used outside the UNIX world, see the following section, Binary or IEC prefixes.

Binary or IEC prefixes

To avoid ambiguity, the IEC proposed new unit prefixes in 1996 that are to be used only in the binary meaning. A prefix modeled on the SI prefix is extended by the syllable "bi", which makes clear that a binary multiple is meant. An example:

  • 1 kibibyte (KiB) = 1024 bytes, 1 mebibyte (MiB) = 1024 × 1024 bytes = 1,048,576 bytes.
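
A small conversion helper makes the difference between the two prefix families explicit. The following Python sketch is an illustration only; the function name to_bytes and the restriction to four prefixes per family are assumptions.

```python
DECIMAL = {"kB": 1000**1, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}
BINARY = {"KiB": 1024**1, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_bytes(value: int, unit: str) -> int:
    """Convert a size given with a decimal (SI) or binary (IEC) prefix to bytes."""
    factors = {"B": 1, **DECIMAL, **BINARY}
    if unit not in factors:
        raise ValueError(f"unknown unit: {unit}")
    return value * factors[unit]


print(to_bytes(1, "MB"))    # 1000000
print(to_bytes(1, "MiB"))   # 1048576
```

Because the unit symbol alone selects the factor, a value such as "1 MB" can no longer be read as 1,048,576 bytes.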

The International Bureau of Weights and Measures (BIPM), which is responsible for the SI prefixes, recommends this notation, even though it is not responsible for the byte, which is not an SI unit. Nevertheless, many standardization organizations have followed this recommendation.

Comparison

Confusion can arise mainly because manufacturers usually state storage capacities only with SI prefixes, especially in connection with Microsoft systems. Microsoft always calculates data sizes with powers of two but then labels them with SI prefixes. A 128 GB storage medium is therefore displayed as 119.2 GB, although according to IEC it should read 119.2 GiB. Users are further confused when, according to Microsoft, 120 GB (actually 120 GiB) does not fit on a storage medium advertised as 128 GB and an error is reported. Comparison:

  • ( 128 GB = 128,000,000,000 bytes) < ( 120 GiB = 128,849,018,880 bytes = 120 × 1024 × 1024 × 1024 bytes)
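
The comparison can be reproduced directly. In this Python sketch the names medium and payload are hypothetical and stand for the advertised capacity and the amount of data to be stored.

```python
GB = 1000**3    # gigabyte (decimal)
GiB = 1024**3   # gibibyte (binary)

medium = 128 * GB      # capacity as advertised
payload = 120 * GiB    # what a tool that labels GiB as "GB" calls "120 GB"

print(medium)                   # 128000000000
print(payload)                  # 128849018880
print(payload <= medium)        # False: the data does not fit
print(round(medium / GiB, 1))   # 119.2
```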

For larger decimal and binary prefixes the distinction matters more, because the nominal difference grows. From one prefix to the next, the ratio of binary to decimal increases by a factor of 1.024. Between KiB and kB the difference is 2.4%, but between TiB and TB it is already 10.0% (percentages rounded to one decimal place). The comparison table gives an overview of the available unit prefixes and their meanings.
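
The growth of the difference follows from raising 1.024 to the power of the prefix step. The Python sketch below computes the rounded percentages; the function name excess_percent is chosen for illustration.

```python
PAIRS = ["kB/KiB", "MB/MiB", "GB/GiB", "TB/TiB",
         "PB/PiB", "EB/EiB", "ZB/ZiB", "YB/YiB"]


def excess_percent(k: int) -> float:
    """Percentage by which 1024**k exceeds 1000**k, rounded to one decimal place."""
    return round((1024**k - 1000**k) * 100 / 1000**k, 1)


for k, pair in enumerate(PAIRS, start=1):
    print(pair, excess_percent(k))
```

For the first and fourth pair this yields 2.4 and 10.0, matching the figures given above.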

Capacity specifications for storage media

Manufacturers of mass storage media, such as hard drives, blank DVDs and USB memory sticks, use the decimal prefixes, as is customary for international units of measurement, to state the storage capacity of their products. One consequence is that a blank DVD labeled "4.7 GB" is shown with the deviating value "4.38 GB" by software that, contrary to the above-mentioned standard, uses powers of two for "GB" (as Windows Explorer does); "4.38 GiB" would be the correct display, even though roughly 4.7 gigabytes (4,700,000,000 bytes) are meant. Likewise, a hard drive specified as "1 TB" is reported with the apparently much smaller capacity of about "931 GB" or "0.9 TB" (here, too, "931 GiB" or "0.9 TiB" should actually be displayed), although about 1.0 terabyte (1,000,000,000,000 bytes) is meant in each case. Conversely, a blank CD labeled "700 MB" actually holds 700 MiB (734,003,200 bytes), that is about 734 MB, and should therefore, strictly speaking, be labeled "700 MiB".
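
The displayed values quoted above can be recomputed with a short formatter. The following Python function is a hypothetical sketch of a tool that divides by powers of 1024 but, unlike the software criticized above, prints the correct IEC symbol.

```python
def shown_by_binary_tool(n_bytes: int) -> str:
    """Format a byte count using powers of 1024 and the matching IEC symbol."""
    units = (("TiB", 1024**4), ("GiB", 1024**3), ("MiB", 1024**2), ("KiB", 1024))
    for symbol, factor in units:
        if n_bytes >= factor:
            return f"{n_bytes / factor:.2f} {symbol}"
    return f"{n_bytes} B"


print(shown_by_binary_tool(4_700_000_000))       # 4.38 GiB
print(shown_by_binary_tool(1_000_000_000_000))   # 931.32 GiB
print(700 * 1024**2)                             # 734003200
```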

Converting data sizes into SI units has posed no practical problem for more than 30 years. For on-screen display, the difference in computing effort between dividing by 1000 (a division) and dividing by 1024 (an arithmetic shift) is irrelevant. Converting the result into a decimal string requires divisions by 10 anyway (otherwise one would have to display, say, "2C9 MB free"). Mass storage devices with complex firmware in front of the medium can be manufactured in practically any finely graduated size, and production in round, easily marketable sizes has become established. RAM and CPU caches, which are accessed in a fairly raw form, are specified as round values with binary prefixes; SI prefixes would be extremely impractical here. For customers the exact size is mostly irrelevant, since they rarely come into direct contact with these sizes.
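
The point about shifts and decimal conversion can be made concrete in Python. The byte count below is a made-up example value, chosen so that it corresponds to exactly 713 MiB.

```python
n_bytes = 747_634_688     # hypothetical amount of free space

# Dividing by 1024 is the same as shifting right by 10 bits.
print(n_bytes >> 10 == n_bytes // 1024)   # True

mib = n_bytes >> 20       # two shifts by 10 bits: bytes -> KiB -> MiB
print(mib)                # 713
print(format(mib, "X"))   # 2C9 (what a display without decimal conversion would show)
print(n_bytes // 1000**2) # 747 (whole decimal megabytes)
```

Producing the string "713" from the binary value already requires repeated division by 10, so one further division by 1000 instead of a shift adds no noticeable cost.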

Apple's macOS, starting with Mac OS X Snow Leopard (10.6), consistently uses decimal prefixes in their decimal meaning only. KDE follows the IEC standard and lets the user choose between binary and decimal display. For Linux distributions with other desktop environments, such as Ubuntu from version 11.04, there are clear guidelines on how applications should state data sizes; both forms occur, with the binary prefixes predominating.

Comparison table

Decimal prefixes [1]                                          Difference  Binary prefixes according to IEC
Name       Symbol  Number of bytes                            (rounded)   Name      Symbol   Number of bytes
kilobyte   kB [2]  1 000 = 10^3                               2.4%        kibibyte  KiB [3]  1 024 = 2^10
megabyte   MB      1 000 000 = 10^6                           4.9%        mebibyte  MiB      1 048 576 = 2^20
gigabyte   GB      1 000 000 000 = 10^9                       7.4%        gibibyte  GiB      1 073 741 824 = 2^30
terabyte   TB      1 000 000 000 000 = 10^12                  10.0%       tebibyte  TiB      1 099 511 627 776 = 2^40
petabyte   PB      1 000 000 000 000 000 = 10^15              12.6%       pebibyte  PiB      1 125 899 906 842 624 = 2^50
exabyte    EB      1 000 000 000 000 000 000 = 10^18          15.3%       exbibyte  EiB      1 152 921 504 606 846 976 = 2^60
zettabyte  ZB      1 000 000 000 000 000 000 000 = 10^21      18.1%       zebibyte  ZiB      1 180 591 620 717 411 303 424 = 2^70
yottabyte  YB      1 000 000 000 000 000 000 000 000 = 10^24  20.9%       yobibyte  YiB      1 208 925 819 614 629 174 706 176 = 2^80
  1. SI prefixes are standardized only for SI units; the byte is not an SI unit.
  2. The kilobyte is sometimes abbreviated as "KB".
  3. The kibibyte is occasionally (contrary to the standard) abbreviated as "KB", sometimes to distinguish it from "kB".

Web links

Wiktionary: Byte  - explanations of meanings, word origins, synonyms, translations
