ISO 8859-1

from Wikipedia, the free encyclopedia
ISO 8859
-1 Latin-1, Western European
-2 Latin-2, Central European
-3 Latin-3, Southern European
-4 Latin-4, Northern European
-5 Cyrillic
-6 Arabic
-7 Greek
-8 Hebrew
-9 Latin-5, Turkish
-10 Latin-6, Nordic
-11 Thai
-12 (does not exist)
-13 Latin-7, Baltic
-14 Latin-8, Celtic
-15 Latin-9, Western European
-16 Latin-10, Southeast European

ISO 8859-1, more precisely ISO/IEC 8859-1, also known as Latin-1, is an information technology standard for 8-bit character encoding. It was last revised by ISO in 1998 and is the first part of the ISO 8859 family of standards.

The characters that can be encoded in seven bits correspond to US-ASCII, each with a leading zero bit. In addition to the 95 printable ASCII characters (20₁₆–7E₁₆), ISO 8859-1 encodes 96 more (A0₁₆–FF₁₆), for a total of 191 of the theoretically possible 256 (= 2⁸). Positions 00₁₆–1F₁₆ and 7F₁₆–9F₁₆ are not assigned any characters in ISO/IEC 8859, and therefore not in ISO/IEC 8859-1 either. This range was deliberately left free so that the corresponding bytes can be used for device control, or so that they do not conflict with such control characters when the encoding is insufficiently specified. The name ISO-8859-1 (with a hyphen), as registered by the IANA, denotes the combination of this standard's characters with the non-printing control characters of ISO/IEC 6429.
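
A short Python sketch can make the arithmetic of these ranges concrete. It assumes only the ranges stated in the paragraph above; the helper name `is_graphic_8859_1` is illustrative and not part of any library.

```python
# Graphic (printable) ranges of ISO/IEC 8859-1 as described above.
ASCII_GRAPHIC = range(0x20, 0x7F)   # 20–7E: 95 positions, SP included
UPPER_GRAPHIC = range(0xA0, 0x100)  # A0–FF: 96 positions, NBSP included


def is_graphic_8859_1(byte: int) -> bool:
    """Return True if ISO/IEC 8859-1 assigns a graphic character to this byte."""
    return byte in ASCII_GRAPHIC or byte in UPPER_GRAPHIC


assigned = [b for b in range(256) if is_graphic_8859_1(b)]
unassigned = [b for b in range(256) if not is_graphic_8859_1(b)]

print(len(assigned))    # 191
print(len(unassigned))  # 65: 00–1F (32 positions) plus 7F–9F (33 positions)
```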

ISO/IEC 8859-1 aims to cover as many characters of Western European languages as possible. Because it lacks the euro sign and some other characters, notably for French, ISO 8859-15 was created as an alternative.
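
Python ships built-in codecs named `latin-1` and `iso8859_15`, which makes the difference between the two standards easy to inspect. This sketch relies on those codecs rather than on the standards' own tables; it lists every position in A0–FF whose character changes, then shows that the euro sign has no ISO 8859-1 encoding.

```python
# Compare the upper half (A0–FF) of ISO 8859-1 and ISO 8859-15
# using Python's built-in "latin-1" and "iso8859_15" codecs.
changed = {}
for b in range(0xA0, 0x100):
    old = bytes([b]).decode("latin-1")
    new = bytes([b]).decode("iso8859_15")
    if old != new:
        changed[b] = (old, new)

for b, (old, new) in changed.items():
    print(f"{b:02X}: {old} -> {new}")

# The euro sign cannot be encoded in ISO 8859-1 at all.
try:
    "€".encode("latin-1")
except UnicodeEncodeError:
    print("€ is not encodable in ISO 8859-1")
```

Only eight positions differ; among them, A4 (¤ in ISO 8859-1) becomes € and BC, BD, BE gain the French letters Œ, œ and Ÿ.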

ISO 8859-1 is closely related to Windows-1252, the 8-bit character encoding used in the Windows operating system. The two differ in the range 80₁₆–9F₁₆: ISO/IEC 8859-1 leaves this range free so that control characters can be placed there, whereas Windows-1252 fills most of it with additional printable characters. Windows-1252 therefore also supports most Western European languages and, in addition, contains all printable characters of ISO 8859-15. Some applications mix up the definitions of ISO 8859-1 and Windows-1252. Because the control characters that ISO-8859-1 places in this range have no meaning in HTML, for example, the printable characters of Windows-1252 are often used there instead. For this reason, the HTML5 standard specifies that text labelled as ISO 8859-1 is to be interpreted as Windows-1252. In January 2019, 3.5% of all websites used ISO 8859-1, with a falling trend; this made Latin-1 the second most common encoding on websites after UTF-8 (93.0%). Windows-1252 was used by 0.6% of websites. The differences between all of these encodings, together with generally inconsistent support for different character sets, are a common source of interoperability problems.
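
The HTML5 rule can be approximated in Python as follows. This is a sketch, not the WHATWG decoding algorithm itself: the helper name `decode_labelled_latin1` is hypothetical, and mapping the five bytes that Python's `cp1252` codec leaves undefined (81, 8D, 8F, 90, 9D) to the C1 control with the same number is an assumption of this example.

```python
def decode_labelled_latin1(data: bytes) -> str:
    """Decode bytes labelled ISO-8859-1 the way HTML5 does, i.e. as Windows-1252.

    Assumption: bytes that Python's "cp1252" codec leaves undefined
    fall back to the C1 control with the same number (U+0081 etc.).
    """
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))  # undefined in cp1252: keep the C1 control
    return "".join(chars)


raw = b"\x93Caf\xe9\x94 costs 5 \x80"
print(repr(raw.decode("latin-1")))        # strict ISO 8859-1: C1 controls
print(repr(decode_labelled_latin1(raw)))  # Windows-1252: quotes and euro sign
```

The same bytes yield invisible control characters under a strict ISO 8859-1 reading but typographic quotes and a euro sign under Windows-1252, which is exactly the kind of mismatch described above.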

The 8-bit character encoding of the Commodore Amiga, used by the AmigaOS operating system, is based on ISO 8859-1 together with the control characters of ISO/IEC 6429 and differs from this combination by only four modifications.

Because ISO 8859-1 was so widespread, Unicode was designed as an extension of it: a character encoded by the byte value x in ISO 8859-1 has the code point x in Unicode (U+0000–U+00FF). The bytes actually stored can still differ from the code point, for example in UTF-8, where é (E9₁₆, U+00E9) is encoded as the two bytes C3₁₆ A9₁₆.
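
Because byte value and code point coincide, ISO 8859-1 can be transcoded to UTF-8 with plain bit arithmetic. The following Python sketch (the function name `latin1_to_utf8` is illustrative) also checks that Python's built-in `latin-1` codec decodes each of the 256 byte values to the code point with the same number.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode ISO 8859-1 bytes to UTF-8 without a codec.

    Relies on byte value x being code point U+00xx: bytes below 80 stay
    as they are, all others become a two-byte UTF-8 sequence.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII range: identical in UTF-8
        else:
            out.append(0xC0 | (b >> 6))    # lead byte 110xxxxx (C2 or C3)
            out.append(0x80 | (b & 0x3F))  # continuation byte 10xxxxxx
    return bytes(out)


# Byte value and code point coincide for all 256 values.
every_byte = bytes(range(256))
assert [ord(ch) for ch in every_byte.decode("latin-1")] == list(range(256))

print(latin1_to_utf8("café".encode("latin-1")).hex(" "))  # 63 61 66 c3 a9
```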

History

ISO 8859-1 is based on the DEC Multinational Character Set, which Digital Equipment Corporation used in its VT220 terminal. It was originally developed by the European Computer Manufacturers Association (ECMA) and published as ECMA-94 in March 1985. The second edition of ECMA-94 also included ISO 8859-2, ISO 8859-3 and ISO 8859-4 as part of the specification.

Tables

ISO/IEC 8859-1

code …0 …1 …2 …3 …4 …5 …6 …7 …8 …9 …A …B …C …D …E …F
0… not used
1…
2… SP ! " # $ % & ' ( ) * + , - . /
3… 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4… @ A B C D E F G H I J K L M N O
5… P Q R S T U V W X Y Z [ \ ] ^ _
6… ` a b c d e f g h i j k l m n o
7… p q r s t u v w x y z { | } ~
8… not used
9…
A… NBSP ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ SHY ® ¯
B… ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
C… À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
D… Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
E… à á â ã ä å æ ç è é ê ë ì í î ï
F… ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

SP (space, 20₁₆) is the ordinary space, NBSP (no-break space, A0₁₆) is a fixed space at which no line break may occur, and SHY (soft hyphen, AD₁₆) is a conditional hyphen that is normally visible only at the end of a line, where a word is broken.
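
How a renderer might treat these three characters can be illustrated with a simplified greedy line-wrapping sketch in Python. The function `wrap` is hypothetical and counts characters rather than measuring glyph widths; real layout engines follow more elaborate rules.

```python
SHY = "\u00ad"   # soft hyphen, byte AD in ISO 8859-1
NBSP = "\u00a0"  # no-break space, byte A0 in ISO 8859-1


def wrap(text: str, width: int) -> list[str]:
    """Greedy line wrapping that may break only at SP or at a soft hyphen.

    NBSP is not a break point, so words joined by it stay on one line.
    A soft hyphen is dropped unless the line breaks there, where it shows as "-".
    """
    lines: list[str] = []
    line = ""
    for word in text.split(" "):          # SP is a break point, NBSP is not
        segs = word.split(SHY)            # hyphenation points inside the word
        while segs:
            sep = " " if line else ""
            whole = "".join(segs)
            if len(line) + len(sep) + len(whole) <= width:
                line += sep + whole       # the rest of the word fits
                break
            # Longest run of leading segments that fits with a visible hyphen.
            k = 0
            for i in range(1, len(segs)):
                if len(line) + len(sep) + len("".join(segs[:i])) + 1 <= width:
                    k = i
            if k:
                lines.append(line + sep + "".join(segs[:k]) + "-")
                segs = segs[k:]
                line = ""
            elif line:
                lines.append(line)        # move the word to a fresh line
                line = ""
            else:
                line = whole              # too long even alone: let it overflow
                segs = []
    if line:
        lines.append(line)
    return lines


text = f"eine Silben{SHY}trennung bei 10{NBSP}kg"
print(wrap(text, 40))  # fits on one line: the soft hyphen stays invisible
print(wrap(text, 12))  # breaks at the soft hyphen and keeps "10 kg" together
```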

ISO/IEC 8859-1 combined with the control characters of ISO/IEC 6429

code …0 …1 …2 …3 …4 …5 …6 …7 …8 …9 …A …B …C …D …E …F
0… NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1… DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2… like ISO/IEC 8859, Windows-125X and US-ASCII
3…
4…
5…
6…
7… (7F = DEL)
8… PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3
9… DCS PU1 PU2 STS CCH MW SPA EPA SOS SGCI SCI CSI ST OSC PM APC
A… like ISO/IEC 8859-1 and Windows-1252
B…
C…
D…
E…
F…
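
ISO/IEC 6429 also allows each C1 control in rows 8… and 9… to be written as a two-byte 7-bit sequence: ESC followed by the C1 byte value minus 40₁₆, so CSI (9B₁₆) becomes ESC [. A minimal Python sketch of that conversion, using the hypothetical helper name `c1_to_7bit`:

```python
ESC = 0x1B


def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite 8-bit C1 controls (80–9F) as two-byte 7-bit sequences ESC Fe.

    Fe is the byte minus 0x40, so CSI (9B) becomes ESC "[" and NEL (85) ESC "E".
    Bytes outside 80–9F, including ISO 8859-1 letters in A0–FF, are copied as-is.
    """
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes([ESC, b - 0x40])
        else:
            out.append(b)
    return bytes(out)


print(c1_to_7bit(b"\x9b1mGr\xfc\xdfe"))  # b'\x1b[1mGr\xfc\xdfe'
```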

The IANA has registered the following equivalent, case-insensitive names for this character set for use in Internet applications such as MIME:

  • ISO_8859-1:1987
  • ISO_8859-1
  • ISO-8859-1
  • ISO-IR-100
  • csISOLatin1
  • latin1
  • l1
  • IBM819
  • CP819
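
Because the names are case-insensitive, a program reading a MIME `charset` parameter can fold case before looking up a label. The sketch below uses exactly the names listed above; the helper `is_latin1_label` is illustrative and ignores details such as quoted parameter values.

```python
# The IANA names listed above; matching is case-insensitive.
IANA_LATIN1_NAMES = [
    "ISO_8859-1:1987", "ISO_8859-1", "ISO-8859-1", "ISO-IR-100",
    "csISOLatin1", "latin1", "l1", "IBM819", "CP819",
]
_FOLDED = {name.casefold() for name in IANA_LATIN1_NAMES}


def is_latin1_label(label: str) -> bool:
    """Return True if `label` is one of the registered names, ignoring case."""
    return label.strip().casefold() in _FOLDED


print(is_latin1_label("iso-8859-1"))  # True
print(is_latin1_label("LATIN1"))      # True
print(is_latin1_label("latin-1"))     # False: not in the list above
```

A spelling such as `latin-1`, which Python's own codec registry happens to accept, is not among these registered names.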

Use

Along with US-ASCII and UTF-8 (a Unicode encoding), ISO 8859-1 is probably the most important and most frequently used encoding for Latin scripts.

ISO 8859-1 is sufficient for at least the following languages:

  • Afrikaans (È / è, É / é, Ê / ê, Ë / ë, Î / î, Ï / ï, Ô / ô, Û / û),
  • Albanian (Ç / ç, Ë / ë),
  • Basque (Ñ / ñ),
  • Danish (Å / å, Æ / æ, Ø / ø),
  • German (Ä / ä, Ö / ö, Ü / ü, ß; in loanwords É / é; not the euro sign and, where used, ſ),
  • English (£, ¢; archaic: Æ / æ, ä, ë, ï, ö, ü; not Œ / œ),
  • Estonian (Ä / ä, Ö / ö, Ü / ü, Õ / õ; not Š / š, Ž / ž, used in loanwords),
  • Faroese (Á / á, Ð / ð, Í / í, Ó / ó, Ú / ú, Ý / ý, Æ / æ, Ø / ø),
  • Finnish (Ä / ä, Ö / ö; in loanwords Å / å; not Š / š, Ž / ž),
  • French (Æ / æ, À / à, Â / â, È / è, É / é, Ê / ê, Ë / ë, Î / î, Ï / ï, Ô / ô, Ù / ù, Û / û, Ç / ç, Ü / ü, ÿ; not Œ / œ, Ÿ),
  • Irish Gaelic, new orthography (Á / á, É / é, Í / í, Ó / ó, Ú / ú),
  • Icelandic (Á / á, Ð / ð, É / é, Í / í, Ó / ó, Ú / ú, Ý / ý, Þ / þ, Æ / æ, Ö / ö),
  • Italian (À / à, È / è, É / é, Ò / ò, Ù / ù),
  • Catalan (À / à, Ç / ç, È / è, É / é, Í / í, Ï / ï, Ò / ò, Ó / ó, Ú / ú, Ü / ü; but not Ŀl / ŀl),
  • Dutch (Ë / ë, ÿ; not IJ / ij),
  • North Frisian (Ä / ä, Ö / ö, Ü / ü, Å / å; not Ā / ā, Đ / đ, Ē / ē for Sölring),
  • Norwegian, Bokmål and Nynorsk (Å / å, Æ / æ, Ø / ø, Ò / ò),
  • Portuguese, including Brazilian Portuguese (ª, º, À / à, Á / á, Â / â, Ã / ã, Ç / ç, É / é, Ê / ê, Í / í, Ó / ó, Ô / ô, Õ / õ, Ú / ú, Ü / ü),
  • Romansh,
  • Scottish Gaelic (À / à, È / è, Ì / ì, Ò / ò, Ù / ù),
  • Swedish (Å / å, Ä / ä, Ö / ö),
  • Spanish (¡, ¿, ª, º, Á / á, É / é, Í / í, Ñ / ñ, Ó / ó, Ú / ú, Ü / ü; formerly also Ç / ç),
  • Swahili and
  • Walloon (Â / â, Å / å, Ç / ç, È / è, É / é, Ê / ê, Î / î, Ô / ô, Û / û).

Turkish and Hungarian are only partially supported.
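
Whether a given text fits into ISO 8859-1 follows directly from its correspondence with Unicode: a character is encodable exactly when its code point is at most U+00FF. A small Python sketch (the function name `unsupported_in_latin1` is hypothetical) shows which letters stand in the way for Turkish and Hungarian:

```python
def unsupported_in_latin1(text: str) -> set[str]:
    """Return the characters of `text` that ISO 8859-1 cannot encode.

    ISO 8859-1 corresponds to code points U+0000–U+00FF, so a character
    is encodable if and only if its code point is at most 0xFF.
    """
    return {ch for ch in text if ord(ch) > 0xFF}


print(unsupported_in_latin1("Straße, café, øl"))  # set(): fully covered
print(unsupported_in_latin1("dağ, şiş, ılık"))    # Turkish ğ, ş and dotless ı
print(unsupported_in_latin1("Győr, műút"))        # Hungarian ő and ű
```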

Because the supported languages are widely used in Western Europe, the Americas and Australia, ISO 8859-1 is the dominant 8-bit character encoding in all of these regions. It is also widespread in parts of Africa where the Arabic script is not used, although some special characters are often missing there; these are not available in any other 8-bit encoding either (see, for example, the Pan-Nigerian alphabet).

Use of diacritical marks
C… / E…
  • À / à (C0 / E0): fra, ita, cat, por, sco
  • Á / á (C1 / E1): fao, gle, isl, por, spa
  • Â / â (C2 / E2): fra, por, wln
  • Ã / ã (C3 / E3): por
  • Ä / ä (C4 / E4): deu, eng, est, fin, swe
  • Å / å (C5 / E5): dan, fin, nor, swe, wln
  • Æ / æ (C6 / E6): dan, eng, fao, fra, isl, nor
  • Ç / ç (C7 / E7): alb, fra, cat, por, wln
  • È / è (C8 / E8): afr, fra, ita, cat, sco, wln
  • É / é (C9 / E9): afr, fra, gle, isl, ita, cat, por, spa, wln
  • Ê / ê (CA / EA): afr, fra, por, wln
  • Ë / ë (CB / EB): afr, alb, nld, fra
  • Ì / ì (CC / EC): sco
  • Í / í (CD / ED): fao, fra, gle, isl, cat, por, spa
  • Î / î (CE / EE): afr, wln
  • Ï / ï (CF / EF): afr, eng, fra, cat
D… / F…
  • Ð / ð (D0 / F0): fao, isl
  • Ñ / ñ (D1 / F1): baq, spa
  • Ò / ò (D2 / F2): ita, cat, sco
  • Ó / ó (D3 / F3): fao, gle, isl, cat, por, spa
  • Ô / ô (D4 / F4): afr, fra, por, wln
  • Õ / õ (D5 / F5): est, por
  • Ö / ö (D6 / F6): deu, eng, est, fin, isl, swe
  • Ø / ø (D8 / F8): dan, fao, nor
  • Ù / ù (D9 / F9): fra, ita, sco
  • Ú / ú (DA / FA): fao, gle, isl, cat, por, spa
  • Û / û (DB / FB): afr, fra, wln
  • Ü / ü (DC / FC): deu, eng, est, fra, cat, por, spa
  • Ý / ý (DD / FD): fao, isl
  • Þ / þ (DE / FE): isl
  • ß / ÿ (DF / FF): deu, est, fra, nld
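
The arrangement of these letters has a practical consequence: each capital letter in C0₁₆–DE₁₆ (except at D7₁₆, which holds ×) sits exactly 20₁₆ below its small counterpart, just as A–Z and a–z do in ASCII, so lowercasing is a single bit operation. A Python sketch with hypothetical helper names:

```python
def latin1_to_lower(b: int) -> int:
    """Lowercase one ISO 8859-1 byte by setting bit 5 (adding 0x20).

    Applies to A–Z (41–5A) and to À–Þ (C0–DE) except D7, where the
    positions D7/F7 hold the signs × and ÷ instead of a letter pair.
    ß (DF) and ÿ (FF) have no case partner inside ISO 8859-1.
    """
    if 0x41 <= b <= 0x5A or (0xC0 <= b <= 0xDE and b != 0xD7):
        return b | 0x20
    return b


def lower_latin1_bytes(data: bytes) -> bytes:
    """Lowercase a whole ISO 8859-1 byte string."""
    return bytes(latin1_to_lower(b) for b in data)


print(lower_latin1_bytes("ÆØÅ ÄÖÜ ß".encode("latin-1")).decode("latin-1"))  # æøå äöü ß
```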

References

  1. HTML 5.1 Nightly, Editor's Draft of February 19, 2013, section 8.2.2.2, "Character encodings". Accessed February 19, 2013.
  2. Character encoding. w3techs.com.
  3. FAQ. w3techs.com.
  4. ECMA (ed.): Standard ECMA-94: 8-Bit Single-Byte Coded Graphic Character Sets. 2nd edition, June 1984 (ecma-international.org [PDF; 2.7 MB; accessed January 4, 2008]).