Windows-1252

from Wikipedia, the free encyclopedia
Windows code pages
0874 Thai
0932 Japanese
0936 Simplified Chinese
0949 Korean
0950 Traditional Chinese
1250 Central European
1251 Cyrillic
1252 Western European
1253 Greek
1254 Turkish
1255 Hebrew
1256 Arabic
1257 Baltic
1258 Vietnamese

Windows-1252, also known as CP 1252, Western European, or ANSI, is an 8-bit character encoding that was developed for the Microsoft Windows operating system. The character set is based on ISO 8859-1 (Latin-1) but deviates from it in the range 80₁₆–9F₁₆: instead of the (very rarely used) C1 control characters, these 32 positions contain 27 displayable characters, including the characters added in ISO 8859-15 and some that are needed for better typography.
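The deviation can be observed by decoding the same bytes under both encodings. The following is a minimal sketch using Python's built-in `cp1252` and `latin-1` codecs; the sample bytes are an arbitrary illustration, not taken from any real file.

```python
# Bytes in the range 0x80-0x9F are where the two encodings disagree.
data = bytes([0x80, 0x93, 0x41, 0x94])  # hypothetical sample bytes

as_cp1252 = data.decode("cp1252")   # Windows-1252: displayable characters
as_latin1 = data.decode("latin-1")  # ISO 8859-1: C1 control characters

print(as_cp1252)        # €“A”
print(repr(as_latin1))  # '\x80\x93A\x94'
```

Outside the range 80₁₆–9F₁₆ both codecs return the same characters.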

Some applications mix up the definitions of ISO 8859-1 and Windows-1252. Since the additional control characters of ISO 8859-1 have no meaning in HTML either, the HTML5 standard stipulates that text labeled as ISO 8859-1 is to be interpreted as Windows-1252. Nonetheless, Windows-1252 is also registered with the IANA. In January 2019, 3.5% of all websites used the encoding implicitly, labeled as ISO 8859-1, while 0.6% of websites declared Windows-1252 explicitly, with a falling trend. Latin-1 was thus the second most common website encoding after UTF-8 (93.0%), and Windows-1252 the fourth most common, after Windows-1251. The differences between all of these encodings, together with a general lack of consistency in supporting different character sets, are a common source of interoperability problems.
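The HTML5 rule can be approximated by resolving the declared label before decoding. The sketch below is only an illustration: `WINDOWS_1252_LABELS`, `resolve_label` and `decode_html_bytes` are hypothetical names, and the alias set is deliberately small rather than the full list used by browsers.

```python
# Hypothetical, incomplete alias set: labels treated as Windows-1252.
WINDOWS_1252_LABELS = {"iso-8859-1", "latin1", "windows-1252", "cp1252"}


def resolve_label(label: str) -> str:
    """Return the Python codec name to use for a declared charset label."""
    normalized = label.strip().lower()
    if normalized in WINDOWS_1252_LABELS:
        return "cp1252"
    return normalized


def decode_html_bytes(data: bytes, label: str) -> str:
    """Decode page bytes, treating ISO 8859-1 labels as Windows-1252."""
    # errors="replace" avoids an exception on bytes the codec leaves undefined.
    return data.decode(resolve_label(label), errors="replace")


page = b"<p>Price: 5 \x80</p>"  # hypothetical page labeled ISO-8859-1
print(decode_html_bytes(page, "ISO-8859-1"))  # <p>Price: 5 €</p>
```

Decoding the same bytes strictly as ISO 8859-1 would yield the control character U+0080 instead of the euro sign.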

code  …0   …1   …2   …3   …4   …5   …6   …7   …8   …9   …A   …B   …C   …D   …E   …F
0…    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1…    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2…    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3…    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4…    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5…    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6…    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7…    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8…    €    --   ‚    ƒ    „    …    †    ‡    ˆ    ‰    Š    ‹    Œ    --   Ž    --
9…    --   ‘    ’    “    ”    •    –    —    ˜    ™    š    ›    œ    --   ž    Ÿ
A…    NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B…    °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C…    À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D…    Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E…    à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F…    ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ
(-- = position not used)

The code points in rows 8… and 9… (80–9F) represent the changes compared to ISO 8859-1: 27 of these positions are occupied by displayable characters, and five (81, 8D, 8F, 90 and 9D) are not used.

Since Unicode is based on ISO 8859-1 and not on Windows-1252, the Unicode code points of the characters outside the range 80–9F are identical to their code values in Windows-1252, but those of the characters within that range are not:

Unicode mapping of characters different from ISO 8859-1
      …0      …1      …2      …3      …4      …5      …6      …7      …8      …9      …A      …B      …C      …D      …E      …F
8…    €       --      ‚       ƒ       „       …       †       ‡       ˆ       ‰       Š       ‹       Œ       --      Ž       --
      U+20AC          U+201A  U+0192  U+201E  U+2026  U+2020  U+2021  U+02C6  U+2030  U+0160  U+2039  U+0152          U+017D
9…    --      ‘       ’       “       ”       •       –       —       ˜       ™       š       ›       œ       --      ž       Ÿ
              U+2018  U+2019  U+201C  U+201D  U+2022  U+2013  U+2014  U+02DC  U+2122  U+0161  U+203A  U+0153          U+017E  U+0178
(-- = position not used)
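The mapping for the range 80–9F can also be derived programmatically. The following sketch assumes Python's built-in `cp1252` codec, which raises `UnicodeDecodeError` for the positions it treats as unassigned; the variable names are illustrative.

```python
mapping = {}  # byte value -> Unicode code point
unused = []   # byte values without an assigned character

for byte in range(0x80, 0xA0):
    try:
        char = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves this position undefined.
        unused.append(byte)
    else:
        mapping[byte] = ord(char)

for byte, code_point in mapping.items():
    print(f"{byte:02X} -> U+{code_point:04X} {chr(code_point)}")
print("unused:", " ".join(f"{b:02X}" for b in unused))
```

Other decoders may handle the unused positions differently, for example by passing them through as control characters, so the `unused` list reflects this codec's behavior only.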

Differences between ISO 8859-1, ISO 8859-15, Windows-1252 and Unicode

In addition to the characters from ISO 8859-1, Windows-1252 also contains the characters that were added in ISO 8859-15, where they replace some less frequently used characters of ISO 8859-1. However, the positions of these characters differ between Windows-1252 and ISO 8859-15, as do their code points in Unicode. All characters that are missing from at least one of the two ISO encodings occupy the following positions.

Differences between ISO 8859-1, ISO 8859-15, Windows-1252 and Unicode (Part 1)
character     €     Š     š     Ž     ž     Œ     œ     Ÿ     ¤     ¦     ¨     ´     ¸     ¼     ½     ¾
ISO 8859-1    -     -     -     -     -     -     -     -     A4    A6    A8    B4    B8    BC    BD    BE
ISO 8859-15   A4    A6    A8    B4    B8    BC    BD    BE    -     -     -     -     -     -     -     -
Windows-1252  80    8A    9A    8E    9E    8C    9C    9F    A4    A6    A8    B4    B8    BC    BD    BE
Unicode       20AC  160   161   17D   17E   152   153   178   A4    A6    A8    B4    B8    BC    BD    BE

Differences between ISO 8859-1, ISO 8859-15, Windows-1252 and Unicode (Part 2)
character     ‚     ƒ     „     …     †     ‡     ˆ     ‰     ‹     ‘     ’     “     ”     •     –     —     ˜     ™     ›
ISO 8859-1    -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -
ISO 8859-15   -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -
Windows-1252  82    83    84    85    86    87    88    89    8B    91    92    93    94    95    96    97    98    99    9B
Unicode       201A  192   201E  2026  2020  2021  2C6   2030  2039  2018  2019  201C  201D  2022  2013  2014  2DC   2122  203A
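The positions in Part 1 can be cross-checked by encoding each character with all three encodings. This is a small sketch using Python's built-in codecs; `code_in` and `CODECS` are hypothetical helper names introduced for illustration.

```python
def code_in(char: str, codec: str) -> str:
    """Return the hex byte for char in codec, or '-' if it is not encodable."""
    try:
        return char.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return "-"


CODECS = ("iso-8859-1", "iso-8859-15", "cp1252")

# Print one line per character: its byte in each codec, then its code point.
for char in "€ŠšŽžŒœŸ¤¦¨´¸¼½¾":
    codes = [code_in(char, codec) for codec in CODECS]
    print(char, *codes, f"U+{ord(char):04X}")
```

For the euro sign, for instance, the helper reports no position in ISO 8859-1, A4 in ISO 8859-15 and 80 in Windows-1252.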
