Universal Coded Character Set

The Universal Coded Character Set (UCS) is a character encoding defined in the international standard ISO/IEC 10646. It corresponds exactly to the Unicode encodings UTF-16 and UTF-32. Since the 2011 revision (ISO/IEC 10646:2011), the encodings are identical in every respect to those of the corresponding Unicode standard.

The difference between, for example, UCS-2 and UTF-16 is as follows:

  • UCS-2 is a fixed-width encoding: each character is exactly 2 bytes long. UCS-2 can therefore represent only the 65,536 code points of the Basic Multilingual Plane (BMP).
  • UTF-16 is a variable-width encoding: a single character is either 2 bytes long (for characters in the BMP) or 4 bytes long (for characters outside the BMP).

As a result, every UCS-2 character has the same code point as the corresponding UTF-16 character, but not every UTF-16 character can be represented in UCS-2 (namely, those that take four bytes in UTF-16).
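The relationship between the two encodings can be illustrated with a short Python sketch. The function names `utf16_code_units` and `fits_in_ucs2` are hypothetical and chosen only for this illustration; the surrogate arithmetic follows the usual UTF-16 definition.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Return the 16-bit UTF-16 code units for a single code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0xFFFF:
        # BMP character: one 16-bit unit, the same value UCS-2 would use.
        return [code_point]
    # Character outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits
    return [high, low]


def fits_in_ucs2(code_point: int) -> bool:
    """True if the code point needs only one 16-bit unit (2 bytes)."""
    return len(utf16_code_units(code_point)) == 1


print([hex(u) for u in utf16_code_units(0x20AC)])   # ['0x20ac']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
print(fits_in_ucs2(0x20AC), fits_in_ucs2(0x1F600))  # True False
```

Here U+20AC (the euro sign) lies in the BMP and occupies 2 bytes in both encodings, while U+1F600 lies outside the BMP, takes 4 bytes in UTF-16, and has no UCS-2 form.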

The UCS is developed by ISO/IEC JTC 1/SC 2/WG 2. The working group cooperates closely with the Unicode Consortium, and the two standards are kept synchronized with each new version. For reasons of interoperability, all encodings are therefore limited to the 1,112,064 code points permitted by Unicode (2^20 + 2^16 code points, minus the 2^11 = 2,048 surrogates of UTF-16), namely U+0000 to U+D7FF and U+E000 to U+10FFFF.
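The figure of 1,112,064 can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant names and the helper `is_scalar_value` are made up for this illustration.

```python
MAX_CODE_POINT = 0x10FFFF   # last code point, U+10FFFF
SURROGATE_FIRST = 0xD800    # first UTF-16 surrogate
SURROGATE_LAST = 0xDFFF     # last UTF-16 surrogate

code_space_size = MAX_CODE_POINT + 1                    # 2**20 + 2**16
surrogate_count = SURROGATE_LAST - SURROGATE_FIRST + 1  # 2**11
scalar_value_count = code_space_size - surrogate_count


def is_scalar_value(code_point: int) -> bool:
    """True for U+0000..U+D7FF and U+E000..U+10FFFF."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        return False
    return not SURROGATE_FIRST <= code_point <= SURROGATE_LAST


print(code_space_size == 2**20 + 2**16)  # True
print(surrogate_count == 2**11)          # True
print(scalar_value_count)                # 1112064
```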

Two formats were originally defined: UCS-2, which encodes each character in two bytes, and UCS-4, which encodes each character in four bytes.

The 2003 edition, ISO/IEC 10646:2003, describes the same formats as Unicode 4.0: UTF-8, UTF-16, and UTF-32.
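To give a feel for how the three formats differ in size, the following sketch uses Python's built-in codecs to count the bytes each one needs. The helper name `encoded_lengths` is hypothetical; the big-endian codec variants are used only so that no byte order mark is added to the output.

```python
def encoded_lengths(text: str) -> dict[str, int]:
    """Return the number of bytes `text` occupies in each encoding form."""
    codecs = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}
    return {name: len(text.encode(codec)) for name, codec in codecs.items()}


print(encoded_lengths("A"))           # {'UTF-8': 1, 'UTF-16': 2, 'UTF-32': 4}
print(encoded_lengths("\u20ac"))      # {'UTF-8': 3, 'UTF-16': 2, 'UTF-32': 4}
print(encoded_lengths("\U0001F600"))  # {'UTF-8': 4, 'UTF-16': 4, 'UTF-32': 4}
```

UTF-32 always uses four bytes per character, whereas UTF-8 and UTF-16 vary with the code point.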

Comparison of the versions

  • ISO/IEC 10646-1:1993 ≈ Unicode 1.1
    • plus ISO/IEC 10646-1:1993/Amd 5:1998 through ISO/IEC 10646-1:1993/Amd 7:1997 ≈ Unicode 2.0 / 2.1
  • ISO/IEC 10646-1:2000 ≈ Unicode 3.0
    • plus ISO/IEC 10646-2:2001 ≈ Unicode 3.1
    • plus ISO/IEC 10646-1:2000/Amd 1:2002 ≈ Unicode 3.2
  • ISO/IEC 10646:2003 ≈ Unicode 4.0
    • plus ISO/IEC 10646:2003/Amd 1:2005 ≈ Unicode 4.1
    • plus ISO/IEC 10646:2003/Amd 2:2006 ≈ Unicode 5.0
    • plus ISO/IEC 10646:2003/Amd 3:2008 and ISO/IEC 10646:2003/Amd 4:2008 ≈ Unicode 5.1
    • plus ISO/IEC 10646:2003/Amd 5:2008 and ISO/IEC 10646:2003/Amd 6:2009 ≈ Unicode 5.2
  • ISO/IEC 10646:2011 ≈ Unicode 6.0
  • ISO/IEC 10646:2012 ≈ Unicode 6.1 / 6.2 / 6.3
    • plus ISO/IEC 10646:2012/Amd 1:2013 and ISO/IEC 10646:2012/Amd 2 ≈ Unicode 7.0
  • ISO/IEC 10646:2014 and ISO/IEC 10646:2014/Amd 1:2015 ≈ Unicode 8.0
    • plus ISO/IEC 10646:2014/Amd 2:2016 ≈ Unicode 9.0
  • ISO/IEC 10646:2017 ≈ Unicode 10.0

References

  1. The Unicode® Standard, Version 10.0, Core Specification: Appendix C, Relationship to ISO/IEC 10646. The Unicode Consortium, pp. 907–908, accessed April 12, 2018.