Unicode block Tags
The Unicode block Tags (U+E0000 to U+E007F) contains the so-called "language tags". They were introduced in Unicode 3.1 and were originally intended to specify the language, script and orthography of a text according to RFC 4646 in plain text files, for example so that simplified and traditional Chinese characters could be used side by side in one text file. A language specification starts with the introductory language tag (U+E0001), after which the appropriate code is spelled out using the tag characters. The specified language applies to all subsequent text. The closing tag (U+E007F) ends the language specification.
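The original language-tag mechanism can be sketched in a few lines of Python. This is only an illustration of the scheme described above, not a recommended practice, since this use is deprecated. The function names `language_tag` and `wrap_with_language` are hypothetical, and the sketch assumes that each tag character is the corresponding printable ASCII character shifted by 0xE0000, as the code chart of the block suggests.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (introductory tag)
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG (closing tag)
TAG_OFFSET = 0xE0000         # assumed mapping: tag character = ASCII + 0xE0000


def language_tag(code: str) -> str:
    """Spell a language code such as 'de' with tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in code):
        raise ValueError("only printable ASCII maps to tag characters")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)


def wrap_with_language(code: str, text: str) -> str:
    """Prefix text with a language tag and close it with the cancel tag."""
    return language_tag(code) + text + CANCEL_TAG


# Show the code points of the invisible prefix for German ("de").
print(" ".join(f"U+{ord(ch):04X}" for ch in language_tag("de")))
# U+E0001 U+E0064 U+E0065
```

The tag characters are invisible format characters, so the wrapped string looks unchanged in software that ignores them.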
This use has been discouraged since Unicode version 5.1 (2008) and is considered deprecated.
With Unicode version 8.0, the characters U+E0020 to U+E007E were permitted again, but for new, more general purposes than merely marking the language of a text.
This option has been in use since version 9.0: the tag characters U+E0020 to U+E007E form a sequence of modifier characters that gives certain emoji characters a special meaning. The sequence is terminated by the tag character U+E007F.
So far (up to Unicode 10.0) only one type of sequence has been defined: the character U+1F3F4 (🏴 WAVING BLACK FLAG) can be turned into the flag of a country or region by a tag sequence. The tag sequence encodes the country or region based on the CLDR database.
Example: The CLDR code for "England" is GBENG ("GB" for Great Britain, followed by "ENG" for England), which is spelled in the sequence with the lowercase tag letters g, b, e, n, g. The flag of England can therefore be encoded as the emoji sequence <U+1F3F4> <U+E0067> <U+E0062> <U+E0065> <U+E006E> <U+E0067> <U+E007F>, which is displayed as the flag of England if the program already supports such sequences. The other two subnational flags with broad software support are those of Scotland and Wales. The fourth part of the United Kingdom, Northern Ireland, on the other hand, has no flag of its own; its sequence is therefore usually displayed as the aforementioned black flag rather than as the Red Hand Flag of Ulster used at sporting events.
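The England example can be reproduced with a short Python sketch. The helper name `subdivision_flag` is hypothetical, and the sketch assumes that the region code is written with lowercase tag letters and digits, as in the sequence shown above. Whether the result is drawn as a flag depends entirely on the font and software in use.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG (base character)
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG (terminates the sequence)
TAG_OFFSET = 0xE0000       # tag character = ASCII code point + 0xE0000


def subdivision_flag(code: str) -> str:
    """Build an emoji tag sequence for a region code such as 'GBENG'."""
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError("expected ASCII letters and digits only")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG


england = subdivision_flag("GBENG")
print(" ".join(f"U+{ord(ch):04X}" for ch in england))
# U+1F3F4 U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F
```

Software without support for the sequence typically shows only the black flag, because the tag characters themselves are invisible.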
Note: Since Unicode 6.0, national flags can be represented by pairs of characters from the range U+1F1E6 to U+1F1FF; see Unicode block Enclosed Alphanumeric Supplement.
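For comparison, the pair-based mechanism mentioned in the note can be sketched as follows. The function name `country_flag` is hypothetical. The sketch assumes that the 26 characters U+1F1E6 to U+1F1FF correspond to the letters A to Z in order and that a flag is written as the two letters of a country code.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # first character of the range, assumed to stand for "A"


def country_flag(alpha2: str) -> str:
    """Map a two-letter country code such as 'GB' to a pair of indicator characters."""
    alpha2 = alpha2.upper()
    if len(alpha2) != 2 or not all("A" <= ch <= "Z" for ch in alpha2):
        raise ValueError("expected exactly two letters A-Z")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in alpha2)


print(" ".join(f"U+{ord(ch):04X}" for ch in country_flag("GB")))
# U+1F1EC U+1F1E7
```

Unlike tag sequences, this scheme needs no base character and no terminator, but it is limited to two-letter codes.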
Table
All characters have the general category "Format" (Cf) and the bidirectional class "Boundary Neutral" (BN).
Unicode number | Character | Official name | Description |
---|---|---|---|
U+E0001 (917505) | <format> | LANGUAGE TAG | Introductory language tag |
U+E0020 (917536) | <format> | TAG SPACE | Language tag space |
U+E0021 (917537) | <format> | TAG EXCLAMATION MARK | Language tag exclamation mark |
U+E0022 (917538) | <format> | TAG QUOTATION MARK | Language tag quotation mark |
U+E0023 (917539) | <format> | TAG NUMBER SIGN | Language tag number sign |
U+E0024 (917540) | <format> | TAG DOLLAR SIGN | Language tag dollar sign |
U+E0025 (917541) | <format> | TAG PERCENT SIGN | Language tag percent sign |
U+E0026 (917542) | <format> | TAG AMPERSAND | Language tag ampersand |
U+E0027 (917543) | <format> | TAG APOSTROPHE | Language tag apostrophe |
U+E0028 (917544) | <format> | TAG LEFT PARENTHESIS | Language tag left parenthesis |
U+E0029 (917545) | <format> | TAG RIGHT PARENTHESIS | Language tag right parenthesis |
U+E002A (917546) | <format> | TAG ASTERISK | Language tag asterisk |
U+E002B (917547) | <format> | TAG PLUS SIGN | Language tag plus sign |
U+E002C (917548) | <format> | TAG COMMA | Language tag comma |
U+E002D (917549) | <format> | TAG HYPHEN-MINUS | Language tag hyphen-minus |
U+E002E (917550) | <format> | TAG FULL STOP | Language tag full stop |
U+E002F (917551) | <format> | TAG SOLIDUS | Language tag slash |
U+E0030 (917552) | <format> | TAG DIGIT ZERO | Language tag digit zero |
U+E0031 (917553) | <format> | TAG DIGIT ONE | Language tag digit one |
U+E0032 (917554) | <format> | TAG DIGIT TWO | Language tag digit two |
U+E0033 (917555) | <format> | TAG DIGIT THREE | Language tag digit three |
U+E0034 (917556) | <format> | TAG DIGIT FOUR | Language tag digit four |
U+E0035 (917557) | <format> | TAG DIGIT FIVE | Language tag digit five |
U+E0036 (917558) | <format> | TAG DIGIT SIX | Language tag digit six |
U+E0037 (917559) | <format> | TAG DIGIT SEVEN | Language tag digit seven |
U+E0038 (917560) | <format> | TAG DIGIT EIGHT | Language tag digit eight |
U+E0039 (917561) | <format> | TAG DIGIT NINE | Language tag digit nine |
U+E003A (917562) | <format> | TAG COLON | Language tag colon |
U+E003B (917563) | <format> | TAG SEMICOLON | Language tag semicolon |
U+E003C (917564) | <format> | TAG LESS-THAN SIGN | Language tag less-than sign |
U+E003D (917565) | <format> | TAG EQUALS SIGN | Language tag equals sign |
U+E003E (917566) | <format> | TAG GREATER-THAN SIGN | Language tag greater-than sign |
U+E003F (917567) | <format> | TAG QUESTION MARK | Language tag question mark |
U+E0040 (917568) | <format> | TAG COMMERCIAL AT | Language tag at sign |
U+E0041 (917569) | <format> | TAG LATIN CAPITAL LETTER A | Language tag Latin capital letter A |
U+E0042 (917570) | <format> | TAG LATIN CAPITAL LETTER B | Language tag Latin capital letter B |
U+E0043 (917571) | <format> | TAG LATIN CAPITAL LETTER C | Language tag Latin capital letter C |
U+E0044 (917572) | <format> | TAG LATIN CAPITAL LETTER D | Language tag Latin capital letter D |
U+E0045 (917573) | <format> | TAG LATIN CAPITAL LETTER E | Language tag Latin capital letter E |
U+E0046 (917574) | <format> | TAG LATIN CAPITAL LETTER F | Language tag Latin capital letter F |
U+E0047 (917575) | <format> | TAG LATIN CAPITAL LETTER G | Language tag Latin capital letter G |
U+E0048 (917576) | <format> | TAG LATIN CAPITAL LETTER H | Language tag Latin capital letter H |
U+E0049 (917577) | <format> | TAG LATIN CAPITAL LETTER I | Language tag Latin capital letter I |
U+E004A (917578) | <format> | TAG LATIN CAPITAL LETTER J | Language tag Latin capital letter J |
U+E004B (917579) | <format> | TAG LATIN CAPITAL LETTER K | Language tag Latin capital letter K |
U+E004C (917580) | <format> | TAG LATIN CAPITAL LETTER L | Language tag Latin capital letter L |
U+E004D (917581) | <format> | TAG LATIN CAPITAL LETTER M | Language tag Latin capital letter M |
U+E004E (917582) | <format> | TAG LATIN CAPITAL LETTER N | Language tag Latin capital letter N |
U+E004F (917583) | <format> | TAG LATIN CAPITAL LETTER O | Language tag Latin capital letter O |
U+E0050 (917584) | <format> | TAG LATIN CAPITAL LETTER P | Language tag Latin capital letter P |
U+E0051 (917585) | <format> | TAG LATIN CAPITAL LETTER Q | Language tag Latin capital letter Q |
U+E0052 (917586) | <format> | TAG LATIN CAPITAL LETTER R | Language tag Latin capital letter R |
U+E0053 (917587) | <format> | TAG LATIN CAPITAL LETTER S | Language tag Latin capital letter S |
U+E0054 (917588) | <format> | TAG LATIN CAPITAL LETTER T | Language tag Latin capital letter T |
U+E0055 (917589) | <format> | TAG LATIN CAPITAL LETTER U | Language tag Latin capital letter U |
U+E0056 (917590) | <format> | TAG LATIN CAPITAL LETTER V | Language tag Latin capital letter V |
U+E0057 (917591) | <format> | TAG LATIN CAPITAL LETTER W | Language tag Latin capital letter W |
U+E0058 (917592) | <format> | TAG LATIN CAPITAL LETTER X | Language tag Latin capital letter X |
U+E0059 (917593) | <format> | TAG LATIN CAPITAL LETTER Y | Language tag Latin capital letter Y |
U+E005A (917594) | <format> | TAG LATIN CAPITAL LETTER Z | Language tag Latin capital letter Z |
U+E005B (917595) | <format> | TAG LEFT SQUARE BRACKET | Language tag left square bracket |
U+E005C (917596) | <format> | TAG REVERSE SOLIDUS | Language tag backslash |
U+E005D (917597) | <format> | TAG RIGHT SQUARE BRACKET | Language tag right square bracket |
U+E005E (917598) | <format> | TAG CIRCUMFLEX ACCENT | Language tag circumflex |
U+E005F (917599) | <format> | TAG LOW LINE | Language tag underscore |
U+E0060 (917600) | <format> | TAG GRAVE ACCENT | Language tag grave accent |
U+E0061 (917601) | <format> | TAG LATIN SMALL LETTER A | Language tag Latin small letter A |
U+E0062 (917602) | <format> | TAG LATIN SMALL LETTER B | Language tag Latin small letter B |
U+E0063 (917603) | <format> | TAG LATIN SMALL LETTER C | Language tag Latin small letter C |
U+E0064 (917604) | <format> | TAG LATIN SMALL LETTER D | Language tag Latin small letter D |
U+E0065 (917605) | <format> | TAG LATIN SMALL LETTER E | Language tag Latin small letter E |
U+E0066 (917606) | <format> | TAG LATIN SMALL LETTER F | Language tag Latin small letter F |
U+E0067 (917607) | <format> | TAG LATIN SMALL LETTER G | Language tag Latin small letter G |
U+E0068 (917608) | <format> | TAG LATIN SMALL LETTER H | Language tag Latin small letter H |
U+E0069 (917609) | <format> | TAG LATIN SMALL LETTER I | Language tag Latin small letter I |
U+E006A (917610) | <format> | TAG LATIN SMALL LETTER J | Language tag Latin small letter J |
U+E006B (917611) | <format> | TAG LATIN SMALL LETTER K | Language tag Latin small letter K |
U+E006C (917612) | <format> | TAG LATIN SMALL LETTER L | Language tag Latin small letter L |
U+E006D (917613) | <format> | TAG LATIN SMALL LETTER M | Language tag Latin small letter M |
U+E006E (917614) | <format> | TAG LATIN SMALL LETTER N | Language tag Latin small letter N |
U+E006F (917615) | <format> | TAG LATIN SMALL LETTER O | Language tag Latin small letter O |
U+E0070 (917616) | <format> | TAG LATIN SMALL LETTER P | Language tag Latin small letter P |
U+E0071 (917617) | <format> | TAG LATIN SMALL LETTER Q | Language tag Latin small letter Q |
U+E0072 (917618) | <format> | TAG LATIN SMALL LETTER R | Language tag Latin small letter R |
U+E0073 (917619) | <format> | TAG LATIN SMALL LETTER S | Language tag Latin small letter S |
U+E0074 (917620) | <format> | TAG LATIN SMALL LETTER T | Language tag Latin small letter T |
U+E0075 (917621) | <format> | TAG LATIN SMALL LETTER U | Language tag Latin small letter U |
U+E0076 (917622) | <format> | TAG LATIN SMALL LETTER V | Language tag Latin small letter V |
U+E0077 (917623) | <format> | TAG LATIN SMALL LETTER W | Language tag Latin small letter W |
U+E0078 (917624) | <format> | TAG LATIN SMALL LETTER X | Language tag Latin small letter X |
U+E0079 (917625) | <format> | TAG LATIN SMALL LETTER Y | Language tag Latin small letter Y |
U+E007A (917626) | <format> | TAG LATIN SMALL LETTER Z | Language tag Latin small letter Z |
U+E007B (917627) | <format> | TAG LEFT CURLY BRACKET | Language tag left curly bracket |
U+E007C (917628) | <format> | TAG VERTICAL LINE | Language tag vertical line |
U+E007D (917629) | <format> | TAG RIGHT CURLY BRACKET | Language tag right curly bracket |
U+E007E (917630) | <format> | TAG TILDE | Language tag tilde |
U+E007F (917631) | <format> | CANCEL TAG | Closing language tag |
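The official names and properties listed in the table can be cross-checked with Python's standard `unicodedata` module. This is only a verification sketch; the helper name `tag_rows` is hypothetical, and the values reported depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def tag_rows():
    """Yield (code point, decimal, name, category, bidi class) per assigned tag character."""
    code_points = [0xE0001, *range(0xE0020, 0xE0080)]
    for cp in code_points:
        ch = chr(cp)
        yield (
            f"U+{cp:04X}",
            cp,
            unicodedata.name(ch),
            unicodedata.category(ch),       # expected: "Cf" (Format)
            unicodedata.bidirectional(ch),  # expected: "BN" (Boundary Neutral)
        )


rows = list(tag_rows())
for row in rows[:3]:
    print(" | ".join(str(field) for field in row))
```

The block covers U+E0001 and the contiguous range U+E0020 to U+E007F, so the sketch produces 97 rows in total.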
References
- Unicode 5.1.0 properties
- Flag for Northern Ireland (GB-NIR) in Emojipedia (comparison of the renderings in different emoji sets, but empty here)
Web links
- PDF of the Unicode Consortium (English; 77 kB)