Alphabetical sorting

from Wikipedia, the free encyclopedia

Alphabetical sorting is an ordering in which strings are arranged according to the order of the letters in the alphabet. The conventional form is also known as initial alphabetical sorting, because the letters are compared in the direction of writing, starting from the beginning of the word.

While the generally recognized sequence is usually used for the letters of the alphabet, there are different rules and standards for sorting special cases such as special characters, diacritical marks, spaces, upper and lower case, hyphens and digits.

Basic principle

To decide which of two given character strings comes first in (initial) alphabetical sorting, the strings are compared character by character, starting with the first character. The first position at which the two strings differ decides the order: the string whose character at this position comes earlier in the alphabet comes first. For example, "electric" comes before "finished" (e before f), and German "Fahrrad" comes before "Fahrstuhl" (r before s). If one string is shorter than the other and identical to the beginning of the other, this rule cannot be applied; the shorter string is then usually sorted first. For example, "bicycle" comes before "bicycle chain".
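
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a full collation routine: it assumes that both strings contain only lowercase letters a to z (and spaces), so that Python's built-in character comparison coincides with alphabet order. The function name `compare` is chosen for this example.

```python
from functools import cmp_to_key

def compare(a, b):
    """Return -1 if a sorts before b, 1 if after, 0 if both are equal."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # The first position at which the strings differ decides.
            return -1 if ch_a < ch_b else 1
    # No difference found: one string is a prefix of the other,
    # so the shorter string comes first.
    return (len(a) > len(b)) - (len(a) < len(b))

words = ["finished", "bicycle chain", "electric", "bicycle"]
ordered = sorted(words, key=cmp_to_key(compare))
print(ordered)  # ['bicycle', 'bicycle chain', 'electric', 'finished']
```

Python's own string comparison already works this way, so `sorted(words)` yields the same result for this input; the explicit function only makes the rule visible.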

There are different rules for dealing with punctuation marks, special characters and upper and lower case letters; see the section Sorting rules by language.

Alphabetical sorting is the model for the mathematical concept of lexicographical order. Conversely, alphabetical sorting is itself a lexicographical order, with the order of the letters in the alphabet as the underlying linear order.

History

Marcus Verrius Flaccus (born around 10 BC) was the first to arrange a Latin dictionary alphabetically. The Suda, from the second half of the 10th century, is the first alphabetically arranged Byzantine encyclopedia. The 13th-century Liber de proprietatibus rerum by Bartholomaeus Anglicus is also in alphabetical order and is often viewed as a forerunner of the encyclopedia. The principle of arranging characters in a fixed order is over three thousand years old; see Ugaritic script and the general history of the alphabet.

Sorting rules by language

German language

Sorting rules for additional letters

The German alphabet supplements the modern Latin alphabet with the umlauts Ä, Ö and Ü and the letter ß. These additional letters can be sorted in four ways:

  1. Ignoring the umlaut dots. Müll is sorted like Mull.
  2. Treating base letter, double letter and umlaut as equivalent if the double letter is pronounced like the umlaut. Müll is sorted like Mull or Muell. Duell, however, is sorted between Duden and Dugast.
  3. Expanding the umlauts. Müll is sorted as Muell, before Muffe.
  4. Treating the umlaut as a separate letter.
    1. Placement after the base letter. Müll stands between Muzin and Münze (and Myalgie).
    2. Placement at the end of the alphabet. Müll comes after Mythos.

All other (foreign-language) diacritical marks are uniformly ignored in German-speaking countries, including all accents, the tilde and the macron: é and e, ç and c, ñ and n, č and c, ō and o are treated as the same.
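
One way to ignore foreign diacritical marks in software is to decompose each character with Unicode normalization and drop the combining marks. The following sketch uses Python's standard `unicodedata` module; the function name `strip_foreign_marks` and the decision to leave ä, ö, ü and ß untouched (so that a German-specific rule can handle them separately) are assumptions made for this illustration.

```python
import unicodedata

# Letters that German sorting rules treat specially; they are kept as-is here.
GERMAN_LETTERS = set("äöüÄÖÜß")

def strip_foreign_marks(text):
    """Remove accents, tildes, macrons etc., but keep German umlauts and ß."""
    # Compose first so that umlauts arrive as single characters.
    text = unicodedata.normalize("NFC", text)
    result = []
    for ch in text:
        if ch in GERMAN_LETTERS:
            result.append(ch)
            continue
        # NFD splits e.g. "é" into "e" plus a combining acute accent.
        decomposed = unicodedata.normalize("NFD", ch)
        result.append("".join(c for c in decomposed
                              if not unicodedata.combining(c)))
    return "".join(result)

print(strip_foreign_marks("é ç ñ č ō"))  # e c n c o
print(strip_foreign_marks("Müller"))     # Müller
```

Characters that have no canonical decomposition (for example ø or ł) pass through unchanged, so a real application would need an additional mapping for them.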

Germany

DIN 31638
Area: Correspondence
Title: Bibliographic rules of order
Latest edition: 8.1994
ISO: -

DIN 5007
Area: Correspondence
Title: Order of character strings
Brief description: Part 1: ABC rules, Part 2: Heading rules
Latest edition: 8.2005, 5.1996
ISO: -

The German standard DIN 5007-1 describes the sorting under the title "Ordering character strings (ABC rules)".

DIN 5007 variant 1 (used for words, e.g. in lexicons; Section 6.1.1.4.1)

  • ä and a are the same
  • ö and o are the same
  • ü and u are the same
  • ß and ss are the same

DIN 5007 variant 2 (special sorting for lists of names, e.g. in telephone books; Section 6.1.1.4.2)

  • ä and ae are the same
  • ö and oe are the same
  • ü and ue are the same
  • ß and ss are the same

Variant 2 takes into account that different spellings are possible for proper names, whereas a term in a lexicon or dictionary is entered under exactly one spelling. With names, by contrast, it is often not known whether someone is called Moeller or Möller. This applies in particular to German-speaking persons, institutions and place names.
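
The two DIN 5007 variants can be approximated with a sort key that replaces the special letters before comparison. The sketch below is a simplification under stated assumptions: it handles only the four equivalences listed above, compares case-insensitively, and uses the original spelling as a tie-breaker when two names produce the same key (the tie-breaking rule is an assumption of this example, not taken from the standard). The function name `din5007_key` is hypothetical.

```python
# Replacement tables for the two variants described above.
VARIANT_1 = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
VARIANT_2 = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def din5007_key(word, variant=1):
    """Sort key approximating DIN 5007 variant 1 (lexicon) or 2 (names)."""
    table = VARIANT_1 if variant == 1 else VARIANT_2
    folded = word.lower().translate(table)
    # Second component: tie-breaker for identical folded forms (assumption).
    return (folded, word)

names = ["Götz", "Goldmann", "Göthe", "Goethe", "Göbel"]

lexicon = sorted(names, key=lambda w: din5007_key(w, variant=1))
phone_book = sorted(names, key=lambda w: din5007_key(w, variant=2))

print(lexicon)     # ['Göbel', 'Goethe', 'Goldmann', 'Göthe', 'Götz']
print(phone_book)  # ['Göbel', 'Goethe', 'Göthe', 'Götz', 'Goldmann']
```

In variant 2, "Goethe" and "Göthe" produce the same folded form and therefore end up next to each other.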

Personal names are often sorted alphabetically in Germany (e.g. in telephone books) in the following way:

  • First, the entries are sorted by surname, ignoring academic degrees such as "Prof." and "Dr." and name additions such as "von", "vor", "auf" and "zu". Note that name additions can also consist of several words, such as "von der" in "von der Lippe".
  • If the surnames are identical, the entries are then sorted alphabetically by any name additions, with names without additions always listed first.
  • If the name additions also match (or there are none), the entries are sorted alphabetically by first name.

This type of sorting is regulated in DIN 31638, the bibliographic rules of order.
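
The three-step rule for personal names maps naturally onto a tuple sort key. The following sketch assumes that each entry has already been split into its parts; the `Person` record, its field names and the sample entries are hypothetical, and umlaut handling is left out to keep the example short.

```python
from typing import NamedTuple

class Person(NamedTuple):
    first_name: str
    surname: str
    addition: str = ""  # name addition such as "von" or "von der"
    title: str = ""     # academic degree such as "Dr."; ignored when sorting

def phone_book_key(person):
    """Sort by surname, then name addition, then first name."""
    # An empty addition compares lower than any non-empty one, so
    # entries without a name addition automatically come first.
    return (person.surname.casefold(),
            person.addition.casefold(),
            person.first_name.casefold())

people = [
    Person("Anna", "Lippe", addition="von der"),
    Person("Karl", "Lippe", title="Dr."),
    Person("Bernd", "Lippe"),
    Person("Clara", "Lange", addition="von"),
]

ordered = sorted(people, key=phone_book_key)
print([p.first_name for p in ordered])  # ['Clara', 'Bernd', 'Karl', 'Anna']
```

Because the title is not part of the key, "Dr. Karl Lippe" is sorted as if the degree were absent.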

Austria

Austrian sorting (for telephone books)

  • ä follows a (therefore comes after az )
  • ö follows o
  • ü follows u
  • ß follows ss
  • St. follows Santa

The printed Austrian telephone book uses different sort orders: in the place directory, umlauts and ß are sorted as separate letters at the end of the alphabet. The information pages and yellow pages are sorted according to DIN 5007 variant 1. The Austrian sorting is used in the name directory.

In libraries, sch often follows s , i.e. only after sz .
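
The Austrian telephone-book rules for ä, ö, ü and ß listed above can be sketched by giving every character a pair of base letter and rank, so that an umlaut sorts directly after its base letter but before the next letter. This is an illustrative approximation only: the rule for "St." and the library rule for sch are not covered, and the name `austrian_key` is hypothetical.

```python
# Each special letter becomes one or more (base letter, rank) pairs.
# Rank 1 places the letter immediately after all occurrences of its base.
SPECIAL = {
    "ä": [("a", 1)],
    "ö": [("o", 1)],
    "ü": [("u", 1)],
    "ß": [("s", 0), ("s", 1)],  # ß follows ss
}

def austrian_key(word):
    """Sort key in which ä follows a, ö follows o, ü follows u, ß follows ss."""
    key = []
    for ch in word.lower():
        key.extend(SPECIAL.get(ch, [(ch, 0)]))
    return key

names = ["Götz", "Goldmann", "Göthe", "Goethe", "Göbel"]
print(sorted(names, key=austrian_key))
# ['Goethe', 'Goldmann', 'Göbel', 'Göthe', 'Götz']
```

With this key, "ä" sorts after "az" but before "b", which matches the first rule in the list.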

Example for German language sorting

DIN 5007 Var. 1     DIN 5007 Var. 2     Austrian
(lexicon)           (telephone book)    sorting
...                 ...                 ...
Göbel               Göbel               Goethe
Goethe              Goethe              Goldmann
Goldmann            Göthe               Göbel
Göthe               Götz                Göthe
Götz                Goldmann            Götz
...                 ...                 ...

In variant 2, Goethe's two spellings are immediately adjacent, only distinguished from one another by first names. Johann Wolfgang von Goethe used both variants during his lifetime; the family was previously called Göthé . Today's uniform spelling was only introduced by Germanists more than a quarter of a century after his death.

Danish and Norwegian language

  • æ comes after z
  • ø comes after æ
  • å comes after ø
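
Placing extra letters after z can be expressed by ranking each character according to an explicit alphabet string. The sketch below assumes lowercase comparison and the 29-letter alphabet implied by the rules above; the sample words and the name `danish_key` are chosen for illustration.

```python
# Alphabet with æ, ø, å appended after z, as described above.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

def danish_key(word):
    """Rank every character by its position in the Danish/Norwegian alphabet."""
    # Characters outside the alphabet are placed after all letters.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]

words = ["østers", "zebra", "åben", "æble"]
print(sorted(words, key=danish_key))  # ['zebra', 'æble', 'østers', 'åben']
print(sorted(words))                  # ['zebra', 'åben', 'æble', 'østers']
```

The second line shows that sorting by raw code point gives a different order, because å has a lower code point than æ and ø.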

Finnish and Swedish language

  • å comes after z
  • ä comes after å
  • ö comes after ä
  • ü and y are the same
  • Until 2006, w and v were the same for foreign words and names (e.g. Verdi after Wagner). Since 2006 w comes after v .
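
Rules that declare two letters "the same" can be modelled by mapping one letter onto the other before ranking. The following sketch covers the equivalences listed above and makes the older v/w rule optional; the function name `sv_fi_key` and the parameter `merge_v_w` are assumptions of this example.

```python
# Alphabet with å, ä, ö appended after z, as described above.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def sv_fi_key(word, merge_v_w=False):
    """Sort key; merge_v_w=True applies the rule in use until 2006."""
    equivalents = {"ü": "y"}        # ü and y are the same
    if merge_v_w:
        equivalents["w"] = "v"      # w and v are the same (older rule)
    folded = (equivalents.get(ch, ch) for ch in word.lower())
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in folded]

names = ["Verdi", "Wagner", "Vivaldi"]
old_order = sorted(names, key=lambda n: sv_fi_key(n, merge_v_w=True))
new_order = sorted(names, key=sv_fi_key)

print(old_order)  # ['Wagner', 'Verdi', 'Vivaldi']
print(new_order)  # ['Verdi', 'Vivaldi', 'Wagner']
```

Under the older rule "Wagner" is compared as if it were spelled "Vagner", which is why "Verdi" follows it.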

Icelandic language

  • ð comes after d
  • þ comes after z
  • æ comes after þ
  • ö comes after æ
  • Letters with an acute accent always follow their respective base letters
  • á and å are the same

Estonian language

  • š comes after s
  • z comes after š
  • ž comes after z
  • õ comes after u
  • å comes after õ
  • ä comes after å
  • ö comes after ä
  • ü comes after ö
  • Until 2006, w and v were the same for foreign words and names (e.g. Verdi after Wagner). w comes after v since 2006.

Albanian language

The Albanian alphabet consists of 36 letters, some of which are digraphs .

  • ç comes after c
  • dh comes after d
  • ë comes after e
  • gj comes after g
  • ll comes after l
  • nj comes after n
  • rr comes after r
  • sh comes after s
  • th comes after t
  • xh comes after x
  • zh comes after z
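
Because digraphs count as single letters, a word has to be split into letters before it can be ranked. The sketch below uses the standard 36-letter Albanian alphabet (which includes the digraph dh after d) and a greedy left-to-right split that always prefers a digraph over two single letters. The greedy split is an assumption of this example and can be wrong where two letters merely meet by coincidence, for instance at a compound boundary.

```python
ALBANIAN_ALPHABET = [
    "a", "b", "c", "ç", "d", "dh", "e", "ë", "f", "g", "gj", "h",
    "i", "j", "k", "l", "ll", "m", "n", "nj", "o", "p", "q", "r",
    "rr", "s", "sh", "t", "th", "u", "v", "x", "xh", "y", "z", "zh",
]
RANK = {letter: i for i, letter in enumerate(ALBANIAN_ALPHABET)}
DIGRAPHS = {letter for letter in ALBANIAN_ALPHABET if len(letter) == 2}

def albanian_key(word):
    """Split a word into letters (digraphs first) and rank each letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Unknown characters are placed after all letters.
            key.append(RANK.get(word[i], len(RANK) + ord(word[i])))
            i += 1
    return key

words = ["dhoma", "dyqan", "dua"]
print(sorted(words, key=albanian_key))  # ['dua', 'dyqan', 'dhoma']
print(sorted(words))                    # ['dhoma', 'dua', 'dyqan']
```

All words beginning with the letter d come before any word beginning with dh, whereas a plain character comparison interleaves them.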

Other languages

In other languages, alphabetical sorting is likewise subject to additional language-dependent rules arising from additional letters or special conventions. In Spanish, for example, Ch was traditionally treated as a letter of its own and, until 1994, was usually sorted separately from C, which posed problems for computer sorting algorithms. ñ follows n. Alphabetical sorting is even more difficult in languages such as Japanese or Chinese, which use a large number of characters whose order in the character set (i.e. their encoding) does not correspond to any customary sort order. In Chinese, for example, text is sorted either by its pinyin equivalent (in computer systems) or by a system based on the radical and the number of strokes (in dictionaries).

Computer systems

Computer systems encode stored character strings using a system-wide or application-specific standard code (ASCII and its variants or extensions, more rarely EBCDIC, nowadays increasingly Unicode) and, in the simplest case, order the characters (including digits, spaces, punctuation marks and special characters) by the numerical values of their codes, so that, for example, all Latin capital letters are placed before the lowercase "a". However, many programs use a traditional sort order that users expect culturally. The sort sequence can be influenced through custom coding or parameterization. One algorithm that can be used for this is the Unicode Collation Algorithm. The type of sorting is determined by specifying a so-called collation (sort sequence) in operating system configurations and in applications such as database systems.
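
The effect of sorting by code value can be seen directly in Python, whose default string comparison uses Unicode code points. The short sketch below contrasts that default with a simple case-insensitive key; the sample words are chosen for illustration.

```python
words = ["apple", "Zebra", "Äpfel", "banana"]

# Default comparison uses code points: "Z" (90) < "a" (97) < "Ä" (196).
by_code_point = sorted(words)
print(by_code_point)     # ['Zebra', 'apple', 'banana', 'Äpfel']

# Folding case removes the upper/lower split but not the umlaut problem:
# "ä" (228) still sorts after "z" (122).
case_insensitive = sorted(words, key=str.casefold)
print(case_insensitive)  # ['apple', 'banana', 'Zebra', 'Äpfel']

print([ord(c) for c in "Zaz"])  # [90, 97, 122]
```

Neither result matches what a German reader would expect for "Äpfel". For culturally expected orders, Python's standard library offers `locale.strxfrm` as a sort key, whose behaviour depends on the locales installed on the system; implementations of the Unicode Collation Algorithm are available through third-party libraries.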

Reverse sorting

Reverse sorting is an alphabetical sorting in which the words are read from back to front. It is the sorting method used to compile reverse dictionaries. It can also be used in rhyming dictionaries.
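
Reading the words from back to front amounts to sorting by the reversed string. A minimal Python sketch, with sample words chosen for illustration:

```python
def reverse_key(word):
    """Sort key that reads the word from back to front."""
    return word[::-1]

words = ["nation", "motion", "walking", "station", "talking"]
print(sorted(words, key=reverse_key))
# ['talking', 'walking', 'nation', 'station', 'motion']
```

Words with the same ending end up next to each other, which is what makes this order useful for rhyming dictionaries.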
