Cologne Phonetics

From Wikipedia, the free encyclopedia

Cologne Phonetics (also called the Cologne method) is a phonetic algorithm that assigns each word a sequence of digits, the phonetic code, according to how the word sounds. The aim is to give the same code to words that sound the same, so that search functions can implement a similarity search. This makes it possible, for example, to find an entry such as “Meier” in a name list under other spellings such as “Maier”, “Mayer” or “Mayr”. Compared with the better-known Russell Soundex method, Cologne Phonetics is better adapted to the German language. It was published in 1969 by Hans Joachim Postel.

Basic rules

Cologne Phonetics maps each letter of a word to a digit between “0” and “8”, using at most one neighboring letter as context when choosing the digit. Some rules apply specifically to the beginning of the word (initial sound). In this way, similar sounds are assigned the same code. For example, the letters “W” and “V” are both encoded as “3”. The phonetic code for “Wikipedia” is 3412. Unlike the Soundex code, the Cologne Phonetics code is not limited in length.

Letter codes

Letter               Context                                           Code
A, E, I, J, O, U, Y                                                    0
H                                                                      -
B                                                                      1
P                    not before H                                      1
D, T                 not before C, S, Z                                2
F, V, W                                                                3
P                    before H                                          3
G, K, Q                                                                4
C                    initial, before A, H, K, L, O, Q, R, U, X         4
C                    before A, H, K, O, Q, U, X, except after S, Z     4
X                    not after C, K, Q                                 48
L                                                                      5
M, N                                                                   6
R                                                                      7
S, Z                                                                   8
C                    after S, Z                                        8
C                    initial, except before A, H, K, L, O, Q, R, U, X  8
C                    not before A, H, K, O, Q, U, X                    8
D, T                 before C, S, Z                                    8
X                    after C, K, Q                                     8
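
The table can be read as a lookup that depends on the current letter and at most one neighbor. The following Python sketch illustrates the letter-by-letter coding only, without any later clean-up of the digit string. The function name encode_letters and the decision to skip every non-ASCII character are assumptions made for this illustration; they are not taken from the original publication.

```python
# Letter groups from the table, stored as sets so that a missing
# neighbor (None) is never reported as a member.
VOWELS = set("AEIJOUY")
CSZ = set("CSZ")
CKQ = set("CKQ")
C_INITIAL_HARD = set("AHKLOQRUX")  # initial C before these letters -> 4
C_INNER_HARD = set("AHKOQUX")      # non-initial C before these letters -> 4


def encode_letters(word: str) -> str:
    """Translate each letter into its digit(s), left to right."""
    # Assumption: only ASCII letters are coded; all other characters are skipped.
    letters = [ch.upper() for ch in word if ch.isascii() and ch.isalpha()]
    digits = []
    for i, ch in enumerate(letters):
        prev = letters[i - 1] if i > 0 else None
        nxt = letters[i + 1] if i + 1 < len(letters) else None
        if ch in VOWELS:
            digits.append("0")
        elif ch == "H":
            continue  # H has no code
        elif ch == "B":
            digits.append("1")
        elif ch == "P":
            digits.append("3" if nxt == "H" else "1")
        elif ch in ("D", "T"):
            digits.append("8" if nxt in CSZ else "2")
        elif ch in ("F", "V", "W"):
            digits.append("3")
        elif ch in ("G", "K", "Q"):
            digits.append("4")
        elif ch == "C":
            if prev is None:  # initial sound
                digits.append("4" if nxt in C_INITIAL_HARD else "8")
            elif prev in ("S", "Z"):  # "after S, Z" takes priority
                digits.append("8")
            else:
                digits.append("4" if nxt in C_INNER_HARD else "8")
        elif ch == "X":
            digits.append("8" if prev in CKQ else "48")
        elif ch == "L":
            digits.append("5")
        elif ch in ("M", "N"):
            digits.append("6")
        elif ch == "R":
            digits.append("7")
        else:  # only S and Z remain
            digits.append("8")
    return "".join(digits)


print(encode_letters("Wikipedia"))  # 304010200
print(encode_letters("Breschnew"))  # 17088603
```

The output is the raw digit string; duplicate digits and zeros are still present at this stage.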

For the letter “C”, the rule “SC” takes priority over the rule “CH”. This is reflected in the addition “except after S, Z” in the tenth row of the table. Although this priority is not explicitly mentioned in the original publication, it can be deduced from the examples given there (e.g. the code “17863” is given for “Breschnew”).

Lowercase letters are coded in the same way as uppercase letters; all other characters (e.g. hyphens) are ignored. The umlauts Ä, Ö, Ü and the letter ß are not covered by the conversion table. It is advisable to treat the umlauts as vowels (code “0”) and ß as a member of the group S, Z (code “8”).
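
One way to apply this recommendation is to fold the special characters into table letters before coding. The sketch below is an illustration under stated assumptions: the name normalize and the specific replacements (each umlaut becomes its base vowel, ß becomes S) are choices made here. Since every vowel is coded as “0”, which vowel an umlaut is folded to makes no difference to the resulting code.

```python
# Assumed folding: umlauts become their base vowel (any vowel codes as "0"),
# and ß becomes S (code "8").
FOLD = str.maketrans({
    "Ä": "A", "Ö": "O", "Ü": "U",
    "ä": "a", "ö": "o", "ü": "u",
    "ß": "S",
})


def normalize(word: str) -> str:
    """Return only the letters A-Z, uppercased, with umlauts and ß folded."""
    folded = word.translate(FOLD).upper()
    # Hyphens, digits and any remaining non-table characters are dropped.
    return "".join(ch for ch in folded if "A" <= ch <= "Z")


print(normalize("Müller-Lüdenscheidt"))  # MULLERLUDENSCHEIDT
print(normalize("Weiß"))                 # WEIS
```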

A word is converted in three steps:

  1. Encode the letters from left to right according to the conversion table.
  2. Collapse each run of identical adjacent digits into a single digit.
  3. Remove all “0” codes except at the beginning.
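
Steps 2 and 3 operate purely on the digit string produced by step 1, so they can be sketched independently of the conversion table. The function names collapse_runs and drop_zeros are hypothetical, and the input 304010200 is the raw coding of “Wikipedia” worked out by hand from the table.

```python
from itertools import groupby


def collapse_runs(raw: str) -> str:
    """Step 2: reduce every run of identical adjacent digits to one digit."""
    return "".join(digit for digit, _run in groupby(raw))


def drop_zeros(code: str) -> str:
    """Step 3: remove every "0" except one in the first position."""
    return code[:1] + code[1:].replace("0", "")


raw = "304010200"            # step 1 result for "Wikipedia"
step2 = collapse_runs(raw)
print(step2)                 # 30401020
print(drop_zeros(step2))     # 3412
```

The order matters: zeros are removed only after duplicates have been collapsed, so two equal digits separated by a vowel both survive.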

Example

The name Müller-Lüdenscheidt is coded as follows:

  1. Letter-by-letter coding: 60550750206880022
  2. Collapsing runs of identical adjacent digits: 6050750206802
  3. Removing all “0” codes: 65752682

Note that the hyphen is ignored, so the name Müller-Lüdenscheidt is treated as a single word. Word boundaries affect the result: if “Heinz Classen” is coded by a typical implementation that ignores the fact that it consists of two words, the result is 068586, because Z becomes 8, the C that follows it also becomes 8, and the second 8 is dropped. If the name is treated as two words, C is an initial letter before L and becomes 4, which is kept, giving the correct code “068 4586”.
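
The difference between the two results comes entirely from the rules for “C”. The sketch below isolates those rules to show why the word boundary matters; the function code_for_c and its convention of passing None for a missing neighbor are assumptions made for this illustration.

```python
from typing import Optional


def code_for_c(prev: Optional[str], nxt: Optional[str]) -> str:
    """Code for the letter C; prev is None when C is the first letter of a word."""
    if prev is None:  # initial sound
        return "4" if nxt in set("AHKLOQRUX") else "8"
    if prev in ("S", "Z"):  # "after S, Z" takes priority
        return "8"
    return "4" if nxt in set("AHKOQUX") else "8"


# "HeinzClassen" coded as one word: C follows Z.
print(code_for_c("Z", "L"))   # 8
# "Classen" coded as its own word: C is initial and stands before L.
print(code_for_c(None, "L"))  # 4
```

An implementation that should produce “068 4586” therefore has to split the input at whitespace and code each word separately.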

Literature

  • Hans Joachim Postel: The Cologne Phonetics. A method of identifying personal names based on gestalt analysis. In: IBM-Nachrichten, Volume 19, 1969, pp. 925–931.
