Law of distribution of characters of various complexity

from Wikipedia, the free encyclopedia

The law of distribution of characters of different complexity states that characters of differing degrees of complexity occur in texts with frequencies that follow a theoretically grounded law of language.

Complexity of characters

A look at written texts shows that characters differ in complexity: a character such as <o> is relatively simple, whereas one such as <m> is fairly complex if complexity is measured, for example, by the number of changes of direction in the course of the stroke. The differences are clearer in the Chinese or Japanese script than in the Latin script. In these scripts, the number of strokes or components that make up an individual character can be chosen as the criterion of complexity.
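As a minimal sketch of the stroke-count criterion, the following Python snippet tallies how often characters of each complexity occur in a string. The lookup table `STROKES`, the function name `complexity_distribution` and the sample string are hypothetical illustrations and not data from the studies cited here; a real analysis would need a complete stroke dictionary.

```python
from collections import Counter

# Hypothetical, hand-entered stroke counts for a handful of characters.
# A real study would take these from a complete stroke dictionary.
STROKES = {"一": 1, "人": 2, "大": 3, "中": 4, "字": 6, "国": 8}


def complexity_distribution(text):
    """Count how many character tokens in `text` have each stroke count.

    Characters missing from STROKES (punctuation, Latin letters, ...) are skipped.
    """
    return Counter(STROKES[ch] for ch in text if ch in STROKES)


sample = "中国人大字一人"  # illustrative string, not a real corpus
distribution = complexity_distribution(sample)
for strokes in sorted(distribution):
    print(strokes, distribution[strokes])
```

The resulting counts per stroke number are the kind of observed frequencies to which a theoretical distribution is then fitted.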

Law of distribution of characters of various complexity

The law of the distribution of characters of different complexity states that characters consisting of different numbers of strokes or components do not occur chaotically in texts but are distributed according to a law. In principle, this is the same law of language that quantitative linguistics developed in particular for the frequency distribution of word lengths (law of the distribution of word lengths; theory: Wimmer et al.).

Distribution of the characters in Chinese

Chinese characters are organized hierarchically: a character consists of one or more components, and each component in turn consists of one or more individual strokes. Studies on the distribution of characters of different complexity in Chinese have been carried out in the following ways:

  • the complexity of the characters was determined by the number of strokes (without taking the components into account). In this case, the 1-shifted binomial distribution could be fitted to 20 individual texts with good results. The characters were pooled so that the first class comprised characters with 1-3 strokes (x = 1), the second those with 4-6 strokes (x = 2), and so on. An example:
x    n(x)    NP(x)
1    36      31.83
2    79      87.75
3    105     96.75
4    47      53.34
5    17      14.70
6    2       1.62

(Here x is the stroke class of a character as defined above, n(x) is the number of characters of class x observed in this text, and NP(x) is the number of characters of class x calculated by fitting the 1-shifted binomial distribution to the observed data. Result: the 1-shifted binomial distribution is a good model for this text, with the test criterion P = 0.34; a fit is considered good if P is greater than or equal to 0.05. For more detailed explanations, see the literature cited.)

  • the complexity of the characters was determined by the number of their components; the components themselves are made up of different numbers of strokes. For a set of over 5000 characters, the 1-shifted Dacey-Poisson distribution proved to be a suitable model.
  • the complexity of the components was determined by the number of their strokes. For a set of 500 components, the 1-shifted Poisson distribution turned out to be a suitable model.
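The fit reported for the Chinese text in the table above (six stroke classes) can be approximated with a short Python sketch. The source states neither the parameter values nor the estimation method, so the following are assumptions: n is fixed at 5 because there are six classes, p is estimated by the method of moments, and the P value uses 3 degrees of freedom. The function names are illustrative.

```python
import math

# Observed counts n(x) for stroke classes x = 1..6, copied from the table above.
observed = [36, 79, 105, 47, 17, 2]


def stroke_class(strokes, width=3):
    """Map a raw stroke count to its pooled class (1-3 -> 1, 4-6 -> 2, ...)."""
    return (strokes - 1) // width + 1


def shifted_binomial_pmf(x, n, p):
    """P(X = x) for the 1-shifted binomial distribution, x = 1, ..., n + 1."""
    k = x - 1  # shift so that the support starts at 1 instead of 0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


total = sum(observed)
n = len(observed) - 1  # ASSUMPTION: six classes exhaust the support, so n = 5
mean = sum(x * f for x, f in enumerate(observed, start=1)) / total
p = (mean - 1) / n  # ASSUMPTION: method-of-moments estimate, E[X] = 1 + n * p

expected = [total * shifted_binomial_pmf(x, n, p) for x in range(1, n + 2)]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# ASSUMPTION: 3 degrees of freedom (6 classes - 1 - 2 estimated parameters).
# The closed form below is the chi-square survival function for df = 3 only.
p_value = math.erfc(math.sqrt(chi_square / 2)) + math.sqrt(
    2 * chi_square / math.pi
) * math.exp(-chi_square / 2)

for x, (o, e) in enumerate(zip(observed, expected), start=1):
    print(f"{x}  {o:4d}  {e:7.2f}")
print(f"chi-square = {chi_square:.2f}, P = {p_value:.2f}")
```

Under these assumptions the expected frequencies come out close to the NP(x) column and the P value close to the reported 0.34. This suggests, but does not prove, that the original fit was obtained in a similar way.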

Distribution of the complexity of written words in Japanese

In a study of Japanese, the complexity of the individual kanji played only an indirect role. Instead, the complexity of whole words was examined: words were grouped into classes of 1-5 strokes (x = 1), 6-10 strokes (x = 2), and so on. Sanada worked out the following example from a dictionary excerpt:

x    n(x)    NP(x)
1    4       2.91
2    66      78.47
3    368     358.89
4    594     580.85
5    438     449.87
6    200     196.71
7    55      53.91
8    5       9.96
9    3       1.44

(Here x is the stroke class of a word as defined above, n(x) is the number of words of class x observed in the dictionary excerpt, and NP(x) is the number of words of class x calculated by fitting the Conway-Maxwell-Poisson distribution to the observed data. Result: the Conway-Maxwell-Poisson distribution is a good model for this vocabulary excerpt, with the test criterion P = 0.28; a fit is considered good if P is greater than or equal to 0.05. For more detailed explanations, see the literature cited.)
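The NP(x) column of the Japanese table can be approximated with a short Python sketch of the Conway-Maxwell-Poisson distribution, whose probabilities are proportional to λ^j / (j!)^ν. Several points are assumptions and not statements from the source: that the distribution is shifted by 1 (j = x - 1), that the last class pools the whole tail x ≥ 9, and the parameter values `LAM` and `NU`, which were back-solved from the ratios of neighbouring NP(x) entries in the table and are not reported values.

```python
import math

# Observed counts n(x) for word stroke classes x = 1..9, copied from the table above.
observed = [4, 66, 368, 594, 438, 200, 55, 5, 3]

# ASSUMPTION: illustrative parameters back-solved from the NP(x) column;
# the source does not report the fitted values.
LAM = 27.01
NU = 2.562


def shifted_cmp_expected(lam, nu, total, n_classes, terms=200):
    """Expected counts under a 1-shifted Conway-Maxwell-Poisson model.

    Class x (x = 1, 2, ...) gets weight lam**(x - 1) / ((x - 1)!)**nu.
    The last class pools the remaining tail (an assumption about the source).
    The normalizing constant is approximated by the first `terms` weights.
    """
    # Work in log space so that large factorials cannot overflow.
    weights = [
        math.exp(j * math.log(lam) - nu * math.lgamma(j + 1)) for j in range(terms)
    ]
    norm = sum(weights)
    probs = [w / norm for w in weights]
    head = probs[: n_classes - 1]
    tail = sum(probs[n_classes - 1 :])
    return [total * pr for pr in head + [tail]]


expected = shifted_cmp_expected(LAM, NU, sum(observed), len(observed))
for x, (o, e) in enumerate(zip(observed, expected), start=1):
    print(f"{x}  {o:4d}  {e:7.2f}")
```

With ν = 1 the model reduces to an ordinary (here 1-shifted) Poisson distribution, so the extra parameter ν is what lets the curve fall off faster than a Poisson tail would.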

A general law of language

These investigations, which are not yet very comprehensive, indicate that characters of different complexity obey the same principles that hold for the well-researched word lengths and for many other linguistic variables. Examining the distribution of words of different complexity in their written form, as in the Japanese example in the previous section, leads to a comparable result.

Literature

  • Gabriel Altmann: Script Complexity. In: Glottometrics 8, 2004, pages 68-74 (PDF full text).
  • Gabriel Altmann, Fan Fengxiang (Eds.): Analyses of Script. Properties of Characters and Writing Systems. Mouton de Gruyter, Berlin / New York 2008, ISBN 978-3-11-019641-2. The contributions in this book give an overview of the questions through which quantitative linguistics approaches writing systems, including several attempts to test the laws of grapheme complexity and grapheme length. Among them:
  • Gabriel Altmann: Towards a theory of script. Pages 149–164.
  • Carsten Peust: Script complexity revisited. In: Glottometrics 12, pages 11–15 (PDF full text).

References

  1. Gejza Wimmer, Gabriel Altmann: The Theory of Word Length Distribution: Some Results and Generalizations. In: Peter Schmidt (Ed.): Glottometrika 15. Issues in General Theory and the Theory of Word Length. Wissenschaftlicher Verlag Trier, Trier 1996, pages 112-133, ISBN 3-88476-228-1; Gejza Wimmer, Reinhard Köhler, Rüdiger Grotjahn, Gabriel Altmann: Towards a Theory of Word Length Distribution. In: Journal of Quantitative Linguistics 1, 1994, pages 98-106.
  2. Xiaoli Yu: On the complexity of Chinese characters. In: Göttinger Beiträge zur Sprachwissenschaft 5, 2001, pages 121–129.
  3. Xiaoli Yu 2001, page 126. This is text number 12, from Binxin: Wangshi [2].
  4. Hartmut Bohn: Quantitative Studies of the Modern Chinese Language and Writing. Verlag Dr. Kovač, Hamburg 1998, page 55f. ISBN 3-86064-672-9.
  5. Hartmut Bohn: Quantitative Studies of the Modern Chinese Language and Writing. Verlag Dr. Kovač, Hamburg 1998, page 52f. ISBN 3-86064-672-9.
  6. Haruko Sanada: Investigations in Japanese Historical Lexicology (Revised Edition). Peust & Gutschmidt, Göttingen 2008, pages 99-101. ISBN 978-3-933043-12-2.
