Lexical density

Lexical density is a measure in linguistics, especially in computational linguistics, that indicates the percentage of content words among the total number of words in a text. The term is derived from the English expression for content words, lexical words. Content words are words that carry lexical meaning of their own; in contrast, function words chiefly carry grammatical meaning.

The lexical density can be calculated using the following formula:

lexical density = (number of lexical words / total number of words) · 100

Scaling to values between 0 and 100 is not strictly necessary and is not always done, in particular when the lexical words are related not to the total number of words but to the number of grammatical units, such as clauses. It is also possible to weight the lexical words according to their frequency in the language.
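A minimal sketch of the unweighted calculation in Python, assuming a pre-tokenized text and a small hand-picked set of function words (a real analysis would typically use part-of-speech tagging to separate content from function words; the word list and function name here are illustrative only):

# Lexical density as the percentage of content words,
# assuming pre-tokenized input and a hand-picked set of function words.
FUNCTION_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "to", "it"}

def lexical_density(tokens):
    """Return the lexical density of a token list, scaled to 0-100."""
    if not tokens:
        return 0.0
    content_words = [t for t in tokens if t.lower() not in FUNCTION_WORDS]
    return 100.0 * len(content_words) / len(tokens)

# Example: 5 of the 8 tokens are content words, giving a density of 62.5
tokens = "the cat sat on the mat yesterday evening".split()
print(lexical_density(tokens))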

The measure was introduced by Jean Ure to describe register variation. Michael Halliday also observed that lexical density is lower in spoken than in written language. Lexical density can be used for text analysis in forensic linguistics, including plagiarism detection.

Literature

  • Jean Ure: Lexical density and register differentiation. In: G. Perren, J. L. M. Trim (Eds.): Applications of Linguistics. Cambridge University Press, London 1971, pp. 443–452.
  • Michael A. K. Halliday: On Grammar. Continuum, 2005, ISBN 0-8264-8822-6.
  • John Olsson: Forensic Linguistics: An Introduction to Language, Crime, and the Law. Continuum, 2004, ISBN 0-8264-6109-3.