Computational linguistics

from Wikipedia, the free encyclopedia

Computational linguistics (CL), also called linguistic data processing (German: linguistische Datenverarbeitung, LDV), investigates how natural language in the form of text or speech data can be processed algorithmically by computer. It is the interface between linguistics and computer science. In the English-language literature and in computer science, the term natural language processing (NLP) is also in use.

History

Computational linguistics as a term (or its paraphrase) can be traced back to the 1960s. With the beginnings of artificial intelligence, the task was already obvious. Chomsky's Syntactic Structures (1957) presented language in a new formal framework suited to this task. In addition, there was the logic of language developed by Saul Kripke and Richard Montague. The research, some of which was very heavily funded from the US defense budget, did not produce the breakthroughs hoped for. The luminaries Chomsky and Weizenbaum in particular dampened expectations of the automation of language translation. The shift from behavioristic to mentalistic (Chomsky) concepts of science was followed by extensive concepts in the cognitive sciences.

In the 1970s, publications with the term computational linguistics in the title appeared more and more frequently. By then there had already been costly attempts at exegetical applications (concordances, word and form statistics), as well as larger projects on machine language analysis and translation. The first computational linguistics courses in Germany were set up at Saarland University and in Stuttgart. Computational linguistics gained new areas of application with the spread of personal computers (PCs) and the advent of the Internet. In contrast to Internet linguistics, which particularly examines human language behavior and the language formations it induces on and by means of the Internet, computational linguistics has developed a more computer-science-oriented, practical focus. However, the subject did not completely give up the classic philosophical-linguistic questions and is now divided into theoretical and practical computational linguistics.

Present task in computational linguistics

“Computational linguistics researches the machine processing of natural languages. It develops the theoretical foundations of the representation, recognition and generation of spoken and written language by machines.”

- University of Munich

The Saarbrücken pipeline model

Computers see language either in the form of sound information (if the language is spoken) or in the form of strings of letters (if the language is written). To analyze the language, one works step by step from this initial representation toward meaning, passing through various linguistic levels of representation. In practical systems, these steps are typically carried out sequentially, which is why one speaks of the pipeline model, with the following steps:

Speech recognition
If the language is available as sound information, it must first be converted into text form.
Tokenization
The string of letters is segmented into words, sentences, etc.
Morphological analysis
Person forms or case markers are analyzed to extract the grammatical information and to trace the words in the text back to their base forms, for example as they are listed in the lexicon.
Syntactic analysis
The words of each sentence are analyzed for their structural function in the sentence (e.g. subject, object, modifier, article, etc.).
Semantic analysis
The sentences or their parts are assigned meaning. This step potentially includes a multitude of different individual steps, as meaning is difficult to grasp.
Dialogue and discourse analysis
The relationships between successive sentences are recognized. In dialogue this could be, for example, a question and its answer; in discourse, a statement and its justification or its qualification.
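The sequential character of the pipeline can be illustrated with a minimal sketch covering only the tokenization and morphological analysis stages. The lexicon entries, function names and example sentence are assumptions made for illustration; real systems use far larger lexicons and statistical or rule-based analyzers.

```python
import re

# Hypothetical toy lexicon: inflected form -> (base form, grammatical features).
LEXICON = {
    "saw": ("see", {"tense": "past"}),
    "dogs": ("dog", {"number": "plural"}),
    "cats": ("cat", {"number": "plural"}),
}


def tokenize(text):
    """Segment a string into sentences, and each sentence into tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences]


def analyze_morphology(tokens):
    """Attach a base form and grammatical features to each token."""
    analyzed = []
    for token in tokens:
        # Unknown words fall back to their lowercased form with no features.
        lemma, features = LEXICON.get(token.lower(), (token.lower(), {}))
        analyzed.append({"form": token, "lemma": lemma, "features": features})
    return analyzed


def run_pipeline(text):
    """Run the stages in sequence; each stage consumes the previous output."""
    return [analyze_morphology(sentence) for sentence in tokenize(text)]


result = run_pipeline("The dogs saw cats. They ran!")
lemmas = [[word["lemma"] for word in sentence] for sentence in result]
print(lemmas)
```

Later stages (syntactic, semantic and discourse analysis) would be further functions chained in the same way, each taking the output of the stage before it.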

However, not all computational linguistics methods run through this entire chain. The increasing use of machine learning methods has led to the insight that statistical regularities exist at each level of analysis and can be used to model linguistic phenomena. For example, many current machine translation models use syntax only to a limited extent and semantics hardly at all; instead, they limit themselves to exploiting patterns of correspondence at the word level.
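What "patterns of correspondence at the word level" can mean is sketched below with a simple co-occurrence measure (the Dice coefficient) over a tiny parallel corpus. This is a deliberately simplified stand-in, not the model used by any actual translation system; the corpus and the function names are invented for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus (German, English); not real data.
PARALLEL = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
    ("ein haus", "a house"),
]


def dice_table(pairs):
    """Score each (source word, target word) pair by the Dice coefficient."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word of the pair.
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_translation(word, table):
    """Return the target word most strongly associated with `word`."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get)


table = dice_table(PARALLEL)
print(best_translation("haus", table))  # prints house
```

No syntactic or semantic analysis is involved: the association between "haus" and "house" emerges purely from how often the two words appear in corresponding sentences.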

At the other end of the scale are procedures that work on the principle of semantics first, syntax second. Cognitively oriented language processing in the MultiNet paradigm relies on a semantics-based computational lexicon built around an essentially language-independent semantic core with language-specific morphosyntactic additions. During parsing, a word-class-controlled analysis uses this lexicon to generate semantic structures directly.

Examples of language processing problems

  • Resolving syntactic ambiguities. In some cases, a sentence can be analyzed and interpreted in several ways. Choosing the right analysis sometimes requires semantic information about the speech act and the intention of the speaker, or at least statistical prior knowledge about the co-occurrence of words. Example: "Peter saw Maria with binoculars" - here it is not necessarily clear whether Peter saw Maria, who was holding binoculars in her hand, or whether Peter saw Maria with the help of binoculars.
  • Determining the semantics. The same word form can have a different meaning depending on the context (compare homonymy, polysemy). One has to choose the meaning that applies in the context. In addition, formalisms are needed to represent word meanings.
  • Recognizing the intention of a linguistic utterance (see pragmatics). Some sentences are not meant literally. For example, the question “Can you tell me what time it is?” is not expected to be answered with “Yes” or “No”; the speaker is asking for the time.
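The syntactic ambiguity in the binoculars example can be made concrete with a small chart parser that counts the analyses licensed by a grammar. The following is a minimal sketch using the CKY algorithm; the toy grammar in Chomsky normal form is an assumption made for illustration, not a grammar taken from the literature.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP modifies the verb phrase (seeing with binoculars)
    ("NP", "NP", "PP"),   # PP modifies the noun phrase (Maria with binoculars)
    ("PP", "P", "NP"),
]
LEXICAL_RULES = {
    "Peter": ["NP"], "Maria": ["NP"], "binoculars": ["NP"],
    "saw": ["V"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees for `words` with the CKY algorithm."""
    n = len(words)
    # chart[(i, j)][label] = number of trees for words[i:j] rooted in label
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICAL_RULES.get(word, []):
            chart[(i, i + 1)][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += (
                        chart[(i, k)][left] * chart[(k, j)][right]
                    )
    return chart[(0, n)][start]


print(count_parses("Peter saw Maria with binoculars".split()))  # prints 2
```

The two analyses correspond to the two readings: the prepositional phrase attaches either to the verb phrase or to the noun phrase "Maria". The grammar alone cannot choose between them, which is why semantic or statistical knowledge is needed.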

Applications in practice

Practical computational linguistics is a term that has established itself in the courses offered by some universities. Such programs are closely aligned with specific job profiles in the maintenance and development of language processing systems and their software. These include, for example:

Institutions

Courses

Computational linguistics is offered as an independent degree program at several universities in German-speaking countries. In German university policy it is classified as a small subject (Kleines Fach). Bachelor's and Master's degrees are possible. The best-known programs include those at Bielefeld University, the Ruprecht-Karls-University of Heidelberg, the Ludwig-Maximilians-University of Munich, the University of Potsdam, Saarland University and the University of Trier.

Meetings

  • Annual conference of the "Association for Computational Linguistics (ACL)"
  • "COLING": international conference that has taken place every two years since 1965
  • "Recent Advances in Natural Language Processing (RANLP)": emerged from a summer school and has been held every two years since 2001
  • The "International Joint Conference on Natural Language Processing (IJCNLP)" has been taking place at irregular intervals in Asia since 2004
  • Annual "Student Conference Linguistics (StuTS)": a three- to four-day conference by students for students
  • "Conference of Computational Linguistics Students (TaCoS)" of German-speaking universities, which has been held annually at a different university since 1992
  • Every two years: the conference of the "Society for Linguistic Data Processing (GLDV)", since 2008 the "Society for Language Technology and Computational Linguistics (GSCL)"
  • "KONVENS - Conference on the Processing of Natural Language": a conference that has taken place every two years since 1992, organized alternately by the societies ÖGAI, DGfS-CL and GSCL

Literature

  • James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company, Redwood City, CA 1995, ISBN 0-8053-0334-0.
  • Kai-Uwe Carstensen, Christian Ebert, Cornelia Ebert, Susanne Jekat, Ralf Klabunde, Hagen Langer (eds.): Computational Linguistics and Language Technology. 3rd edition. Spektrum Akademischer Verlag, Heidelberg 2010, ISBN 978-3-8274-2023-7.
  • Roland Hausser: Foundations of Computational Linguistics: Human-Computer Communication in Natural Language. 3rd edition. Springer, 2014, ISBN 978-3-642-41430-5.
  • Nitin Indurkhya, Fred J. Damerau: Handbook of Natural Language Processing. 2nd edition. Chapman and Hall/CRC, 2010, ISBN 978-1-4200-8592-1.
  • Daniel Jurafsky, James H. Martin: Speech and Language Processing - An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. 2nd edition. Prentice Hall, Upper Saddle River, New Jersey 2008, ISBN 978-0-13-187321-6.
  • Henning Lobin: Computational Linguistics and Text Technology. Fink, Paderborn/Munich 2010, ISBN 978-3-8252-3282-5.
  • Christopher D. Manning, Hinrich Schütze: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA 1999, ISBN 0-262-13360-1.
  • Ruslan Mitkov (ed.): The Oxford Handbook of Computational Linguistics. Oxford University Press, 2003, ISBN 0-19-823882-7.

References

  1. I. Bátori, J. Krause, HD Lutz (eds.): Linguistic data processing. Attempt to determine one's position in the field of information linguistics and artificial intelligence. Niemeyer Verlag, Tübingen 1982.
  2. David Crystal commented on this several times in the media and in essays in the mid-1960s. In England, the Alan Turing tradition had also been influential since the 1930s.
  3. CIS Computational Linguistics. (PDF) Center for Information and Language Processing, Ludwig Maximilians University Munich, accessed on November 10, 2015.
  4. Hans Uszkoreit: VL Introduction to Computational Linguistics, Representations and Processes in Language Processing.
  5. Peter Kolb: What is statistical machine translation? (Memento of the original from March 4, 2011 in the Internet Archive)
  6. Hermann Helbig: Knowledge Representation and the Semantics of Natural Language. Springer, Berlin 2006, ISBN 978-3-540-24461-5.
  7. Small Subjects: Computational Linguistics on the Small Subjects portal. Retrieved April 23, 2019.
  8. StudiScan: Master's degree in Computational Linguistics - 17 Master's degree programs. Accessed January 31, 2019.
  9. ACL 2018: 56th Annual Meeting of the Association for Computational Linguistics. Retrieved January 30, 2019.
  10. 27th International Conference on Computational Linguistics (COLING 2018). Retrieved January 30, 2019 (American English).
  11. Department of Linguistic Modeling and Knowledge Processing: Events. Accessed January 30, 2019.
  12. IJCNLP: Introduction (Memento of 15 July 2013, Internet Archive)
  13. Conference on Empirical Methods in Natural Language Processing & International Joint Conference on Natural Language Processing 2019. In: emnlp-ijcnlp2019.org. Retrieved February 19, 2019.
  14. Austrian Society for Artificial Intelligence (ÖGAI). Accessed January 30, 2019.