Computational linguistics
In computational linguistics (CL), or linguistic data processing (LDV), researchers investigate how natural language in the form of text or speech data can be processed algorithmically by computer. It is the interface between linguistics and computer science. In English-language literature and in computer science, the term natural language processing (NLP) is also used.
History
Computational linguistics as a term (or a paraphrase of it) can be traced back to the 1960s. With the beginnings of artificial intelligence, the task was already apparent. Chomsky's Syntactic Structures (1957) placed language in a suitable new formal framework. In addition, there was the logical analysis of language by Saul Kripke and Richard Montague. The research, some of it very heavily funded from the US defense budget, did not produce the hoped-for breakthroughs. Leading figures such as Chomsky and Weizenbaum in particular dampened expectations of automated translation. The shift from behaviorist to mentalist (Chomskyan) conceptions of science was followed by comprehensive approaches in the cognitive sciences.
In the 1970s, publications with the term computational linguistics in the title appeared more and more frequently. There were already costly attempts at exegetical applications (concordances, word and word-form statistics), as well as larger projects on machine language analysis and translation. The first computational linguistics programs in Germany were set up at Saarland University and in Stuttgart. Computational linguistics gained new areas of application with the spread of personal computers (PCs) and the advent of the Internet. In contrast to Internet linguistics, which examines human linguistic behavior on and through the Internet and the language forms it gives rise to, computational linguistics has developed a more practical, computer-science orientation. However, the subject has not entirely abandoned the classic philosophical and linguistic questions, and it is now divided into theoretical and practical computational linguistics.
Current tasks of computational linguistics
“Computational linguistics investigates the machine processing of natural languages. It develops the theoretical foundations for the representation, recognition and generation of spoken and written language by machines.”
The Saarbrücken pipeline model
Computers receive language either as acoustic information (in the case of spoken language) or as strings of characters (in the case of written language). To analyze language, one works step by step from this initial representation toward meaning, passing through several linguistic levels of representation. In practical systems these steps are typically carried out in sequence, which is why one speaks of a pipeline model. It comprises the following steps:
- Speech recognition: If the text is available as acoustic information, it must first be converted into written form.
- Tokenization: The character string is segmented into words, sentences, etc.
- Morphological analysis: Personal endings and case markers are analyzed in order to extract grammatical information and to trace the words in the text back to their base forms, for example as listed in a lexicon.
- Syntactic analysis: The words of each sentence are analyzed for their structural function in the sentence (e.g. subject, object, modifier, article).
- Semantic analysis: Meaning is assigned to the sentences or their parts. This step can involve many different individual steps, since meaning is difficult to capture.
- Dialogue and discourse analysis: The relationships between successive sentences are identified. In a dialogue this could be, for example, a question and its answer; in a discourse, a statement and its justification or qualification.
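The first stages of this pipeline can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a real system: the functions `tokenize`, `analyze_morphology` and `run_pipeline` and the five-word `LEXICON` are hypothetical, and the morphological step is a plain lexicon lookup rather than genuine inflectional analysis. Each stage consumes the output of the previous one, which is what makes it a pipeline:

```python
import re
from typing import Callable

# Hypothetical toy lexicon: inflected form -> (base form, grammatical information).
LEXICON = {
    "saw": ("see", "VERB past"),
    "sees": ("see", "VERB 3sg present"),
    "dogs": ("dog", "NOUN plural"),
    "dog": ("dog", "NOUN singular"),
    "the": ("the", "DET"),
}

def tokenize(text: str) -> list[str]:
    """Segment a character string into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def analyze_morphology(tokens: list[str]) -> list[tuple[str, str, str]]:
    """Map each token to (token, base form, grammatical info); unknown words get 'UNK'."""
    result = []
    for tok in tokens:
        base, info = LEXICON.get(tok.lower(), (tok.lower(), "UNK"))
        result.append((tok, base, info))
    return result

def run_pipeline(data, stages: list[Callable]):
    """Apply each stage to the output of the previous one (the pipeline model)."""
    for stage in stages:
        data = stage(data)
    return data

analysis = run_pipeline("The dogs saw Peter.", [tokenize, analyze_morphology])
for row in analysis:
    print(row)
```

Later stages (syntactic, semantic, discourse analysis) would be appended to the `stages` list in the same way. The price of this design is that an error made in an early stage propagates to every later one.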
However, not all computational linguistic methods pass through this entire chain. The increasing use of machine learning has led to the insight that statistical regularities exist at each level of analysis and can be used to model linguistic phenomena. For example, many current machine translation models use syntax only to a limited extent and semantics hardly at all; instead, they restrict themselves to exploiting patterns of correspondence at the word level.
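How word-level correspondences can be read off statistically is illustrated by the following sketch. It scores German-English word pairs by the Dice coefficient of their co-occurrence in aligned sentence pairs. The four sentence pairs are invented for illustration, and Dice is a simple association heuristic chosen here for brevity, not the method of any particular translation system (statistical alignment models such as the IBM models instead estimate translation probabilities with expectation maximization):

```python
from collections import Counter
from itertools import product

# Tiny invented German-English sentence pairs, for illustration only.
PAIRS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
    ("ein haus", "a house"),
]

def dice_scores(pairs):
    """Dice coefficient 2*c(s,t) / (c(s) + c(t)) over sentence-level co-occurrence."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))  # every source word co-occurs with every target word
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }

def best_translation(word, scores):
    """Return the target word with the highest association score for a source word."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)

scores = dice_scores(PAIRS)
for w in ["das", "ein", "haus", "buch"]:
    print(w, "->", best_translation(w, scores))
```

No syntax or semantics is involved: "haus" is paired with "house" purely because the two words appear together more consistently than with anything else.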
At the other end of the spectrum are approaches that follow the principle "semantics first, syntax second". Cognitively oriented language processing based on the MultiNet paradigm, for instance, relies on a semantically based computational lexicon built on an essentially language-independent semantic core with language-specific morphosyntactic extensions. During parsing, this lexicon is used by a word-class-controlled analysis to generate semantic structures directly.
Examples of language processing problems
- Resolving syntactic ambiguities. Some sentences can be analyzed and interpreted in several ways. Choosing the correct analysis sometimes requires semantic information about the speech act and the speaker's intention, or at least statistical prior knowledge about which words commonly occur together. Example: in "Peter saw Maria with binoculars" it is not clear whether Peter saw Maria, who was holding binoculars, or whether Peter saw Maria through binoculars.
- Determining meaning. The same word form can have different meanings depending on the context (compare homonym, polyseme). The meaning that applies in the given context has to be selected. In addition, formalisms are needed to represent word meanings.
- Recognizing the intention of an utterance (see pragmatics). Some sentences are not meant literally. For example, the question “Can you tell me what time it is?” does not expect the answer “Yes” or “No”; it asks for the time.
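The ambiguity in the first example can be made concrete with a chart parser. The following is a minimal sketch of the CKY algorithm that counts parse trees instead of building them. The grammar is a hypothetical toy grammar in Chomsky normal form written only for this sentence, not a grammar of English; because it licenses both attachment sites for the prepositional phrase, the sentence receives two parses:

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form, written for this example only.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: Peter uses the binoculars
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: Maria has the binoculars
    ("PP", "P", "NP"),
]
LEXICAL_RULES = {
    "Peter": ["NP"], "Maria": ["NP"], "binoculars": ["NP"],
    "saw": ["V"], "with": ["P"],
}

def count_parses(words, start="S"):
    """CKY chart where chart[i][j][A] = number of parse trees for words[i:j] rooted in A."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICAL_RULES.get(w, []):
            chart[i][i + 1][cat] += 1
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    l, r = chart[i][k][left], chart[k][j][right]
                    if l and r:
                        chart[i][j][parent] += l * r
    return chart[0][n][start]

print(count_parses("Peter saw Maria with binoculars".split()))  # 2
```

The parser alone cannot decide between the two trees; that is where semantic or statistical knowledge comes in. A probabilistic variant of CKY would replace the summed counts with the maximum over products of rule probabilities and keep only the most probable tree.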
Applications in practice
Practical computational linguistics is a term that has become established in the degree programs of some universities. Such programs are oriented toward specific occupational profiles in the development and maintenance of language-processing systems and their software. Typical areas of application include:
- Supporting computer users in word processing, for example by automatically correcting typing and spelling errors, checking grammar, or converting input into ideographic characters in Japanese or Chinese.
- Finding information in large amounts of language data (text mining, information extraction), ranging from automatically searching for relevant text passages (information retrieval, search engines) to directly answering questions (question answering, QA).
- Supporting the translation of texts into another language (computer-aided translation, CAT), or even fully automatic machine translation.
- Processing spoken language (speech recognition and speech synthesis), for example in digital dictation machines or reading devices for the blind.
- Generating natural-language texts such as directions or weather forecasts.
- Processing data available in linguistic form, for example automatically indexing literature, creating indexes and tables of contents, and producing summaries and abstracts.
- Supporting authors in writing texts, for example in finding the right expression or terminology, such as when a controlled vocabulary is used in technical documentation.
- Interacting linguistically with a user in a dialogue system, e.g. for telephone information services, but also for voice control of technical devices or computers.
- Automatically assessing personal strengths from natural conversations such as open interviews, job interviews, talk shows, panel discussions or group discussions.
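The first application, automatic correction of spelling errors, is often built on edit distance. The sketch below computes the Levenshtein distance (the minimum number of insertions, deletions and substitutions) by dynamic programming and suggests words from a small hypothetical word list. It assumes that closeness in edit distance is a good enough ranking; real spell checkers also weigh candidates by word frequency and typical typing errors:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the characters match)
            ))
        prev = curr
    return prev[-1]

# Hypothetical word list; a real spell checker would use a full lexicon.
DICTIONARY = ["language", "linguistics", "computer", "grammar", "spelling"]

def suggest(word, dictionary=DICTIONARY, max_distance=2):
    """Return dictionary words within max_distance edits of word, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]

print(levenshtein("kitten", "sitting"))  # 3
print(suggest("langauge"))               # ['language']
```

In plain Levenshtein distance the swapped letters in "langauge" cost two edits; the Damerau-Levenshtein variant counts a transposition of adjacent characters as a single edit, which suits typing errors better.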
Institutions
Degree programs
Computational linguistics is offered as an independent degree program at several universities in German-speaking countries. In German higher-education policy it is classified as a "small subject". Both bachelor's and master's degrees are available. Among the best-known programs are those at Bielefeld University, Heidelberg University, Ludwig Maximilian University of Munich, the University of Potsdam, Saarland University and Trier University.
Conferences
- Annual conference of the "Association for Computational Linguistics (ACL)"
- "COLING": international conference that has taken place every two years since 1965
- "Recent Advances in Natural Language Processing (RANLP)": emerged from a summer school; held every two years since 2001
- "International Joint Conference on Natural Language Processing (IJCNLP)": held in Asia at irregular intervals since 2004
- Annual "Student Conference on Linguistics (StuTS)": a three- to four-day conference by students for students
- "Conference of Computational Linguistics Students (TaCoS)" of German-speaking universities, held annually at a different university since 1992
- The biennial conference of the "Society for Language Technology and Computational Linguistics (GSCL)", called the "Society for Linguistic Data Processing (GLDV)" until 2008
- "KONVENS (Conference on Natural Language Processing)": held every two years since 1992, organized in rotation by the societies ÖGAI, DGfS-CL and GSCL
Organizations
- Association for Computational Linguistics (ACL)
- AFNLP (Asian Federation of Natural Language Processing Associations)
- German Linguistic Society (DGfS), Computational Linguistics Section
- Society for Language Technology and Computational Linguistics (GSCL), until 2008 "Society for Linguistic Data Processing (GLDV)"
- Austrian Society for Artificial Intelligence (ÖGAI) / Language Processing
See also
- Algebraic Linguistics
- Corpus Linguistics
- Lexical density
- Mathematical linguistics
- Quantitative Linguistics
- Quantitative literary studies
- Language statistics
Literature
- James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company, Redwood City, CA 1995, ISBN 0-8053-0334-0.
- Kai-Uwe Carstensen, Christian Ebert, Cornelia Ebert, Susanne Jekat, Ralf Klabunde, Hagen Langer (eds.): Computational Linguistics and Language Technology. 3rd edition. Spektrum Akademischer Verlag, Heidelberg 2010, ISBN 978-3-8274-2023-7.
- Roland Hausser: Foundations of Computational Linguistics: Human-Computer Communication in Natural Language. 3rd edition. Springer, 2014, ISBN 978-3-642-41430-5.
- Nitin Indurkhya, Fred J. Damerau: Handbook of Natural Language Processing. 2nd edition. Chapman and Hall/CRC, 2010, ISBN 978-1-4200-8592-1.
- Daniel Jurafsky, James H. Martin: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. 2nd edition. Prentice Hall, Upper Saddle River, New Jersey 2008, ISBN 978-0-13-187321-6.
- Henning Lobin: Computational Linguistics and Text Technology. Fink, Paderborn/Munich 2010, ISBN 978-3-8252-3282-5.
- Christopher D. Manning, Hinrich Schütze: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA 1999, ISBN 0-262-13360-1.
- Ruslan Mitkov (ed.): The Oxford Handbook of Computational Linguistics. Oxford University Press, 2003, ISBN 0-19-823882-7.
Web links
- Teaching materials of the Computational Linguistics Section of the DGfS
- Glossary of technical terms from computational linguistics
- Association for Computational Linguistics Wiki
- Uni Stuttgart: Introduction to Computational Linguistics
- Study Bibliography Computational Linguistics and Language Technology
- German-language portal for computational linguistics
- Lenhart Schubert: Entry in Edward N. Zalta (ed.): Stanford Encyclopedia of Philosophy (in English).