Text generation

from Wikipedia, the free encyclopedia

Text generation (also natural language generation, NLG) is the automatic production of natural language by a machine. As a part of computational linguistics, text generation is a special form of artificial intelligence.

Generation process

Different descriptive models and technical terms exist for the generation process, depending on the method used and the perspective taken; in principle they do not contradict one another.

According to Ehud Reiter, the standard architecture for generation today consists of a text planner, a sentence planner and a surface realizer. Rhetorical Structure Theory (RST) is used to model the discourse relations between text segments. A text is coherent if it can be represented by a tree of rhetorical relations and elementary text units (RST: Mann, Thompson). Relations that link main and subordinate clauses include CAUSE, RESULT, ELABORATION, CONTRAST, SEQUENCE, LIST, CONCESSION and others.

According to M. Hess, generation requires two components:

  • The strategic component decides what should be said: information selection, content selection, area planning. This component usually uses search and planning strategies from artificial intelligence.
  • The tactical component decides how it should be said: the planning of the linguistic form. A grammar tailored to generation is often used.
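
The division into a strategic and a tactical component can be illustrated with a minimal Python sketch. It is not taken from Hess; the `Fact` class, the `importance` score, the sentence pattern and the sample data are all assumptions made for illustration. Content selection is reduced to ranking facts, and the planning of the linguistic form is reduced to a single sentence pattern.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    importance: int  # hypothetical relevance score used for content selection


def strategic_component(facts, max_facts=2):
    """Decide WHAT to say: keep the most important facts, most important first."""
    ranked = sorted(facts, key=lambda f: f.importance, reverse=True)
    return ranked[:max_facts]


def tactical_component(selected):
    """Decide HOW to say it: map each fact onto one fixed sentence pattern."""
    sentences = [f"The {f.attribute} of {f.subject} is {f.value}." for f in selected]
    return " ".join(sentences)


def generate(facts, max_facts=2):
    """Run both components in sequence."""
    return tactical_component(strategic_component(facts, max_facts))


# Invented sample knowledge base, for illustration only.
facts = [
    Fact("the pump", "pressure", "4 bar", importance=2),
    Fact("the pump", "weight", "12 kg", importance=1),
    Fact("the pump", "status", "running", importance=3),
]
print(generate(facts))
```

A real tactical component would use a generation grammar instead of one fixed pattern; the sketch only shows where the boundary between the two components lies.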

From 1981 to 1999, Ulrich Gaudenz Müller, together with the Germanist and computational linguist Raimund Drewek, developed a text generation system called SARA (sentence random generator).

Text generation from knowledge bases

“The prerequisite for any type of generation is that the information to be generated as text is available as formal information that can be processed by computational linguistics, for example information from databases or knowledge representations.”

Text generation from such knowledge bases comes in variants for different tasks:

  • Interface to expert systems
  • Production of technical documents in several languages from a knowledge base
  • Automatic generation of directions, weather reports and stock market reports
  • Generation component of dialog systems

Application areas

Robot journalism

The term “robot journalism”, coined by the media, refers to algorithms that can generate finished news texts from databases and tabular data. The focus of this process is on saving human journalists' working time and letting them concentrate on other tasks. Because the machine takes over routine work, editorial offices can, on the one hand, produce higher-quality and more thoroughly researched news products with fewer employees. On the other hand, they can publish reports that would otherwise not be written for lack of time or insufficient interest.

The use of software in journalism is still controversial; the main point of discussion is in what ways the human journalist is superior to software. In addition, the extent to which automatically generated texts are subject to copyright law remains unresolved.

The algorithms, which are tailored specifically to the input data, continuously calculate values and write reports on them, either at fixed time intervals (e.g. daily weather reports) or when values change significantly (e.g. earthquake warnings). Particularly common areas of application for “robot journalists” are niches such as lower-league sports reports, weather reports and stock market tickers. Data-driven creation of automated content for local reporting is also already in use.
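
The change-triggered variant described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular product: the function names, the threshold of 2.0 degrees, the sentence template and the readings for the fictional "Exampletown" are all invented.

```python
def should_report(previous, current, threshold):
    """Trigger a report only when the value changed by more than the threshold."""
    return abs(current - previous) > threshold


def weather_report(city, temperature_c, previous_c):
    """Fill a fixed sentence template from one data row."""
    delta = temperature_c - previous_c
    if delta > 0:
        trend = f"{delta:.1f} degrees warmer than"
    elif delta < 0:
        trend = f"{-delta:.1f} degrees colder than"
    else:
        trend = "the same as"
    return f"{city}: {temperature_c:.1f} degrees Celsius today, {trend} yesterday."


# Invented sample readings, one per day, for illustration only.
readings = [("Exampletown", 18.0), ("Exampletown", 18.5), ("Exampletown", 23.0)]

reports = []
# Compare each reading with the one from the day before.
for (city, previous), (_, current) in zip(readings, readings[1:]):
    if should_report(previous, current, threshold=2.0):
        reports.append(weather_report(city, current, previous))

print(reports)
```

An interval-based system would call `weather_report` on a fixed schedule instead of checking `should_report` first.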

Chatbots

In text-based dialog systems such as chatbots, text generation is used to communicate with the user. A well-known historical example is the ELIZA program.

Part of the communication with highly developed intelligent virtual agents is based on this principle; the quality of the dialogue depends, among other things, on how well the agent is linked to knowledge bases. A person's dialogue with various interfaces becomes easier if an agent generates text that answers questions productively:

  • When retrieving information, for example as a presentation agent for a website (also called an "online moderator")
  • In a speech-enabled program for selecting an advisor (often used to pre-sort customers on the telephone)
  • In dialogues with characters in computer games
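
Programs in the tradition of ELIZA, mentioned above, produce replies by matching keywords in the user's input and filling response templates. The following Python sketch shows that idea in its simplest form. The rules, templates and the `reply` function are hypothetical and are not taken from ELIZA's original script.

```python
import re

# Hypothetical rules: (pattern, response template). Not ELIZA's original script.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "What would it mean to you to get {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
DEFAULT_REPLY = "Please tell me more."


def reply(user_input):
    """Return the template of the first matching rule, filled with the captured text."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT_REPLY


print(reply("I am tired."))
print(reply("Hello"))
```

Such a program has no link to a knowledge base, which is the limitation the paragraph above points to when it ties dialogue quality to that link.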

Text generation as a creative process

Text generation can be a component of creative processes in art and literature. For longer works, entirely generated bodies of text, whether generated with meaning or given meaning through post-processing, offer no literary quality. However, some of the artistic processes of digital poetry that were important in the art of the previous century and remain so in contemporary art are related to text generation.

Processes and applications in fine arts and literature

  • Interventions in the generating software or the knowledge base (artistic and literary experiments). Example (according to Reinhard Döhl): Max Bense and his Stuttgart group used a Zuse Z22 in 1959 to "synthesize and output texts with the help of an entered lexicon and a number of syntactic rules".
  • Post-processing or integration of generated text by authors (literature).
  • Dialogue with the audience (for example in art installations). Example: David Link, Poetry Machine

Text generation by phrase threshing machine

Phrase threshing machines, also known as bullshit generators or buzzword generators, existed as mechanical devices before they were implemented in software. Probably the first phrase threshing machine designed as software was LoveLetters_1.0, programmed in 1952 by Christopher Strachey at the University of Manchester for the Ferranti Mark I. Similar generators can be found on the World Wide Web in many more developed versions.

Such programs work according to simple concepts that are also used in more complex text generation processes: terms or sentence fragments are taken from lists, strung together and adjusted to be grammatically correct (grammatical realization). A method often used for this is generation with Markov chains. The result is a syntactically correct text that can appear meaningful but is actually bullshit, because phrase threshing machines have no access to knowledge about the meaning of the elements they use. In this way, for example, the empty rhetoric of specialist literature can be satirized.
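
Generation with Markov chains can be sketched as follows in Python. This is a minimal word-level sketch, not the code of any particular generator; the function names, the `order` and `seed` parameters and the buzzword corpus are assumptions made for illustration. The chain records which word follows which state in the training text and then walks through those transitions at random.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each sequence of `order` words to the words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=12, seed=None):
    """Walk the chain from a random start state until `length` words are produced."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state never had a successor
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Invented buzzword corpus, for illustration only.
corpus = "we leverage synergies and we leverage paradigms and we empower synergies"
chain = build_chain(corpus)
print(generate(chain, length=8, seed=1))
```

Every word pair in the output also occurs in the corpus, which is why the result looks locally plausible although the program has no knowledge of meaning.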

History

Leaving aside mechanical phrase threshing machines as precursors and the earliest attempts to generate texts with software, the first phase of natural language generation begins with programs that schematically access knowledge already stored in text form in order to generate text. From 1963, two such systems were in operation: BASEBALL, an interface to the baseball data of the American baseball league, and SAD SAM, an interface for entering family relationships that could already answer questions. After several other works in this direction, ELIZA, programmed by Joseph Weizenbaum, appeared in 1966. In the second phase, knowledge is encoded in facts and rules: LUNAR (1972) is the interface to the database on the lunar sample collection of the Apollo 11 mission. PARRY (1975) simulates a paranoid person talking to a psychiatrist. ROBOT (1977) is the first commercial question-and-answer system. VIE-LANG (1982), by Ernst Buchberger, is a German-language dialogue system that generates sentences from a semantic network. HAM-ANS (1983), by Wolfgang Hoeppner, is a German-language dialogue system that simulates, for example, a hotel manager.

Literature

  • Ehud Reiter, Robert Dale: Building natural language generation systems. Cambridge University Press, Cambridge 2000, ISBN 0-521-62036-8.
  • Helmut Horacek: Text generation. In: Kai-Uwe Carstensen, Ralf Klabunde et al. (Ed.): Computational Linguistics and Language Technology. Heidelberg: Spektrum Akademischer Verlag, 3rd edition, 2010, ISBN 978-3827420237, pp. 436–465.
  • John Bateman: Applied natural language generation and information systems. In: Ralf Klabunde et al. (Ed.): Computational Linguistics and Language Technology. Heidelberg 2010, pp. 633–641.
  • Rico Schwank: Analysis of concepts and methods for generating natural language texts from formal data. Diploma thesis. Otto von Guericke University Magdeburg, Faculty of Computer Science.
  • Patrick Reichelt: Introduction to Robot Journalism: Threat or Opportunity? Tectum Wissenschaftsverlag, Baden-Baden 2017, ISBN 978-3828840591.
  • Stefan Weber: Robot journalism, chatbots & Co.: How algorithms produce content and influence our thinking. Heise Medien, Hannover 2018, ISBN 978-3957881045.

References

  1. Ehud Reiter: Has a consensus NL generation architecture appeared, and is it psychologically plausible? In: Proceedings of the 7th International Workshop on Natural Language Generation (INLGW '94). (PDF) McDonald, D. and Meteer, M., 1994, pp. 163-170, accessed on March 26, 2010 (English).
  2. KIT-MARKER project. Technische Universität Berlin, 1999, p. 1,3, archived from the original; retrieved March 13, 2010.
  3. Michael Hess: Introduction to Computational Linguistics (I). (PDF) (No longer available online.) University of Zurich, Institute for Computational Linguistics, 2005, pp. 44.4 f, archived from the original on March 31, 2007; retrieved March 26, 2010.
  4. Wiebke Ramm and Claudia Villiger: Scientific text production and specialist domain. Linguistic realization of scientific content in various specialist disciplines and their computational linguistic modeling. In: Knorr, Dagmar / Jakobs, Eva-Maria (eds.): Text production in electronic environments. Text production and media Vol. 2. Lang Verlag, Frankfurt / Main 1997, ISBN 3-631-30970-8, p. 214.2 (rwth-aachen.de [PDF; accessed on March 15, 2010]).
  5. Susanne Göpferich, Dr. phil., Dipl.-Übers.: The technical editor as a global player: Professional practice and requirements for future training. Trade journal Technische Documentation 2000/05, December 19, 2003, p. 1,7, accessed on March 14, 2010: “A multilingual generation system that is equipped with the appropriate text-type-specific writing rules can generate these different types of text for the same product from a single knowledge base.”
  6. Stats Monkey. (No longer available online.) Intelligent Information Laboratory - Northwestern University, 2009, archived from the original on November 16, 2010; accessed on March 24, 2010 (English).
  7. http://www.text-gold.de/fundstuecke/roboterjournalismus-haben-und-schreiben/ (accessed on October 29, 2014).
  8. Works created autonomously by computers: worthy of copyright protection? Retrieved November 8, 2018.
  9. Julian Maitra: Media: The robot journalists are already among us. In: welt.de. May 15, 2014, accessed October 7, 2018.
  10. Andreas Graefe: Guide to Automated Journalism. Columbia Journalism Review, New York City 2016 (accessed February 14, 2018).
  11. Robot journalists save the local press. Who will save us from this? Accessed November 20, 2018 (German).
  12. Josef Karner: Mailüfterl, Al Chorezmi and Artificial Intelligence: A conversation with the computer pioneer Heinz Zemanek. Telepolis, August 8, 1999, p. 1, archived from the original on January 22, 2005; retrieved on March 20, 2010 (question 20 ff): “Weizenbaum did not create intelligence or even consciousness, but showed the simple means with which one can make a viewer believe that he is dealing with intelligence.”
  13. Roberto Simanowski: Automatic writing. XCULT, accessed on March 15, 2010 (presentation at the symposium Narrations in Media Art).
  14. Reinhard Doehl: The circle around Max Bense. Retrieved March 16, 2010 (Artificial Poetry Section 5).
  15. Reinhard Doehl: The circle around Max Bense. Retrieved March 16, 2010 (Artificial Poetry Section 6).
  16. Miriam Stürner: David Link, Poetry Machine (version 1.0), 2001-2002. (No longer available online.) ZKM, Center for Art and Media Karlsruhe, archived from the original on November 20, 2010; retrieved March 15, 2010.
  17. David Link: LoveLetters_1.0. MUC = Resurrection. A memorial. (No longer available online.) Archived from the original on March 28, 2010; retrieved March 15, 2010.
  18. Andreas Stuhlmüller: texts with Markov. (No longer available online.) February 14, 2005, archived from the original on June 17, 2010; retrieved March 24, 2010.
  19. VIE-GEN. NLG Systems Wiki, November 17, 2009, accessed March 15, 2010.
  20. Jörg Roth: Introduction to natural language text generation. 1989, accessed March 14, 2010.
  21. Rico Schwank: Analysis of methods for generating natural language texts from formal data. Otto von Guericke University Magdeburg, accessed on March 13, 2010.