Text extraction

from Wikipedia, the free encyclopedia

Text extraction (also called keyphrase extraction) is a method for automatically summarizing a text using techniques from computational linguistics. Parts of a text, for example sentences or entire sections, are scored for their importance or relevance using statistical and/or heuristic methods. These importance scores determine which parts ("keyphrases") are extracted and combined into a shorter text. That text gives an overview of the content of the original and is usually called an extract or abstract.
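The scoring-and-selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of any of the systems cited in this article. It assumes a naive regular-expression sentence splitter, a tiny hand-picked stopword list, and a single statistical feature: the average frequency, across the whole text, of a sentence's content words. The names `split_sentences`, `tokenize` and `extract_summary` and the parameter `k` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use much larger ones).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "with", "as", "by", "this", "that", "be"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lower-case alphabetic words with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text: str, k: int = 2) -> str:
    sentences = split_sentences(text)
    # Statistical signal: how often each content word occurs in the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average rather than sum, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (stable sort keeps text order among ties) ...
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])
    # ... and join the top k in their original order to form the extract.
    return " ".join(sentences[i] for i in chosen)
```

For the input "Cats are small animals. Cats like fish. The weather was cloudy yesterday. Fish and cats are friends." with `k=2`, the function returns "Cats like fish. Fish and cats are friends." Keeping the selected sentences in their original order is a common design choice, but the result is still just a concatenation of sentences lifted out of context.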

According to Karen Spärck Jones (1999), summaries produced this way have the drawback that they are usually not very coherent, which makes them hard to read and in some cases even incomprehensible. On the other hand, this method and its variants are probably easier to model in automatic systems. Examples include the systems by Luhn (1959) and Edmundson (1969) and the approaches by Rath et al. (1961) and Brandow et al. (1995).
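To make the statistical approaches mentioned above more concrete, here is a simplified sketch of the sentence-scoring idea usually attributed to Luhn: significant words that occur close together form a cluster, and a cluster scores (number of significant words)² divided by the number of words it spans. A sentence takes the score of its best cluster. The details here are assumptions, not Luhn's exact procedure: "significant" words are approximated as words longer than three letters that occur at least twice (a stand-in for his frequency cut-offs), and the maximum gap of four non-significant words is an assumed parameter. The function names are hypothetical.

```python
import re
from collections import Counter

MAX_GAP = 4  # assumed limit on non-significant words between significant ones


def significant_words(text: str, min_count: int = 2) -> set[str]:
    # Assumed simplification: frequent words longer than three letters count as significant.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {w for w, c in counts.items() if c >= min_count and len(w) > 3}


def luhn_score(sentence: str, significant: set[str]) -> float:
    words = re.findall(r"[a-z]+", sentence.lower())
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= MAX_GAP:
            # Gap is small enough: the word extends the current cluster.
            count += 1
        else:
            # Close the cluster: significant words squared / words spanned.
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    # Score the last open cluster as well.
    return max(best, count ** 2 / (prev - start + 1))
```

Ranking sentences by `luhn_score` and keeping the top few yields an extract in the same way as plain frequency scoring. The cluster measure simply rewards sentences in which important words appear close together.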

Bibliography

  • Mani, I./Maybury, M. (1999): Advances in Automatic Text Summarization. MIT Press.
  • Brandow, R./Mitze, K./Rau, L. F. (1995): Automatic Condensation of Electronic Publications by Sentence Selection.
  • Rath, G. J./Resnick, A./Savage, T. R. (1961): The Formation of Abstracts by the Selection of Sentences.
  • Spärck Jones, K. (1999): Automatic Summarizing: Factors and Directions. In: Mani/Maybury 1999, pp. 1–14 (introduction).