Linguistic search

from Wikipedia, the free encyclopedia

Linguistic search is a method used by search engines in which the search query is processed with linguistic methods. To this end, additional word variants are derived from the original query.

Procedure

The linguistic methods used are lemmatization (the recognition of base forms), the decomposition of compounds, the generation of word variants, and the generation of synonyms and word derivatives.

The approach differs from stemming in that linguistic search works with word forms that actually exist, not with word fragments or stems. This is particularly useful for German because of the strong irregularity of its word formation.
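The contrast with stemming can be illustrated with a minimal sketch. The suffix list and the tiny lexicon below are hypothetical and serve only as an illustration; they are not taken from any of the systems named in this article. The naive stemmer cuts off a suffix and may return a fragment that is not a word, while the lexicon lookup returns only base forms that actually exist.

```python
# Toy contrast between suffix stripping (stemming) and dictionary lookup
# (lemmatization). Both the suffix list and the lexicon are hypothetical.

SUFFIXES = ("es", "ed", "s")  # checked in this order

# Hypothetical full-form lexicon: inflected form -> base form (lemma).
LEMMA_LEXICON = {
    "houses": "house",
    "went": "go",
    "mice": "mouse",
}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    return LEMMA_LEXICON.get(word, word)


for w in ("houses", "went", "mice"):
    print(w, "| stem:", naive_stem(w), "| lemma:", lemmatize(w))
```

In this sketch the stemmer produces the fragment "hous" for "houses" and leaves the irregular forms "went" and "mice" untouched, whereas the lexicon maps all three to existing base forms.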

These linguistic procedures are applied sequentially because they depend on one another. First, each term is lemmatized, which identifies its base form (houses → house, birds → bird, walked → walk). In German, compounds can also be broken down into their components in this phase (Autobahnmaut → Autobahn + Maut, Atomenergiedebatte → Atomenergie + Debatte). The next step generates all word variants from the base form found (house → house, houses; go → go, goes, going, went, gone, etc.).
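The sequence described above (lemmatize, decompose, generate variants) might be sketched as follows. All three dictionaries are hypothetical miniature stand-ins for the large linguistic resources a real system would use, the function names are invented for this illustration, and the German compound "Autobahnmaut" (motorway toll) is an assumed example entry.

```python
# Hypothetical miniature resources; a real system would use large dictionaries.
LEMMAS = {"houses": "house", "birds": "bird", "walked": "walk", "went": "go"}
COMPOUND_PARTS = {"autobahnmaut": ["autobahn", "maut"]}
WORD_FORMS = {
    "house": ["house", "houses"],
    "go": ["go", "goes", "going", "went", "gone"],
}


def lemmatize(term):
    """Step 1: reduce a term to its base form (unknown terms stay as they are)."""
    term = term.lower()
    return LEMMAS.get(term, term)


def decompose(lemma):
    """Step 2: split a known compound into its components."""
    return COMPOUND_PARTS.get(lemma, [lemma])


def generate_variants(lemma):
    """Step 3: list the word forms known for a base form."""
    return WORD_FORMS.get(lemma, [lemma])


def expand_term(term):
    """Run the three steps in order and collect the variants without duplicates."""
    variants = []
    for part in decompose(lemmatize(term)):
        for form in generate_variants(part):
            if form not in variants:
                variants.append(form)
    return variants


print(expand_term("houses"))
print(expand_term("went"))
print(expand_term("Autobahnmaut"))
```

The order matters in this sketch: variant generation is keyed on base forms, so it can only run after lemmatization and decomposition have produced them.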

The additional variants obtained in this way can be used to enrich the original search, which is why the process is also called "expansion".
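One way to picture expansion is as a Boolean query in which each original term is replaced by an OR group of its variants. The sketch below assumes a generic AND/OR query syntax and a hypothetical helper named build_expanded_query; it does not reproduce the query language of any particular search engine.

```python
def build_expanded_query(terms, variants_by_term):
    """Combine original terms with AND and each term's variants with OR."""
    groups = []
    for term in terms:
        # Fall back to the term itself when no variants are known.
        variants = variants_by_term.get(term, [term])
        alternatives = " OR ".join(f'"{v}"' for v in variants)
        groups.append(f"({alternatives})")
    return " AND ".join(groups)


# Hypothetical variant lists, e.g. produced by an earlier expansion step.
variants = {
    "house": ["house", "houses"],
    "go": ["go", "goes", "went"],
}

print(build_expanded_query(["house", "go"], variants))
# prints: ("house" OR "houses") AND ("go" OR "goes" OR "went")
```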

Linguistics provides further methods: recognizing synonyms (house → building, dynasty, etc.) or word derivatives (house → domestic, little house, etc.) yields additional variants for the search.

Search terms in a source language can also be translated into one or more target languages (German Flugzeug → English airliner, airplane, plane, aircraft; French avion) and inserted into the search like synonyms. This enables multilingual search in the sense of cross-lingual information retrieval: a query in one language triggers a search in one or more other languages.
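The translation step can be sketched as a dictionary lookup that returns every known equivalent per target language. The dictionary below is a hypothetical one-entry example that assumes the German source term "Flugzeug", and the function name translate_term is invented for this illustration.

```python
# Hypothetical translation dictionary: source term -> language code -> equivalents.
TRANSLATIONS = {
    "Flugzeug": {
        "en": ["airliner", "airplane", "plane", "aircraft"],
        "fr": ["avion"],
    },
}


def translate_term(term, target_languages):
    """Return all known equivalents of a term for each requested language."""
    entry = TRANSLATIONS.get(term, {})
    # A language without an entry yields an empty list rather than an error.
    return {lang: list(entry.get(lang, [])) for lang in target_languages}


equivalents = translate_term("Flugzeug", ["en", "fr"])
for lang, words in equivalents.items():
    print(lang, words)
```

Returning every equivalent instead of a single best translation matches the idea of treating translations like synonyms: all of them are added to the query.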

Objective

In contrast to machine translation, where a single correct translation of a term has to be found, the aim here is to supply as many translation equivalents as possible; the context in the retrieved results then implicitly differentiates the meanings (disambiguation).

A growing number of linguistic systems have been developed, with widely varying objectives. The fundamental differences concern the size of the dictionaries used (several million entries) and of the other linguistic resources, the grammatical information (morphological, syntactic, semantic) attached to the dictionary entries, and the language pairs covered by the available translation dictionaries.

Systems

While monolingual systems are relatively numerous (examples: DWDS, a comprehensive dictionary system; FAST, a search and indexing system with a linguistic component; AUTINDEX), there are only a few multilingual (cross-lingual) systems.

These include:

  • BASE, which uses the multilingual Eurovoc thesaurus for translation.
  • LEXIQUO and PSYDOK, which use the 'linguistic engine' EXTRAKT with translation dictionaries and Eurovoc data for German, English, and French (as well as Italian and Spanish).
  • Pertimm, a French-American multilingual indexing and retrieval system.
