Semantic search

From Wikipedia, the free encyclopedia

Semantic search is a search method that puts the meaning of a search query, on the Internet or in a digital text archive, at the center.

A semantic search engine uses background knowledge to take the meaning of texts and search queries into account. Unlike keyword-based search engines, it does not merely look for the query words in the text. As a result, a search query can be interpreted more precisely and linked to the relevant texts, which yields more accurate search results. To a certain extent, semantic search mimics the human brain by using knowledge and associations to search.

Background knowledge

The background knowledge used in semantic search, in the form of thesauri, semantic networks and ontologies, models the knowledge of a specific domain. Depending on the area of application, concepts and the relevant relationships between them are recorded. Modeling concepts and their relationships makes it possible to specialize a search query (narrow down the search results) or to generalize it (broaden them). Relationships can be simple, in the form "A-is-a-B", but they can also express more complex connections such as "A-knows-B" or "A-activates-B". In computer science, and especially in bioinformatics, the data formats OWL and RDF or RDFS have become established for storing background knowledge in ontologies.
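The specialization and generalization described above can be sketched over a handful of "A-is-a-B" relationships. This is a minimal illustration only: the dictionary `IS_A`, the functions `generalize` and `specialize`, and the tiny concept chain are assumptions made for this example, not the data model of any particular ontology format or search engine.

```python
# Hypothetical toy ontology: each concept maps to its direct "is-a" parent.
IS_A = {
    "angina pectoris": "coronary disease",
    "coronary disease": "heart disease",
    "heart disease": "disease",
}


def generalize(concept):
    """Return all broader concepts of `concept`, nearest first."""
    broader = []
    while concept in IS_A:
        concept = IS_A[concept]
        broader.append(concept)
    return broader


def specialize(concept):
    """Return all narrower concepts of `concept` (transitively), sorted."""
    narrower = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in IS_A.items():
            if parent == current and child not in narrower:
                narrower.add(child)
                frontier.append(child)
    return sorted(narrower)


if __name__ == "__main__":
    print(generalize("angina pectoris"))
    print(specialize("heart disease"))
```

A real ontology stored in OWL or RDF would typically allow several parents per concept and many relationship types; the single-parent dictionary is chosen here only to keep the sketch short.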

To make creating ontologies as efficient as possible, Stanford University developed the Protégé tool, and the University of California, Berkeley, developed OBO-Edit. A number of other such software systems exist besides these two.

A current challenge is the automated creation of ontologies. Approaches range from manual processing to semi-automatic procedures. In semi-automatic creation, an automated process generates proposals for concepts and the links between them, which a domain expert must then assess and approve.

Annotation between text and background knowledge

Annotation procedures are an important aspect of semantic search. The annotator links text data from documents or databases with relevant entities of the background knowledge, i.e. the ontology. Text mining methods are used for annotation so that content can be read and classified in a semantically correct way. Today's highly trained algorithms achieve an F-measure, a combination of precision and recall, of over 90 percent. The F-measure is a single figure in which precision (how many of the hits are correct) and recall (how many of the correct items are hit) are weighted equally. The technical success of a semantic search engine therefore also depends on the annotator used.
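With precision and recall weighted equally, the F-measure is the harmonic mean of the two. The following sketch shows the arithmetic; the function name `f_measure` and the example counts are assumptions chosen for illustration, not figures from any evaluated annotator.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Equally weighted F-measure: harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    relevant = true_positives + false_negatives
    if predicted == 0 or relevant == 0:
        return 0.0
    precision = true_positives / predicted  # share of annotations that are correct
    recall = true_positives / relevant      # share of correct entities that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical annotator: 90 correct links, 10 wrong links, 10 missed links.
    print(round(f_measure(90, 10, 10), 3))
```

Because the harmonic mean is dominated by the smaller value, an annotator cannot reach a high F-measure by being strong on only one of the two quantities.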

Aspects of semantic search

The quality of semantic search is mainly determined by three factors. First, including synonyms in the search query is important for the completeness of the search results. All known synonyms of a term are stored in the background knowledge. If the user enters one of these terms in a query, all of its synonyms are included in the search as well. A search for "programmer", for example, can thus also find documents in which the qualification is recorded under the synonym "software developer".
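Synonym expansion of this kind can be sketched in a few lines. The synonym table `SYNONYMS`, the functions `expand_query` and `search`, and the plain substring matching are assumptions made for this example; a real engine would draw its synonyms from a thesaurus or ontology and match against an index.

```python
# Hypothetical synonym groups, standing in for the background knowledge.
SYNONYMS = [
    {"programmer", "software developer"},
]


def expand_query(term):
    """Return the term together with all of its known synonyms, sorted."""
    term = term.lower()
    for group in SYNONYMS:
        if term in group:
            return sorted(group)
    return [term]


def search(term, documents):
    """Return documents containing the term or any of its synonyms."""
    terms = expand_query(term)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


if __name__ == "__main__":
    docs = [
        "Qualification: software developer with ten years of experience.",
        "Qualification: landscape gardener.",
    ]
    print(search("programmer", docs))
```

A pure keyword search for "programmer" would miss the first document, since the word itself never occurs in it.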

Second, distinguishing between homonyms (e.g. Jaguar (car brand) versus Jaguar (animal)) increases the quality of the search results. Results that were found but wrongly assigned are removed automatically by means of disambiguation, the resolution of ambiguities. Statistical methods, text mining and natural language processing, among others, are used to recognize the context of a document and thus to infer whether it has been assigned to the correct subject area. If the context of a document in which the query term was found matches, the document is classified as a correct search result. Conversely, documents with the wrong context are excluded from the search results.
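The idea of judging a hit by its document context can be illustrated with a deliberately simple word-overlap heuristic. It stands in for the statistical, text mining and natural language processing methods mentioned above and is not one of them; the context word lists in `SENSE_CONTEXT` and the functions `disambiguate` and `filter_results` are assumptions invented for this sketch.

```python
import re

# Hypothetical context words per sense of the homonym "jaguar".
SENSE_CONTEXT = {
    "jaguar (car brand)": {"car", "engine", "dealer", "drive"},
    "jaguar (animal)": {"cat", "jungle", "prey", "habitat"},
}


def disambiguate(document):
    """Return the sense whose context words overlap most with the document.

    Returns None if no context word occurs at all. On a tie, the sense
    listed first in SENSE_CONTEXT wins.
    """
    words = set(re.findall(r"[a-z]+", document.lower()))
    scores = {sense: len(words & ctx) for sense, ctx in SENSE_CONTEXT.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def filter_results(documents, wanted_sense):
    """Keep only the documents whose inferred sense matches the wanted one."""
    return [doc for doc in documents if disambiguate(doc) == wanted_sense]


if __name__ == "__main__":
    docs = [
        "The jaguar stalks its prey in the jungle.",
        "The new Jaguar has a powerful engine.",
    ]
    print(filter_results(docs, "jaguar (animal)"))
```

Both documents contain the query word, but only the first survives a search for the animal, because the second is excluded on the basis of its context.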

The third and most important aspect of semantic search is the use of existing background knowledge. A search for a term such as "heart disease" also takes into account other terms relevant to the field, for example the coronary disease "angina pectoris", because the background knowledge records which concepts are closely related to heart diseases. In the biomedical domain, for example, the MeSH (Medical Subject Headings) network with around 80,000 concepts allows this approach. A scientific biomedical search engine shows the possibilities of semantic search in this area.

Presenting the search results, which are usually much more extensive than those of a keyword search, in a user-friendly form is a difficult but solvable task.

Literature

  • Thomas Cloer: Google has integrated new technology into its Internet search that is meant to better "understand" what the user is really interested in. In: Computerwoche. March 5, 2009. [1]

References

  1. www.GoPubMed.org (archived from the original on July 18, 2009, in the Internet Archive)