Elasticsearch

Basic data

Developer: Elastic
Initial release: 2010
Current version: 7.8.1 (July 27, 2020)
Operating system: platform-independent
Programming language: Java
Category: Search server
License: Various ("open core" model), including Apache License (open source) and Elastic License (commercial, "source-available")
Website: www.elastic.co/elasticsearch
Figure: Elasticsearch configuration of the Wikipedia servers

Elasticsearch is a search engine based on Lucene. The program, written in Java, stores documents in a NoSQL format (JSON). Clients communicate with it through a RESTful web interface. Alongside Solr, Elasticsearch is the most widely used search server. It can be run on a cluster of machines with little effort to provide high availability and load balancing. Elastic NV distributes it under an “open core” model: the core of the software is released under open source licenses (mainly the Apache License 2.0), while other parts are subject to the commercial Elastic License.

Physical structure

Elasticsearch splits each index into several pieces called shards. If necessary, the user can distribute the shards of an index across several servers (nodes), which together form a cluster, in order to spread the processing load or to compensate for server failures. If the search engine runs on several nodes, one of them is designated the master node of the cluster.

A shard is a Lucene index. Elasticsearch uses Apache Lucene as its core library for indexing and searching. An Elasticsearch index therefore consists of several Lucene indexes. A Lucene index consists of a directory of files that contain an inverted index.
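The idea of an inverted index can be sketched in a few lines of Python. This is only an illustration of the data structure, not Lucene's actual file format; the function name `build_inverted_index`, the naive whitespace tokenization and the second sample document are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis for illustration only: lowercase, split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "Granatenstarke Suchmaschinentechnologie",
    2: "Suchmaschinentechnologie für Einsteiger",  # hypothetical second document
}
index = build_inverted_index(docs)
print(index["suchmaschinentechnologie"])  # IDs of all documents containing the term
```

A lookup goes from a term to the documents containing it, which is why a search does not have to scan every stored document.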

Functionality

Documents, types and indices

The smallest unit that Elasticsearch works with is the document. Every document that is to be searchable must first be indexed. If, for example, information about books is to be made searchable, the information about each individual book could be written to a document, which is then indexed. An index corresponds roughly to an SQL table, and a document corresponds to a row in this table. However, the number and types of the fields are not rigidly fixed in advance; if required, they can be typed explicitly by means of a mapping.

To be indexed, documents must be sent to Elasticsearch in JSON format. As JSON documents, they each consist of a set of key-value pairs. An example of a document describing a book:

{
  "titel": "Granatenstarke Suchmaschinentechnologie",
  "autor": "Michael Käfer",
  "erscheinungsjahr": "1794",
  "verlag": "Müller-Verlag"
}
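As a rough sketch of the explicit typing mentioned above, a mapping for these book fields might look as follows. The chosen field types are assumptions for illustration, and the body uses the typeless mapping layout of Elasticsearch 7.x, which would be sent when the index is created; older versions nest the properties under a type name.

```python
import json

# Hypothetical explicit mapping for the book fields used in this article.
# The field types ("text", "keyword", "integer") are assumptions.
mapping_body = {
    "mappings": {
        "properties": {
            "titel": {"type": "text"},                # analyzed full text
            "autor": {"type": "text"},
            "erscheinungsjahr": {"type": "integer"},
            "verlag": {"type": "keyword"},            # stored as one exact value
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```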

Search queries

Search queries are sent to Elasticsearch as JSON documents. The example document above would be found by the following search query, for example:

{
   "query": {
      "match": {
         "titel": "suchmaschinentechnologie"
      }
   }
}

The most important part of a search query is the query parameter, whose content determines which documents are found and in which order. Other parameters are size (sets the maximum number of hits), from (an offset used to split long hit lists across several pages), _source (returns only certain fields of the matching documents instead of the entire documents) and sort (replaces the default ordering with a user-defined one).
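How these parameters combine into one request body can be sketched as follows. The helper `search_body` and its page-based arguments are hypothetical conveniences, not part of Elasticsearch; sorting by "erscheinungsjahr" additionally assumes that the field is mapped with a sortable type.

```python
import json


def search_body(query, page=1, per_page=10, fields=None, sort=None):
    """Assemble a search request body from the parameters described above."""
    body = {
        "query": query,
        "size": per_page,               # maximum number of hits returned
        "from": (page - 1) * per_page,  # offset of the first hit on this page
    }
    if fields is not None:
        body["_source"] = fields        # return only these fields
    if sort is not None:
        body["sort"] = sort             # replace the default ordering
    return body


body = search_body(
    {"match": {"titel": "suchmaschinentechnologie"}},
    page=3,
    per_page=20,
    fields=["titel", "autor"],
    sort=[{"erscheinungsjahr": "asc"}],
)
print(json.dumps(body, indent=2))
```

With 20 hits per page, the third page starts at offset 40, so "from" is 40 and "size" is 20.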

Different kinds of queries can be placed inside the query parameter. Some examples:

"match_all": { }                      // matches all documents
"match": {                            // matches all documents with
   "titel": "granatenstarke"          // "granatenstarke" in the "titel" field
}                                     //
"multi_match": {                      // matches all documents with
   "query": "granatenstarke",         // "granatenstarke" in the "titel" field
   "fields": ["titel","autor"]        // or in the "autor" field
}                                     //
"bool": {                             //
   "must": {                          // "quick" must occur in the
      "match": {"titel": "quick"}     // "titel" field
   },                                 //
   "must_not": {                      // "lazy" must not occur in the
      "match": {"titel": "lazy"}      // "titel" field
   },                                 //
   "should": [                        // "brown" or "green" should occur in the
      {"match": {"titel": "brown"}},  // "titel" field (a list, because JSON
      {"match": {"titel": "green"}}   // keys may not repeat in one object)
   ]                                  //
}                                     //

Communication with the REST API

Elasticsearch's REST API is used both for indexing JSON documents and for search queries. There are several ways to talk to it. The most widespread and best-documented is to send documents and search queries to the REST API with the cURL program. User-friendly programs such as Postman can be used in the same way, as can self-written scripts in any common programming language. An example in which the above JSON document is sent with cURL from a terminal to the server running Elasticsearch:

curl -X PUT '78.47.143.252:9200/materialienzusuchmaschinen/buecher/1?pretty' -H 'Content-Type: application/json' -d '{
   "titel": "Granatenstarke Suchmaschinentechnologie",
   "autor": "Michael Käfer",
   "erscheinungsjahr": "1794",
   "verlag": "Müller-Verlag"
}'

In this example, the index into which the document is loaded is named "materialienzusuchmaschinen" and the type is named "buecher". If an index and a type with these names do not already exist, they are created automatically. The document is stored under the ID "1", which is also specified in the URL.
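The same request can also be issued from a self-written script. The following is a minimal sketch using only Python's standard library; the helper `build_index_request` and the host "localhost:9200" are assumptions for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_index_request(host, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"http://{host}/{index}/{doc_type}/{doc_id}?pretty"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


book = {
    "titel": "Granatenstarke Suchmaschinentechnologie",
    "autor": "Michael Käfer",
    "erscheinungsjahr": "1794",
    "verlag": "Müller-Verlag",
}
# "localhost:9200" is an assumed address of a locally running server.
request = build_index_request(
    "localhost:9200", "materialienzusuchmaschinen", "buecher", 1, book
)
print(request.get_method(), request.full_url)

# To actually send it to a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```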

Indexing

When a document is sent for indexing, Elasticsearch runs an analysis process while preparing the document for the index. The text of the document is converted so that the results can later be written to the index. First, the text is split into individual words at defined places such as spaces or commas (for example, "Granatenstarke Suchmaschinentechnologie" into "Granatenstarke" and "Suchmaschinentechnologie"). The letters of each of these words are then converted to lowercase (for example, "Granatenstarke" into "granatenstarke"). Further steps follow, and it is also possible to add custom conversion stages.

Elasticsearch stores the results of the analysis process (such as "granatenstarke") in the index. In addition, it stores the original documents as they were sent in a separate location.
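The first two analysis steps described above, splitting and lowercasing, can be approximated as follows. This is a simplified sketch, not Elasticsearch's actual analyzer; the function name `analyze` and the regular expression are assumptions for the example.

```python
import re


def analyze(text):
    """Extract runs of word characters, then lowercase each token."""
    tokens = re.findall(r"\w+", text)  # split at spaces, commas, hyphens, ...
    return [token.lower() for token in tokens]


print(analyze("Granatenstarke Suchmaschinentechnologie"))
# ['granatenstarke', 'suchmaschinentechnologie']
```

Because the indexed terms are lowercase, the lowercase search term "suchmaschinentechnologie" from the earlier query example matches the title of the sample document.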

Literature

  • Radu Gheorghe, Matthew Lee Hinman, Roy Russo: Elasticsearch in Action, Version 17. Manning, 2015, ISBN 978-1-61729-162-3
  • Clinton Gormley, Zachary Tong: Elasticsearch: The Definitive Guide. 1st edition. O'Reilly, 2015, ISBN 978-1-4493-5854-9

References

  1. Release 7.8.1. July 27, 2020 (accessed July 28, 2020).
  2. DB-Engines Ranking of Search Engines, as of April 2016.
  3. Oliver B. Fischer: Full text search with ElasticSearch. In: heise Developer. July 26, 2013, accessed June 6, 2015.
  4. Open Source, Distributed, RESTful Search Engine. Contribute to elastic/elasticsearch development by creating an account on GitHub. elastic, March 14, 2019, accessed March 14, 2019.