Apache Lucene

Basic data

Developer: Apache Software Foundation
Current version: 8.5.2 (May 26, 2020)
Operating system: Platform-independent
Programming language: Java
Category: Software library
License: Apache License
Available in German: No
Website: lucene.apache.org

Apache Lucene is a software library for full-text search. Lucene is free software and a project of the Apache Software Foundation.

Lucene is used by Wikipedia (only indirectly since 2014, via Elasticsearch). Twitter in particular provides an example of Lucene's performance and scalability.

History

Lucene was developed by Doug Cutting starting in 1997 and was initially made available through SourceForge. The name Lucene is the middle name of Doug Cutting's wife.

In 2001 Lucene became part of the Jakarta Project, and in 2005 it became a top-level project of the Apache Software Foundation. From time to time, parts of the Apache Lucene project are spun off and continued as separate projects.

Projects based on Lucene

Lucene Core

Lucene Core (Lucene for short, formerly also called Lucene Java) is the core of the Lucene project: a software library written in the Java programming language.
Lucene first builds an index of the files, which takes up about a quarter of the volume of the indexed files. It then returns search results as a ranked list, for which several search algorithms are available.
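
The two steps described here, building an index and then answering a query with a ranked list, can be illustrated with a toy inverted index. This is only a sketch under simplifying assumptions: it is written in Python rather than Java, it splits text on whitespace, and it ranks by raw term counts. It is not Lucene's code or index format, and the function names `tokenize`, `build_index` and `search` are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_index(docs):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Return doc ids ranked by the total occurrences of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        # Only documents listed under the term are touched; no full scan is needed.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by doc id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


# Hypothetical sample documents, used only for illustration.
docs = {
    "a": "lucene is a search library",
    "b": "solr is a search server built on lucene",
    "c": "tika extracts text from documents",
}
index = build_index(docs)
print(search(index, "lucene search"))
```

The point of the structure is that a query only visits the posting lists of its own terms, so documents that contain none of the terms (here "c") never enter the ranking.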

Lucene.Net

Lucene.Net is a port of Lucene to the C# programming language, with the programming interface adapted to the .NET platform.

Lucy

Lucy is a port of Lucene to the C programming language, with bindings for dynamic programming languages such as Perl.

PyLucene

PyLucene is a Python extension that wraps Lucene and embeds a Java runtime environment to run it.

Droids

Droids is a framework for bots and crawlers. The Droids project ended on November 1, 2015.

Solr

Solr is a standalone search server based on Lucene. Solr was originally developed by CNET under the name Solar, an abbreviation for Search on Lucene and Resin. The Solr download includes an example configuration that uses Jetty. Solr provides a REST-like API and communicates using the Hypertext Transfer Protocol (HTTP). With HTTP POST, files in formats ranging from XML and JSON to PDF can be submitted for indexing, and documents can also be created. Queries are made using HTTP GET.
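
The division of labour between HTTP POST for indexing and HTTP GET for queries can be sketched as follows. The example only builds the two requests and does not send them. It assumes a local Solr instance on its default port 8983 and a hypothetical core named "demo"; the helper names `build_update_request` and `build_query_request` are likewise made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: a local Solr instance on its default port 8983 and a
# hypothetical core named "demo". Nothing is sent over the network here.
SOLR_CORE_URL = "http://localhost:8983/solr/demo"


def build_update_request(documents):
    """Build an HTTP POST that submits JSON documents for indexing."""
    body = json.dumps(documents).encode("utf-8")
    return Request(
        SOLR_CORE_URL + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(query, rows=10):
    """Build an HTTP GET that runs a query against the select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return Request(SOLR_CORE_URL + "/select?" + params, method="GET")


update = build_update_request([{"id": "1", "title": "Apache Lucene"}])
query = build_query_request("title:lucene")
print(update.get_method(), update.full_url)
print(query.get_method(), query.full_url)
```

Passing either object to `urllib.request.urlopen` would perform the call against a running server; the field names `id` and `title` depend on the schema of the core and are assumptions here.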

Tika

Tika, which used to belong to the Lucene project, is a parser that is used by Solr. It extracts metadata or structured text from a range of document formats using specialized libraries (existing ones where possible) such as Apache PDFBox or Apache POI, which are addressed uniformly through Tika and can be selected automatically.

Nutch

Nutch used to be part of the Lucene project and is based on Solr.

Outside the project, other Lucene derivatives were created.

Functionality

Lucene uses the tf-idf measure and the vector space model to score search hits.
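
As an illustration of how tf-idf weights and the vector space model combine, the sketch below turns documents and a query into weighted term vectors and ranks the documents by cosine similarity. It uses a textbook variant (term count multiplied by log(N/df) + 1) chosen for simplicity. Lucene's actual scoring formula differs in its details, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} vector per document using tf * idf weights."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Rare terms get a higher inverse document frequency than common ones.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(docs, query):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    vectors, idf = tf_idf_vectors(docs)
    q_counts = Counter(query.lower().split())
    # Query terms absent from the collection carry no weight.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = [(i, cosine(q_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample collection, used only for illustration.
docs = ["lucene full text search", "solr search server", "apache tika parser"]
for doc_index, score in rank(docs, "lucene search"):
    print(doc_index, round(score, 3))
```

In this small collection the first document matches both query terms and ranks highest, the second matches only the more common term "search", and the third shares no term with the query and scores zero.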

Literature

  • Manfred Hardt, Fabian Theis: Developing search engines with Apache Lucene. entwickler.press, 2004.
  • Erik Hatcher et al.: Lucene in Action. Manning, 2005 (about Lucene 1.4), 2nd ed. 2010 (about Lucene 3.0).
  • Florian Hopf: Flexible search with Lucene. In: Java aktuell. Issue 4-2013, p. 31 ff.
