Nutch
| Basic data | |
|---|---|
| Developer | Apache Software Foundation |
| Current version | 2.4 (October 11, 2019) |
| Operating system | Platform-independent |
| Programming language | Java |
| Category | Crawler, parser and search engine |
| License | Apache License |
| German-language version | No |
| Website | nutch.apache.org |
Nutch is a Java framework for Internet search engines. The software is open source and is developed within the Apache Software Foundation under the Apache License. Nutch builds on, among others, Lucene (stemming, indexing, etc.), Solr (web functionality) and Hadoop (scaling).
Nutch can search arbitrarily large amounts of data. Thanks to its plug-in architecture it can be adapted to company-specific needs, for example to support additional document formats.
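The plug-in idea can be illustrated with a minimal sketch in which each document format gets its own parser that is registered under a MIME type. Everything here is an assumption made for illustration: the names `DocumentParser`, `ParserRegistry`, `PluginSketch` and the MIME type `application/x-company-record` are hypothetical and are not Nutch's actual extension-point API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

/** Illustrative entry point; not part of Nutch. */
class PluginSketch {
    /** Builds a registry with a built-in parser and one hypothetical company-specific parser. */
    static ParserRegistry defaultRegistry() {
        ParserRegistry registry = new ParserRegistry();
        // Built-in format: plain text is passed through unchanged.
        registry.register("text/plain", raw -> new String(raw, StandardCharsets.UTF_8));
        // Hypothetical company format: "key=value" lines; keep only the values.
        registry.register("application/x-company-record", raw ->
                Arrays.stream(new String(raw, StandardCharsets.UTF_8).split("\n"))
                        .map(line -> line.substring(line.indexOf('=') + 1))
                        .collect(Collectors.joining(" ")));
        return registry;
    }

    public static void main(String[] args) {
        byte[] doc = "title=Foo\nbody=Bar".getBytes(StandardCharsets.UTF_8);
        System.out.println(defaultRegistry().parse("application/x-company-record", doc).orElse(""));
    }
}

/** Hypothetical extension point: one implementation per document format. */
interface DocumentParser {
    /** Extracts plain text from the raw document bytes. */
    String parse(byte[] raw);
}

/** Hypothetical registry that selects a parser plug-in by MIME type. */
class ParserRegistry {
    private final Map<String, DocumentParser> parsers = new HashMap<>();

    void register(String mimeType, DocumentParser parser) {
        parsers.put(mimeType, parser);
    }

    /** Returns the extracted text, or an empty Optional if no plug-in handles the type. */
    Optional<String> parse(String mimeType, byte[] raw) {
        return Optional.ofNullable(parsers.get(mimeType)).map(p -> p.parse(raw));
    }
}
```

In this sketch, supporting a new format means registering one more parser; the code that walks over documents does not change.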
The German Federal Office for Consumer Protection and Food Safety operated the Nutch-based "consumer search engine" Clewwa. The Wikia Search search engine was also based on Nutch technology.
Nutch is currently maintained in two version lines:
- 1.x: A ready-made crawler that allows very fine-grained configuration and relies on the data structures of Apache Hadoop, which makes it well suited to batch processing.
- 2.x: Offered as an alternative to 1.x. The main difference is the storage layer, which has been abstracted and uses Apache Gora to map objects to the storage backend. This increases flexibility in what can be stored (e.g. status, content, links, processed text) and in how it is stored, for example in NoSQL solutions.
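The abstracted storage layer described for 2.x can be sketched as an interface that crawler code writes to, with the concrete backend chosen separately. This is a minimal illustration under stated assumptions: `PageStore`, `PageRecord`, `InMemoryPageStore` and `StorageSketch` are hypothetical names and do not reproduce Apache Gora's or Nutch's real classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative crawler-side code; it only knows the PageStore interface. */
class StorageSketch {
    /** Records a fetch result without knowing which backend holds the data. */
    static void recordFetch(PageStore store, String url, String content) {
        store.put(url, new PageRecord("fetched", content));
    }

    public static void main(String[] args) {
        PageStore store = new InMemoryPageStore();
        recordFetch(store, "https://example.org/", "Example text");
        System.out.println(store.get("https://example.org/").map(r -> r.status).orElse("unknown"));
    }
}

/** Hypothetical page record holding two of the fields named in the text. */
final class PageRecord {
    final String status;
    final String content;

    PageRecord(String status, String content) {
        this.status = status;
        this.content = content;
    }
}

/** Hypothetical storage abstraction; a NoSQL-backed class could implement it too. */
interface PageStore {
    void put(String url, PageRecord record);

    Optional<PageRecord> get(String url);
}

/** Simplest possible backend, used here so the sketch runs without a database. */
class InMemoryPageStore implements PageStore {
    private final Map<String, PageRecord> pages = new HashMap<>();

    @Override
    public void put(String url, PageRecord record) {
        pages.put(url, record);
    }

    @Override
    public Optional<PageRecord> get(String url) {
        return Optional.ofNullable(pages.get(url));
    }
}
```

Because `recordFetch` depends only on the interface, swapping the in-memory map for another backend would not require changes to the crawling logic, which is the kind of flexibility the 2.x description refers to.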
Web links
- Official website (English)
- Wiki (English)
- Application examples