YaCy

Basic data

Initial release: 2004
Current version: 1.92 (January 2017)
Operating system: cross-platform (Java)
Programming language: Java
Category: search engine, proxy
License: GPL (free software)
Available in German: yes
Website: www.yacy.net/de

YaCy (from Yet another Cyberspace, a homophone of the English "ya see") is a search engine that works on the peer-to-peer (P2P) principle. There is no central server; all participants are equal.

Installing YaCy provides a local YaCy proxy. All websites accessed via this proxy, as well as other data gathered through the plugins provided, are indexed locally and can be searched by the user through the YaCy web interface. This index is then (optionally) distributed redundantly to other peers of the global YaCy network, so that a global index is created. A global search queries this global index, which is made up of the indexes of the peers that are currently online. Thanks to this decentralized principle, YaCy is resistant to failures.

You can expand your own index (and thus indirectly the global one) by starting your own web crawler. Alternatively, you can set up your own YaCy-based network to build a shared index; one example is Sciencenet.

The YaCy project was founded by Michael Christen in 2003.

Advantages and disadvantages

Advantages

  • The global search engine built with YaCy would be practically fail-safe, as part of the network would always be accessible.
  • With YaCy as a search engine, internet users are independent of companies, their ranking (which may have been paid for) and their censorship.
  • The software is open source, published under the GNU General Public License, and free of charge.
  • Since indexing takes place via the proxy on the respective client, pages from the deep web or from non-public networks (e.g. i2p) can also be indexed, which the crawler of a public search engine such as Google cannot reach.
  • YaCy is not tied to participation in the public YaCy cluster. It can also be used, for example, as a search engine in private networks (e.g. a company intranet) or as a private search engine for visited (and thus indexed) pages.

Disadvantages

  • Since YaCy has to contact other peers for a search query and verify search results by reloading the hit page to avoid spam, the search takes longer than with conventional search engines.
  • If there are only a few peers, fewer results can be found than with large search engines. The failure or the shutdown of individual (large) peers can also lead to further impairments. With the release of version 1.0 at the end of November 2011, however, the number of peers rose to around 1000 due to the increasing level of awareness, so that this disadvantage can currently be neglected.
  • The YaCy protocol works via individual HTTP requests, which means that it has a higher latency than UDP or TCP with permanent connections.
  • Search queries are stored only temporarily, in the RAM of the queried peer, for caching purposes. However, the hash function applied to the search words serves primarily to address the distributed hash table (DHT), not to hide them: with a dictionary, hashed search words can be partially uncovered and the search queries shown in plain text.
  • The data is not stored or transmitted in encrypted form.
  • In theory, spammers could operate their own peers that return spam as results. However, such false results are filtered out, because the searching peer verifies the hits by reloading the result pages before they are displayed.
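
The point above about uncovering hashed search words with a dictionary can be illustrated with a short sketch. It is not YaCy's code: SHA-256 is used here only as a stand-in for whatever word hash a peer actually uses, and the function names are hypothetical.

```python
import hashlib


def word_hash(word: str) -> str:
    """Hash a search word (SHA-256 is an assumption, not YaCy's real hash)."""
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()


def build_reverse_table(dictionary: list[str]) -> dict[str, str]:
    """Precompute hash -> word for every word in a dictionary."""
    return {word_hash(word): word for word in dictionary}


def uncover(observed_hashes: list[str], reverse_table: dict[str, str]) -> list:
    """Look up observed hashes; words missing from the dictionary stay hidden (None)."""
    return [reverse_table.get(h) for h in observed_hashes]


# A peer that receives hashed query words can look them up in its table.
table = build_reverse_table(["peer", "search", "engine"])
recovered = uncover([word_hash("search"), word_hash("zxqv")], table)
```

Only words that appear in the dictionary are recovered, which is why the text speaks of search words being "partially" uncovered.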

The program

Unlike other search engines, the heart of YaCy is not a central website but a computer program that runs on almost all operating systems. Searches are made through a local web page served by the installed program, and the results are displayed there as an ordinary HTML page.

Coupled with the P2P system is an optional proxy server that automatically indexes the pages visited. Pages are not indexed if further data is transferred to them via GET or POST, or if they use cookies or HTTP authentication (e.g. pages in a login area). This ensures that only publicly accessible data is actually indexed.
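
The indexing rule described above can be sketched as a simple predicate. This is an illustration only, not YaCy's actual (Java) implementation; the function name `is_indexable` and its parameters are hypothetical.

```python
from typing import Mapping
from urllib.parse import urlsplit


def is_indexable(method: str, url: str, request_headers: Mapping[str, str]) -> bool:
    """Return True if a proxied request looks public and non-personalized.

    Hypothetical helper that mirrors the rule in the text.
    """
    header_names = {name.lower() for name in request_headers}
    if method.upper() != "GET":
        return False  # POST (or any other method) transfers further data
    if urlsplit(url).query:
        return False  # GET parameters are present in the query string
    if "cookie" in header_names:
        return False  # the request is tied to a session
    if "authorization" in header_names:
        return False  # HTTP authentication is in use
    return True
```

A plain request such as `is_indexable("GET", "http://example.org/page.html", {})` passes, while the same URL with a query string, a `Cookie` header or an `Authorization` header is skipped.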

More functions

  • YaCy offers all users of the proxy function the possibility to reach peers via the domain PEERNAME.yacy or PEERHASH.yacy. The user can store a homepage at www.PEERNAME.yacy, a file share is available at share.PEERNAME.yacy, and the normal interface can be reached at PEERNAME.yacy. Users can create additional subdomains themselves by creating a folder with the subdomain name. It is thus a kind of dynamic DNS.
  • Independently of the .yacy domain, YaCy offers space for a homepage and a file share that can be linked to via the current IP address or a dynamic DNS name, and are thus also reachable by users who do not use YaCy.
  • YaCy has a built-in messaging function with which you can send text messages (with Wikicode for formatting) and, depending on the recipient's settings, files as well.
  • YaCy has an integrated wiki and blog.
  • There is a bookmark manager in which public and private bookmarks can be created.
  • Separate blacklists can be defined for individual areas.
  • There is an OpenSearch interface. Each peer makes this available at http://<peer-address>:<peer-port>/opensearchdescription.xml, for example http://search.yacy.net/opensearchdescription.xml
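
As a small illustration of the OpenSearch address pattern in the last item, the URL of a peer's description document can be assembled from its address and port. The helper name is hypothetical, and the address and port in the usage line are example values only.

```python
def opensearch_description_url(peer_address: str, peer_port: int) -> str:
    """Build the OpenSearch description URL for a peer, following the pattern in the text."""
    return f"http://{peer_address}:{peer_port}/opensearchdescription.xml"


# Example values; substitute the address and port of a real peer.
url = opensearch_description_url("localhost", 8090)
```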

Technology

The program is based on a web server that also acts as a caching proxy. The user interface for searching or managing your own peer is reached via this web server. The proxy shares its code with the crawler, which means that all visited pages that are not personalized are automatically recorded in the index. YaCy has used Apache Solr since version 1.04.9097. The YaCy network also offers its own YaCy domains, which are available via the proxy.

Index distribution

In contrast to file-sharing networks, a P2P search engine must deliver its results immediately. To ensure this, YaCy uses a distributed hash table (DHT): all recorded URLs and words are sent to the peers whose peer hash matches the corresponding word hash or URL hash. A search works exactly the other way around: only those peers are queried whose hash indicates that they may hold URLs for the word.

This means that only a fraction of the peers have to be contacted during the search in order to still get good results.
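
The idea can be sketched in a few lines. This is a generic DHT illustration under stated assumptions, not YaCy's algorithm: SHA-256, the 32-bit ring, the clockwise distance and the `redundancy` parameter are all choices made for this example.

```python
import hashlib

RING_SIZE = 2 ** 32  # size of the hash ring (an assumption for this sketch)


def ring_position(text: str) -> int:
    """Map a word or peer name to a position on the ring."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def responsible_peers(word: str, peers: list[str], redundancy: int = 2) -> list[str]:
    """Return the peers whose ring position follows the word's position most closely."""
    target = ring_position(word)

    def clockwise_distance(peer: str) -> tuple[int, str]:
        # The peer name breaks ties so the result does not depend on input order.
        return ((ring_position(peer) - target) % RING_SIZE, peer)

    return sorted(peers, key=clockwise_distance)[:redundancy]


peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]
targets_for_indexing = responsible_peers("yacy", peers)
targets_for_search = responsible_peers("yacy", peers)
```

Indexing and searching use the same function, so a searching peer contacts exactly the peers that would have received the word, rather than every peer in the network.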

Peer types

YaCy distinguishes between four different types of peers:

Virgin
These peers cannot be found, because a virgin peer has no contact with the network. A peer in this state therefore sees only itself.
Junior
The peer is behind a firewall. Other peers see it as a junior or potential peer, but they can only see when it last made contact and have no way of knowing whether it is still online.
Senior
A senior can be reached from the outside and is a full member of the YaCy network.
Principal
Like a senior, except that it also uploads a "seed list" that other peers can use for bootstrapping.
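
The four types can be summarized as a small decision function. This is a sketch of the distinctions described above; the enum, the function and its three boolean parameters are hypothetical and do not reflect how YaCy determines a peer's type internally.

```python
from enum import Enum


class PeerType(Enum):
    VIRGIN = "virgin"
    JUNIOR = "junior"
    SENIOR = "senior"
    PRINCIPAL = "principal"


def classify_peer(has_network_contact: bool,
                  reachable_from_outside: bool,
                  uploads_seed_list: bool) -> PeerType:
    """Classify a peer according to the four types described in the text."""
    if not has_network_contact:
        return PeerType.VIRGIN      # no contact with the network at all
    if not reachable_from_outside:
        return PeerType.JUNIOR      # e.g. behind a firewall
    if uploads_seed_list:
        return PeerType.PRINCIPAL   # senior that also publishes a seed list
    return PeerType.SENIOR
```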

Protocol

The YaCy protocol consists of text servlets that the built-in web server provides under /yacy/servletname.html. Other peers transmit data via GET parameters and receive plain text in response; the exact format differs from servlet to servlet.
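
A request of this shape can be assembled as follows. The sketch only builds the URL; the servlet name and the parameter names in the usage line are placeholders, not real YaCy servlets or parameters, and the reply is not parsed because its format depends on the servlet.

```python
from typing import Mapping
from urllib.parse import urlencode


def servlet_url(peer_address: str, peer_port: int,
                servlet_name: str, params: Mapping[str, object]) -> str:
    """Build a /yacy/<servletname>.html request URL with GET parameters."""
    query = urlencode(params)  # percent-encodes keys and values
    return f"http://{peer_address}:{peer_port}/yacy/{servlet_name}.html?{query}"


# Placeholder servlet and parameters, for illustration only.
request_url = servlet_url("192.0.2.10", 8090, "servletname", {"key": "value", "count": 3})
```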

Bootstrapping

When bootstrapping, YaCy tries to find the network of other peers. To do this, it first looks for a seed list: the URL of a seed list that a YaCy peer uploads regularly is selected from superseed.txt, and that list is then downloaded. The seed list, seeds.txt, contains references to other peers, so that contact with the YaCy network can be established. On later starts the peer can bootstrap from the seeds it already knows, and the seed lists are needed only if many of those references are no longer valid.
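
The control flow described above might look roughly like this. It is a sketch under assumptions: the function and parameter names are hypothetical, the network access is passed in as callables so that no file format has to be guessed, and the `min_valid` threshold stands in for the unspecified point at which "many references are no longer valid".

```python
from typing import Callable, Iterable


def bootstrap(known_seeds: Iterable[str],
              is_reachable: Callable[[str], bool],
              seed_list_urls: Iterable[str],
              fetch_seed_list: Callable[[str], list],
              min_valid: int = 1) -> list:
    """Return peers to contact, using seed lists only when known seeds are stale."""
    valid = [seed for seed in known_seeds if is_reachable(seed)]
    if len(valid) >= min_valid:
        return valid  # enough known seeds still work; no download needed
    for url in seed_list_urls:  # URLs as they would be listed in superseed.txt
        try:
            seeds = fetch_seed_list(url)  # contents of one seeds.txt
        except OSError:
            continue  # this seed list is unavailable; try the next one
        if seeds:
            return seeds
    return valid
```

On a first start `known_seeds` is empty, so the seed lists are tried in turn; on later starts they are consulted only if too few of the remembered seeds still respond.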
