Kademlia

Kademlia is a technology for overlay networks that implements a distributed hash table , that is, stores information in a distributed network . Kademlia only defines the type and structure of the network. It was developed by Petar Maymounkov and David Mazières. It is widely used by file sharing tools, but is not limited to this application area.

Use in file sharing

Although Kademlia is often used by file-sharing programs, no files can be transferred using this protocol. Rather, the information that is necessary to find the desired files is stored here in the distributed network. The files can then be transferred using another protocol such as eDonkey2000 or BitTorrent .

Third generation file-sharing programs, of which Kademlia is one of the protocols, store the information about the network in a distributed hash table , so each node stores a redundant part of this information. This structure replaces the central indexing server of first generation programs. At the same time, the effort to find the desired files drops to O (log n) .

functionality

Most of the time Kademlia is used over the Internet with the connectionless User Datagram Protocol (UDP / IP).

Each node is identified by a unique number (called "Node ID"). This number is not only used for his identification, but is also used by Kademlia for other purposes. Your own node calculates a random node ID if it has never been in the network before.

A node that wants to join the network must first go through a process called “ bootstrapping ”: In this phase, the algorithm receives the IP address of some nodes that are already known in the Kademlia network from a server or user in the network . It now receives the IP addresses of other nodes from this first node, so that there is no longer any dependency on individual nodes. To do this, he first looks for nodes that are similar to his own ID in order to position himself as conveniently as possible where such an ID is expected.

Since there is no central instance that takes on the indexing of the available information, this task is shared equally among all nodes. A node that has information first calculates the characteristic hash value (called "ID") of this information. The hash function used in a Kademlia network must always be the same. That nodes are examined in the network, the nodes whose ID is the smallest " distance comprise" to hash and transmits them his contact information.

If a node is looking for exactly this information, it carries out the same procedure and thereby reaches the nodes that have saved who has this information in the network. Often only the hash value of the data is available to the searcher. He can now establish a direct connection to the sources and receive the data. This ensures that the searcher will find the contact details of the source exactly where they left them. Since the network is usually in a constant state of flux, the contact details are distributed over several nodes and updated by the source after a certain period of time.

To find a node, the algorithm works itself closer and closer to it until it is found or an error comes back. The maximum number of nodes to be questioned during this search corresponds to the distance between this node and you. If the number of participants in the network doubles, you do not have to ask twice as many nodes, but only one more node per doubling. The required bandwidth does not scale linearly with the size of the network, but logarithmically to base two.

Another advantage lies primarily in the decentralized structure, which significantly increases the resistance to distributed denial-of-service attacks. Even if a number of nodes are attacked, it won't have too much of an impact on the network itself. Over time, the net will knit itself around these new "holes".

At Kademlia, long-known, reliable nodes are always preferred to new ones during routing and are never removed from the routing tables, which is why it is difficult for an attacker to manipulate the routing structure of the network.

"Distance" from nodes

The above-mentioned “distance” has nothing to do with geographical conditions, but describes the distance within the ID range. So it can happen that z. B. a node from Germany and one from Australia are, so to speak, "neighbors". The distance between two nodes and hash values is calculated by the mathematical XOR function and is ID ₁ XOR ID ₂ interpreted as an integer.

Clients

Clients that implement a Kademlia algorithm (the actual networks for user data such as files are usually not compatible with one another):

eMule (from version 0.42) and aMule (from version 2.1.0), name: Kad
MLDonkey , name: Kad & Overnet
BitTorrent (from version 4.1.0, beta phase)
µTorrent
K torrent
Vuze
CSpace (an instant messenger )
Deluge (implemented via libtorrent, compatible with µTorrent)
rTorrent (from version 0.8.0)
eDonkey2000 , Name: Overnet (development discontinued)

Web links

Kademlia in the computer science wiki of the Humboldt University Berlin
Petar Maymounkov and David Mazières: Kademlia - A Peer-to-peer Information System Based on the XOR Metric ( PDF ):
- @1@ 2Template: Dead Link / www.cs.rice.edu( Page no longer available , search in web archives: short version ) (PDF; 79 kB)
- Long version (PDF; 211 kB)