Faceoff algorithm

The Bully algorithm is a recursive , distributed algorithm that is used in a distributed system when a new coordinator process needs to be identified because the original one has crashed. The latter can be determined, for example, by a timeout.

procedure

A process p, which has noticed the failure of the coordinator, sends requests to all processes that have a higher ID than itself. These processes send back a confirmation if they have not crashed themselves. If process p receives replies from processes with a higher ID, it no longer sends any further messages. If process p does not receive an answer, it becomes the coordinator itself.

Each process (with a higher ID) that answered p, in turn, sends queries to all processes that have a higher ID than it to find out whether these still exist, which they then repeat as well (recursion). The last process no longer has a process to ask because it has the highest ID because the coordinator process with the highest ID has failed and can no longer answer. He himself takes the place of the new coordinator and broadcasts the message that he is the new coordinator.

Assumptions

In order for the Bully algorithm to work verifiably, the following assumptions must apply in the distributed system:

All processes cooperate and use exactly the same voting algorithm.
There are no errors in the implementation and all processes also offer the option of processing received messages at all times.
If a process P1 receives the message M from process P2, then it is ensured that the message was sent at an earlier point in time. So there are no spontaneously generated messages in the system.
All processes have so-called "storage cells" in which the data on which they are working is stored. This means that even in the event of an error or failure of the process, the data is saved. Data access in "storage cells" should be based on transactions - either the new data is written to the storage cell or it is discarded, but the old data are retained and continue to be consistent.
If a process fails, it immediately stops processing its data. If it is activated again, it will start editing again (where it left off).
There are no transmission errors in the system. This means that all messages are transmitted correctly.
All messages are processed in the order in which they arrive. If process 1 sends the messages M1 and M2 in this order, a second process will also process these messages in the order M1, M2.
There are no failures in communication and the system offers an upper time limit T, at which the message should definitely be processed. If a process has still not received a response from another after T time units, it can assume that the receiving process has crashed.
A process never stops responding to messages, and does so without delay.
The system's network works synchronously.

Pseudocode

P notices that the coordinator is no longer answering:

function hold_election()
   for each (Process Q) {
     if (Id(Q) > Id(P))
       send(Q, ELECTION);
     end if
   }

  if  $\exists$  Q: (receive(Q, ANSWER)) //timeout T for  $\exists$ 
  then
    do_something_else();
  else
    for each (Process Q) {
      if (Id(Q) < Id(P))
        send(Q, COORDINATOR);
      endif
    }
  endif

P receives ELECTION , sent by process pred :

function continue_election()
  send(pred, ANSWER);
  hold_election();

advantages

It is easily possible to replace a coordinator who is missing, since each process can trigger the determination of a new coordinator process.

disadvantage

All processes must be known to every process and there must be an absolute order in the processes. Otherwise, no process can determine which messages to send.
If a process reacts very slowly, this creates a delay that slows down the entire process.

literature

Emmett Witchel: Distributed Coordination. 2005, PPT file, accessed August 3, 2013.
Hector Garcia-Molina: Elections in a Distributed Computing System. In: IEEE . Transactions on Computers. Vol. C-31, No. January 1, 1982, ISSN 0018-9340 , pp. 48-59, doi : 10.1109 / TC.1982.1675885 .

Web links

Lecture “Distributed Systems” at the Hamburg University of Applied Sciences

Individual evidence

↑ Slide 5, Lecture "Distributed Systems" winter semester 2013/13 (PDF; 1.6 MB) - Dept. Informatik, HAW Hamburg , accessed on July 8, 2013

[1] Slide 5, Lecture "Distributed Systems" winter semester 2013/13 (PDF; 1.6 MB) - Dept. Informatik, HAW Hamburg , accessed on July 8, 2013