Alpha-Beta Search

The alpha-beta search (also called alpha-beta cutoff or alpha-beta pruning) is an optimized variant of the minimax search, i.e. an algorithm for determining an optimal move in games with two opposing parties. During the search, two values, alpha and beta, are maintained that indicate which results the players can achieve with optimal play. With the help of these values it can be decided which parts of the search tree do not have to be examined because they cannot influence the outcome.

The simple (non-optimized) alpha-beta search delivers exactly the same result as the minimax search.

Informal description

The minimax algorithm analyzes the complete search tree. In doing so, it also considers nodes that cannot influence the final result (the choice of the branch at the root). The alpha-beta search ignores all nodes for which it is already certain, at the moment the search reaches them, that they cannot influence the result.

A clear illustration of how it works is a two-person game in which the first player selects one of several bags and receives from his opponent the item of lowest value in that bag.

The minimax algorithm searches all bags completely and therefore takes a lot of time. The alpha-beta search, on the other hand, only searches the first bag completely for the item of minimal value. Every further bag is only searched until the value of an item reaches or falls below this minimum; then it is certain that this bag is no better for the first player than the first one, and the search in it can be stopped. Otherwise, this bag is a better choice, and its minimum value serves as the new bound for the further search.
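
The procedure can be sketched in pseudocode similar to the listings further below (this sketch is not part of the original article; wert[b][i] denotes the value of item i in bag b, and unendlich stands for a value larger than any item value):

 // Sketch (illustration only): player A's search for the best bag.
 int besteTasche() {
    int alpha = -unendlich;                  // best guaranteed minimum so far
    for each (Tasche b) {
       int minimum = +unendlich;
       for each (Gegenstand i in Tasche b) {
          if (wert[b][i] < minimum)
             minimum = wert[b][i];
          if (minimum <= alpha)
             break;                          // cutoff: bag b cannot beat the best bag found so far
       }
       if (minimum > alpha)
          alpha = minimum;                   // bag b is the new best choice
    }
    return alpha;
 }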

Similar situations are familiar to every chess player who is checking a specific move to see whether it is advantageous. If, while analyzing the move, he finds a reply of the opponent that is unfavorable for him, he will regard the move as "refuted" and reject it. It would be pointless to investigate further replies in order to determine whether the opponent has even more effective refutations and just how bad the move under consideration actually is.

The algorithm

The alpha-beta search works in the same way as described informally above. The idea is to pass along two values (alpha and beta) that describe the worst-case result for each player: the alpha value is the minimum result that player A is already guaranteed, the beta value is the maximum result that player B is already guaranteed. (Note that player B tries to obtain the lowest possible result, since he plays the "minimizing" side.)

If a maximizing node (a node of player A) has a move whose value exceeds the beta value, the search in this node is aborted (beta cutoff), because player B would never allow this position to be reached: it would exceed the concession he is already guaranteed elsewhere. If instead the move delivers a result that exceeds the current alpha value, alpha is raised accordingly.

The same applies analogously to the minimizing nodes: moves whose value falls below alpha cause an alpha cutoff, and the beta value is adjusted downwards.

An alpha-beta search tree with three cutoffs

The figure above shows an example tree with 18 leaves, of which only 12 are evaluated. The three framed values of an inner node describe the alpha value, the return value and the beta value.

The search algorithm uses a so-called alpha-beta window, whose lower limit is the alpha value and whose upper limit is the beta value. This window is passed on to the child nodes, starting at the root with the maximum window [-inf, inf]. Leaves 1, 2 and 3 are evaluated by a maximizing node, and the best value 10 is passed to the minimizing parent node. This node adjusts its beta value and passes the new window [-inf, 10] to the next maximizing child node, which has leaves 4, 5 and 6. The return value 12 of leaf 5 exceeds the beta value 10, so leaf 6 no longer has to be considered: the result of this subtree is at least 12 and therefore better for the maximizing player than the left subtree, so the minimizing player would never select it.

The same applies to the minimizing node with the alpha cutoff at value 3. Although this subtree was only partially evaluated, it is clear that the maximizing root node would never choose this variant, because the minimizing node could force a result of at most 3, while the result 12 is already ensured by the middle subtree.

Implementation

Note: Compared to the simple minimax algorithm, the additions are the alpha and beta parameters, which narrow the search window, and the cutoff tests.

Main program (extract):

 gespeicherterZug = NULL;
 int gewuenschteTiefe = 4;
 int bewertung = max(gewuenschteTiefe, -unendlich, +unendlich);
 if (gespeicherterZug == NULL)
    there were no further moves (checkmate or stalemate);
 else
    play gespeicherterZug;

The normal variant is:

 int max(int tiefe, int alpha, int beta) {
    if (tiefe == 0 or keineZuegeMehr(spieler_max))
       return bewerten();
    int maxWert = alpha;
    Zugliste = generiereMoeglicheZuege(spieler_max);
    for each (Zug in Zugliste) {
       fuehreZugAus(Zug);
       int wert = min(tiefe-1, maxWert, beta);
       macheZugRueckgaengig(Zug);
       if (wert > maxWert) {
          maxWert = wert;
          if (tiefe == gewuenschteTiefe)
             gespeicherterZug = Zug;
          if (maxWert >= beta)
             break;                    // beta cutoff
       }
    }
    return maxWert;
 }

 int min(int tiefe, int alpha, int beta) {
    if (tiefe == 0 or keineZuegeMehr(spieler_min))
       return bewerten();
    int minWert = beta;
    Zugliste = generiereMoeglicheZuege(spieler_min);
    for each (Zug in Zugliste) {
       fuehreZugAus(Zug);
       int wert = max(tiefe-1, alpha, minWert);
       macheZugRueckgaengig(Zug);
       if (wert < minWert) {
          minWert = wert;
          if (minWert <= alpha)
             break;                    // alpha cutoff
       }
    }
    return minWert;
 }

The NegaMax variant is:

 int miniMax(int spieler, int tiefe, int alpha, int beta) {
    if (tiefe == 0 or keineZuegeMehr(spieler))
       return bewerten(spieler);
    int maxWert = alpha;
    Zugliste = generiereMoeglicheZuege(spieler);
    for each (Zug in Zugliste) {
       fuehreZugAus(Zug);
       int wert = -miniMax(-spieler, tiefe-1, -beta, -maxWert);
       macheZugRueckgaengig(Zug);
       if (wert > maxWert) {
          maxWert = wert;
          if (tiefe == gewuenschteTiefe)
             gespeicherterZug = Zug;
          if (maxWert >= beta)
             break;                    // beta cutoff
       }
    }
    return maxWert;
 }

Note: While the standard implementation maximizes for one player and minimizes for the other, the NegaMax variant maximizes for both players and swaps and negates alpha and beta in the recursion. It follows that the evaluation function must behave differently in the two implementations.

  • Standard implementation: The better the board position is for the maximizing player, the greater the return value of the evaluation function bewerten(). The better it is for the minimizing player, the smaller the return value.
  • NegaMax implementation: Since both players maximize, the evaluation function bewerten(spieler) must return higher values the better the board position of the player to move is; the value is always given from that player's point of view (see the sketch below).
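
The relationship between the two conventions can be expressed directly (a sketch, not part of the listings above; bewertenAusSichtVonMax() is a hypothetical name for the standard evaluation function, and spieler is +1 for the maximizing and -1 for the minimizing player, as in the NegaMax listing):

 // Sketch: deriving the NegaMax evaluation from the standard one.
 // bewertenAusSichtVonMax() returns positive values for positions that are
 // good for the maximizing player (standard convention).
 int bewerten(int spieler) {
    return spieler * bewertenAusSichtVonMax();
 }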

Optimizations

If, due to the complexity of the game under consideration, the game tree can only be computed to a certain depth, two optimization approaches are possible in principle. On the one hand the evaluation function can be improved, and on the other hand the alpha-beta algorithm itself offers potential for optimization. The better the evaluation function, the shallower the game tree needs to be searched to achieve a given playing strength.

In the following only optimizations of the alpha-beta search are discussed.

Pre-sorting of moves

In contrast to the minimax algorithm, the order in which child nodes (moves) are processed plays an important role in the alpha-beta search. The faster the alpha-beta window is narrowed, the more cutoffs are achieved. It is therefore important to examine first those moves that promise the best result. Various heuristics are used in practice. In chess, for example, the moves can be sorted according to whether, and which, piece is captured, and by which piece. "Rook takes queen" is sorted before "rook takes rook", and "pawn takes rook" is sorted between the two.
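
One possible realization of such a pre-sorting (a sketch, not taken from the article) assigns each move an ordering value based on the captured and the capturing piece; with the usual rough piece values (pawn 1, rook 5, queen 9) it reproduces exactly the order given above:

 // Sketch: ordering value of a move; moves with higher values are searched first.
 // wertGeschlagen(Zug) is the value of the captured piece (0 for a quiet move),
 // wertSchlaeger(Zug) the value of the capturing piece; both are hypothetical helpers.
 int sortierWert(Zug zug) {
    return 10 * wertGeschlagen(zug) - wertSchlaeger(zug);
 }

Zugliste would then be sorted in descending order of sortierWert before the loop in max() or min(): "rook takes queen" receives 85, "pawn takes rook" 49 and "rook takes rook" 45.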

Principal Variation Search

A node in the alpha-beta search belongs to one of three categories (based on the NegaMax variant):

  • Alpha node: every subsequent move returns a value less than or equal to alpha, which means that no good move is possible here.
  • Beta node: at least one subsequent move delivers a value greater than or equal to beta, which means a cutoff.
  • Principal variation node: at least one subsequent move delivers a value greater than alpha, but all deliver a value less than beta.

Sometimes it can be recognized early to which category a node belongs. If the first subsequent move tested delivers a value greater than or equal to beta, the node is a beta node. If it delivers a value less than or equal to alpha, it is possibly an alpha node (provided the moves are well pre-sorted). If it delivers a value between alpha and beta, it may be a principal variation node.

The principal variation search now assumes that a subsequent move which delivers a value between alpha and beta will turn out to be the best possible move. Therefore, for the remaining moves the alpha-beta window is reduced to the minimal size (alpha, alpha + 1), a so-called zero window, in order to achieve the maximum number of cutoffs while still proving that the remaining moves are worse.

If the search with this zero window shows that a move has a value greater than alpha (and the result is still smaller than beta, otherwise a beta cutoff can be made immediately), the search must be repeated with a larger window: since the zero-window search already provides a lower bound for the move's value, the window of the repeated search should extend from this value to beta. Because of this occasionally necessary re-search, the method only pays off if the moves are well pre-sorted, because good pre-sorting minimizes incorrect assignments to one of the three categories mentioned above; that is, it becomes unlikely that a subsequent move has a better value than alpha and the search has to be repeated. In addition, principal variation nodes can be used in conjunction with iterative deepening for pre-sorting the moves.

int AlphaBeta(int tiefe, int alpha, int beta)
{
    if (tiefe == 0)
        return Bewerten();
    BOOL PVgefunden = FALSE;
    int best = -unendlich;
    int wert;
    Zugliste = GeneriereMoeglicheZuege();
    for each (Zug in Zugliste)
    {
        FuehreZugAus(Zug);
        if (PVgefunden)
        {
            // zero-window search: try to prove that this move is worse than the PV
            wert = -AlphaBeta(tiefe-1, -alpha-1, -alpha);
            if (wert > alpha && wert < beta)
                // the assumption failed: re-search with a larger window
                wert = -AlphaBeta(tiefe-1, -beta, -wert);
        } else
            wert = -AlphaBeta(tiefe-1, -beta, -alpha);
        MacheZugRueckgaengig(Zug);
        if (wert > best)
        {
            if (wert >= beta)
                return wert;          // beta cutoff
            best = wert;
            if (wert > alpha)
            {
                alpha = wert;
                PVgefunden = TRUE;    // a principal variation candidate was found
            }
        }
    }
    return best;
}

Iterative deepening

Iterative deepening is the step-by-step increase of the depth of the search tree. Since the alpha-beta search is a depth-first search, it is usually not possible to determine in advance how long the computation will take. Therefore one starts with a shallow search depth and increases it gradually. The result of a computation can be used to pre-sort the moves better when the search depth is increased.
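
Based on the main program above, the control loop can be sketched as follows (a sketch only; zeitVerbleibt() and maximaleTiefe are hypothetical additions that are not part of the listings above):

 // Sketch: step-by-step deepening until a maximum depth is reached or the
 // time budget is used up; after every completed iteration, gespeicherterZug
 // holds the best move found so far and can be used to pre-sort the moves
 // for the next, deeper iteration.
 int tiefe = 1;
 int bewertung;
 while (tiefe <= maximaleTiefe and zeitVerbleibt()) {
    gewuenschteTiefe = tiefe;
    bewertung = max(tiefe, -unendlich, +unendlich);
    tiefe = tiefe + 1;
 }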

Aspiration windows

Aspiration windows are used together with iterative deepening. Normally the alpha-beta search starts at the root with the maximum window. With iterative deepening it can be assumed that a new computation with a greater depth will deliver a similar result value. Therefore the search window can initially be set to a (relatively) small range around the result value of the previous computation. If it turns out that this window was too small, the search must be repeated with a larger or the maximum window (similar to the principal variation search).
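
A sketch of the window handling (not part of the article; alphaBeta(tiefe, alpha, beta) stands for a search as in the listings above, vorherigerWert is the result of the previous iteration and fenster is an invented window half-width):

 // Sketch: aspiration window around the result of the previous iteration.
 int fenster = 50;
 int wert = alphaBeta(tiefe, vorherigerWert - fenster, vorherigerWert + fenster);
 if (wert <= vorherigerWert - fenster or wert >= vorherigerWert + fenster)
    // the true value lies outside the window: repeat with the maximum window
    wert = alphaBeta(tiefe, -unendlich, +unendlich);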

Killer heuristic

The killer heuristic is a special kind of move pre-sorting. It assumes that moves which have caused a cutoff will also cause a cutoff in other parts of the search tree (at the same depth). Therefore they are considered first from then on, provided they are valid moves in the position currently being examined. This heuristic cannot be used sensibly in all games, because the prospect that these so-called killer moves are still valid moves in other parts of the search tree depends on the type of game.
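
A common realization (a sketch with hypothetical names, not taken from the article) keeps two killer moves per search depth:

 // Sketch: two killer moves are remembered per search depth.
 Zug killerZuege[maximaleTiefe][2];

 // called whenever a move causes a cutoff at depth tiefe
 merkeKiller(int tiefe, Zug zug) {
    if (killerZuege[tiefe][0] != zug) {
       killerZuege[tiefe][1] = killerZuege[tiefe][0];
       killerZuege[tiefe][0] = zug;
    }
 }

 // used during move ordering: killer moves that are valid in the current
 // position are placed at the front of the move list
 BOOL istKiller(int tiefe, Zug zug) {
    return zug == killerZuege[tiefe][0] or zug == killerZuege[tiefe][1];
 }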

Quiescence search

With the alpha-beta search (as with the minimax algorithm) it is unfavourable to terminate the search as soon as a fixed search depth is reached. In chess, for example, capturing a defended pawn with the queen could appear advantageous if the recapture lies just below the search horizon and is therefore "overlooked" by the algorithm. For this reason the transition from the alpha-beta search to the evaluation function is made dynamically: beyond the minimum search depth the search is continued until the evaluation is approximately stable. This "quietness" of the evaluation gives the procedure its name, quiescence search. In chess, quiescence search ensures in particular that a sequence of captures is analysed to its end.
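
In the NegaMax form of the listings above, the transition can be sketched as follows (an illustration, not part of the original article; generiereSchlagzuege() is a hypothetical generator that returns only capture moves, and at the lowest level of the regular search the call to bewerten(spieler) would be replaced by ruhesuche(spieler, alpha, beta)):

 // Sketch: quiescence search in NegaMax form.
 int ruhesuche(int spieler, int alpha, int beta) {
    int standPat = bewerten(spieler);    // static evaluation of the current position
    if (standPat >= beta)
       return beta;                      // already good enough: cutoff
    if (standPat > alpha)
       alpha = standPat;                 // "stand pat": the static evaluation is the lower bound
    // only capture moves are searched, so the recursion ends
    // as soon as the position becomes "quiet"
    Zugliste = generiereSchlagzuege(spieler);
    for each (Zug in Zugliste) {
       fuehreZugAus(Zug);
       int wert = -ruhesuche(-spieler, -beta, -alpha);
       macheZugRueckgaengig(Zug);
       if (wert >= beta)
          return beta;                   // cutoff
       if (wert > alpha)
          alpha = wert;
    }
    return alpha;
 }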

Null move search

The null move search is an optimization that exploits the fact that in some games (for example chess) having the right to move is an advantage.
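
A minimal sketch of the idea in the NegaMax form above (an assumption, not taken from the article): before the regular moves are searched, the side to move "passes", and the resulting position is searched with reduced depth; if even then the result reaches beta, a cutoff is usually possible. The technique must be used with care in positions where moving is a disadvantage (zugzwang).

 // Sketch: null move pruning inside miniMax(), before the regular move loop.
 // fuehreNullzugAus() and macheNullzugRueckgaengig() are hypothetical helpers
 // that only switch the side to move; R is the depth reduction (typically 2 or 3).
 fuehreNullzugAus();
 int wert = -miniMax(-spieler, tiefe-1-R, -beta, -beta+1);
 macheNullzugRueckgaengig();
 if (wert >= beta)
    return beta;                         // null move cutoff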

Comparison of Minimax and AlphaBeta

The following table shows an example calculation for a chess position with a constant search depth of four half-moves (each player moves twice). The normal minimax algorithm was used, as well as alpha-beta without move ordering and with (simple) move ordering. The percentages at the cutoffs relate to the entire search tree and describe how much of it did not have to be evaluated. They are estimates based on the assumption that the subtrees are roughly the same size (when a cutoff occurs, it is not known how large the pruned subtree would actually have been).

Algorithm                   Evaluations   Cutoffs   Proportion of tree pruned   Computing time
Minimax                     28,018,531    0         0.00 %                      134.87 s
AlphaBeta                   2,005,246     136,478   91.50 %                     9.88 s
AlphaBeta + move ordering   128,307       27,025    99.28 %                     0.99 s

It can be seen clearly that the alpha-beta search gives a considerable speed-up compared to minimax. Move ordering improves the computing time in this example by a further factor of 10. The fact that the number of cutoffs decreases in absolute terms with move ordering is explained by their occurring at higher levels of the search tree, so that larger parts are pruned away.

History

While the central importance of the minimax algorithm for chess programming was already emphasized by its pioneers Alan Turing and Claude Shannon in the 1940s, the development of the alpha-beta search began in the late 1950s, at first with only alpha cutoffs being used. It was not until the 1960s that the alpha-beta algorithm became an integral part of chess programs. A first, albeit unpublished, description dates from 1963. The first detailed study appeared in 1975.

Footnotes

  1. Jörg Bewersdorff: Luck, Logic and Bluff. Mathematics in the Game: Methods, Results and Limits. 5th edition. 2010, p. 184.
  2. Allen Newell, J. C. Shaw & H. A. Simon: Chess-playing programs and the problem of complexity. In: IBM Journal of Research and Development. Volume 2, Issue 4, 1958, pp. 320-335, doi:10.1147/rd.24.0320.
  3. D. J. Edwards & T. P. Hart: The alpha-beta heuristic (= AI Memos. 030). Massachusetts Institute of Technology, 1963.
  4. Donald E. Knuth & Ronald W. Moore: An analysis of alpha-beta pruning. In: Artificial Intelligence. Volume 6, Issue 4, 1975, pp. 293-326, doi:10.1016/0004-3702(75)90019-3.