Minimax algorithm

The minimax algorithm is an algorithm for determining the optimal strategy in finite two-person zero-sum games with perfect information. These include in particular board games such as chess, Go, Othello/Reversi, checkers, Nine Men's Morris and Connect Four, in which both players always know the entire history of the game. The minimax algorithm can also be extended, on the basis of expected values, to games with an element of chance such as backgammon. As a rule, but not exclusively, the minimax algorithm is applied to games with alternating moves.

A strategy calculated with the minimax algorithm is called a minimax strategy. It guarantees the player concerned the highest payoff that can be achieved regardless of how the opponent plays. The pair formed by the minimax strategies of the two players constitutes a Nash equilibrium.

In non-zero-sum games, in which the opponent's loss does not necessarily coincide with one's own gain, the minimax algorithm does not necessarily yield an optimal strategy.

Variants of the minimax algorithm form the core of game-playing programs such as chess programs. The growing computing power of computers has meanwhile made it possible, even in a game as complex as chess, for programs to beat every human player.

For some games, such as the game of Nim, an optimal strategy can also be computed using more efficient algorithms from combinatorial game theory.

Evaluation function

An ideal evaluation function assigns a position the value +1 if player A wins, the value −1 if player B wins, and 0 if the game ends in a draw. If the search tree can be built over all game positions down to its maximum depth (that is, to the terminal positions, in which the winner is known), the algorithm plays a perfect game. In practice, however, completely constructing the search tree is only possible for very simple games such as tic-tac-toe.

For almost all other games this is computationally far too expensive. Therefore one is content with building the search tree only up to a given search depth (the horizon). The evaluation function is modified: very good positions for A receive very high values, very good positions for B receive very low values. Heuristics are used to estimate these intermediate values.
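
As a minimal sketch of such a heuristic, the following function simply counts material (the names, piece values and board representation are illustrative assumptions, not taken from any particular program):

 # Minimal sketch of a heuristic evaluation function. Assumption: the board is a
 # list of piece letters; uppercase pieces belong to player A, lowercase to player B.
 PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

 def evaluate(board):
     """Large positive values favor A, large negative values favor B."""
     score = 0
     for piece in board:
         value = PIECE_VALUES.get(piece.upper(), 0)
         score += value if piece.isupper() else -value
     return score

 print(evaluate(["Q", "p", "p"]))  # 9 - 1 - 1 = 7: clearly better for A

Real programs refine such a heuristic with positional criteria, but any function with the convention "higher is better for A" fits into the minimax scheme.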

Search tree example

Minimax algorithm: the circles represent the moves of the player running the algorithm (maximizing), the squares the moves of the opponent (minimizing). The values in the circles and squares represent the value α of the minimax algorithm. The red arrows mark the move selected at each level, and the numbers on the left indicate the tree depth.

The figure above shows a simple game tree with search depth 4. It is player A's turn.

The nodes at levels 0 and 2 correspond to game situations in which it is player A's turn. Here the evaluation values of the child nodes are maximized, i.e. the move most favorable to player A is selected and its value is assigned to the parent node.

The nodes at levels 1 and 3 correspond to game situations in which it is player B's turn. Here the evaluation values of the child nodes are minimized, i.e. the move most favorable to player B is selected and its value is assigned to the parent node.

The algorithm starts at the leaves at the bottom and works its way up to the root. At level 3 it selects the smallest value among the child nodes and assigns it to the parent node (minimization). At level 2 the largest child value is then assigned to the parent node (maximization). This alternates until the root is reached, to which the value of its largest child node is assigned. The child supplying this value corresponds to the move that should be played.
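
The propagation of values can be sketched in a few lines (in Python; the tree and its leaf values are invented purely for illustration and do not correspond to the figure):

 # Toy illustration of value propagation: a nested list stands for the game tree,
 # numbers are leaf evaluations. Maximizing levels take the maximum of their
 # children, minimizing levels the minimum.
 def minimax(node, maximizing):
     if isinstance(node, (int, float)):   # leaf: return its evaluation
         return node
     values = [minimax(child, not maximizing) for child in node]
     return max(values) if maximizing else min(values)

 # Depth-2 example: A (maximizing) to move, B (minimizing) replies.
 tree = [[3, 12], [2, 4], [14, 1]]
 print(minimax(tree, True))  # B would answer with 3, 2 and 1; A therefore achieves 3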

Remarks

  • The minimax method is essentially independent of the specific game, i.e. chess and reversi, for example, use the same algorithm.
  • Only the following two program parts are interfaces to the special game:
    • Which moves are possible in a specific game situation?
    • How is a game situation evaluated numerically?
  • In addition to the minimax procedure, a game program can use other game-specific techniques, for example precomputed libraries of opening moves.

The minimax algorithm is linear in the number of possible moves to be checked, but the running time grows exponentially with increasing search depth. (Note that in theory the running time is constant for a game with a finite number of states, since beyond a certain depth the computing time no longer increases. However, since this depth can never realistically be reached in most games, it is entirely justified to speak of exponential growth.) On the other hand, the quality of the search result usually improves with greater search depth (depending on the quality of the numerical evaluation).
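
As a rough illustration (the figures are assumptions, not measurements): with a branching factor of about 30 legal moves per position, a search to depth 4 already visits on the order of 30⁴ ≈ 810,000 positions, and a search to depth 6 about 30⁶ ≈ 7 × 10⁸; each additional level therefore multiplies the effort by roughly the branching factor.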

There are therefore various optimized variants, for example

  • Variable search depth: If there are only a few possible moves per game situation, for example because there are only a few pieces left on the playing field, the search depth can be increased (and vice versa).
  • Dynamic search depth: If the numerical values ​​at one point in the search tree change significantly from level to level, the search depth can be increased locally (and vice versa).
  • Alpha-beta pruning can be used; a sketch follows this list.
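
A minimal sketch of minimax with alpha-beta pruning (in Python; children and evaluate stand for the two game-specific interfaces named under Remarks and are assumptions of this sketch, not fixed names):

 # Minimax with alpha-beta pruning: branches that can no longer influence the
 # result are cut off. `children(position, maximizing)` and `evaluate(position)`
 # are assumed game-specific helpers.
 def alphabeta(position, depth, alpha, beta, maximizing):
     moves = children(position, maximizing)
     if depth == 0 or not moves:
         return evaluate(position)
     if maximizing:
         value = float("-inf")
         for child in moves:
             value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
             alpha = max(alpha, value)
             if alpha >= beta:     # the minimizer will avoid this branch anyway
                 break
         return value
     else:
         value = float("inf")
         for child in moves:
             value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
             beta = min(beta, value)
             if alpha >= beta:     # the maximizer will avoid this branch anyway
                 break
         return value

The initial call uses alpha = −∞ and beta = +∞; with good move ordering the number of positions examined drops roughly to the square root of what plain minimax would visit.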

A significant saving in time results from storing the positions examined so far together with their evaluations. If a position is reached via different move sequences from the starting position, the entire search tree below it does not have to be searched again each time. In practice, efficient hash tables are often used for this storage.
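
A minimal sketch of such a table (in Python; key, children and evaluate are assumed helpers, where key must map a position to a hashable value, for example a string describing the board):

 # Transposition table: positions already evaluated to a sufficient depth are
 # looked up instead of being searched again. `key`, `children` and `evaluate`
 # are assumed game-specific helpers.
 transposition_table = {}

 def minimax_tt(position, depth, maximizing):
     entry = transposition_table.get(key(position))
     if entry is not None and entry["depth"] >= depth:
         return entry["value"]                  # reuse the stored evaluation
     moves = children(position, maximizing)
     if depth == 0 or not moves:
         value = evaluate(position)
     else:
         results = [minimax_tt(child, depth - 1, not maximizing) for child in moves]
         value = max(results) if maximizing else min(results)
     transposition_table[key(position)] = {"depth": depth, "value": value}
     return value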

Iterative deepening search

Especially when the time available for the search is limited (e.g. in tournament chess), iterative deepening search is used. Starting from the position to be examined, the search is started repeatedly and the desired search depth is increased step by step. If the positions already examined are stored as described above, only the positions newly reached compared with the previous search have to be evaluated with the evaluation function. This procedure continues until the available search time is exceeded or a “sufficiently good” result has been achieved.

Without iterative deepening, if the time limit were exceeded, in the worst case only a single move would have been examined, but that one to a very great depth. The next move, which might have secured the win after just a single reply, would not have been considered at all.
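
A minimal sketch of such a control loop (in Python; best_move_at_depth is an assumed helper that runs one complete depth-limited search, and for simplicity a search that is already running is not aborted when the deadline passes):

 import time

 # Iterative deepening with a time budget: repeat the search with growing depth
 # and keep the result of the last completed iteration.
 # `best_move_at_depth(position, depth)` is an assumed depth-limited search.
 def iterative_deepening(position, time_budget_seconds):
     deadline = time.monotonic() + time_budget_seconds
     best_move = None
     depth = 1
     while time.monotonic() < deadline:
         best_move = best_move_at_depth(position, depth)
         depth += 1
     return best_move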

Implementation

Main program (extract):

 gespeicherterZug = NULL;                    // will hold the best move found at the top level
 int gewuenschteTiefe = 4;                   // desired search depth
 int bewertung = max(+1, gewuenschteTiefe);  // start the search for player A (+1)
 if (gespeicherterZug == NULL)
    there were no more moves (the game is already over);
 else
    execute gespeicherterZug;

The normal variant is:

 int max(int spieler, int tiefe) {
    if (tiefe == 0 or keineZuegeMehr(spieler))
       return bewerten();                     // horizon reached or no moves left: static evaluation
    int maxWert = -unendlich;                 // -unendlich = minus infinity
    generiereMoeglicheZuege(spieler);
    while (another move is available) {
       fuehreNaechstenZugAus();
       int wert = min(-spieler, tiefe-1);     // the opponent replies by minimizing
       macheZugRueckgaengig();
       if (wert > maxWert) {
          maxWert = wert;
          if (tiefe == gewuenschteTiefe)
             gespeicherterZug = Zug;          // remember the best move at the root
       }
    }
    return maxWert;
 }
 int min(int spieler, int tiefe) {
    if (tiefe == 0 or keineZuegeMehr(spieler))
       return bewerten();
    int minWert = unendlich;
    generiereMoeglicheZuege(spieler);
    while (another move is available) {
       fuehreNaechstenZugAus();
       int wert = max(-spieler, tiefe-1);     // the maximizing player moves next
       macheZugRueckgaengig();
       if (wert < minWert) {
          minWert = wert;
       }
    }
    return minWert;
 }

The NegaMax variant is:

 int miniMax(int spieler, int tiefe) {
    if (tiefe == 0 or keineZuegeMehr(spieler))
       return bewerten(spieler);              // evaluation from the viewpoint of the player to move
    int maxWert = -unendlich;
    generiereMoeglicheZuege(spieler);
    while (another move is available) {
       fuehreNaechstenZugAus();
       int wert = -miniMax(-spieler, tiefe-1);  // negation: the opponent's best is our worst
       macheZugRueckgaengig();
       if (wert > maxWert) {
          maxWert = wert;
          if (tiefe == gewuenschteTiefe)
             gespeicherterZug = Zug;
       }
    }
    return maxWert;
 }

Note: While the standard implementation maximizes for one player and minimizes for the other, the negamax variant maximizes for both players. It follows that the evaluation function must behave differently in the two implementations.

  • Standard implementation: the better the board position is for the maximizing player, the greater the return value of the evaluation function bewerten(). The better it is for the minimizing player, the smaller the return value.
  • Negamax implementation: since both players maximize, the evaluation function bewerten(spieler) must return higher values the better the board position is for the player currently to move; the value is always given from that player's point of view (see the sketch below).
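
The relationship between the two conventions can be sketched as follows (in Python; evaluate_for_A is a hypothetical heuristic that always scores the board from player A's fixed point of view, and the second function is renamed only so that both fit in one sketch):

 # Sketch of the two evaluation conventions.
 def bewerten():                     # standard variant: fixed point of view (player A)
     return evaluate_for_A()

 def bewerten_negamax(spieler):      # negamax variant: point of view of the player to move
     return spieler * evaluate_for_A()   # spieler is +1 for player A and -1 for player B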

Variant: The Negamax algorithm

In order to simplify the code and to be able to treat every node of the search tree in the same way, it is defined that each player tries to achieve the maximum result for himself, i.e. the maximum value of the evaluation function. To do this, the evaluations of the successor positions must be multiplied by −1 (negation, hence the name). It is then no longer necessary to distinguish whether A or B is to move and therefore whether the maximum or the minimum is to be computed; instead, in every position only the maximum of the negated evaluations of the successor positions is computed.
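
This rests on the identity min(a, b) = −max(−a, −b): for example, min(−5, 3) = −5 and −max(5, −3) = −5. Minimizing for the opponent is therefore the same as maximizing the negated values, which is exactly what the negamax recursion does.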
