AlphaZero

AlphaZero is a self-learning computer program by DeepMind whose algorithm learns several complex board games solely on the basis of the rules of the game and the victory conditions, through intensive play against itself. The program uses a generalized approach from AlphaGo Zero and, after appropriate training, masters not only Go but also the strategy games chess and Shogi.

On December 5, 2017, DeepMind, an artificial-intelligence research company and subsidiary of Alphabet Inc., published a preprint on arXiv about the program AlphaZero. It describes how AlphaZero reached a superhuman level of play within 24 hours of reinforcement learning and defeated the strongest programs Stockfish, Elmo and a version of AlphaGo Zero trained for three days in their respective disciplines, although it used more powerful hardware than the opposing programs. Only ten of AlphaZero's winning games against Stockfish were published along with the document. All other games, as well as AlphaZero itself, were initially inaccessible, and the results of the document were not verified by peer review. An expanded and peer-reviewed version of the article appeared in Science on December 7, 2018.

AlphaZero beat the free chess program Stockfish 8 after nine hours of self-play training. 64 second-generation Tensor Processing Units (TPUs) were used to train the artificial neural network; another 5,000 first-generation TPUs were used to generate the necessary training games. The algorithm with the trained neural network then played on a single machine with only four TPUs.

Relation to AlphaGo Zero

AlphaZero (AZ) uses a generalized, generic version of the AlphaGo Zero (AGZ) algorithm and, after corresponding training, is capable of playing the three board games Shogi, chess and Go at a superhuman level. Differences between AZ and AGZ are:

  • AlphaZero has built-in algorithms for calculating hyperparameters.
  • The “artificial neural network” is continuously updated.
  • The rules of the Far Eastern board game Go are (in contrast to chess) invariant under mirroring and rotation of the board. Unlike AlphaGo Zero, the programming of AlphaZero does not exploit these symmetries.
  • Chess and Shogi can (unlike Go) end in a draw, which is why AlphaZero has to consider this additional game outcome as a possibility. Instead of a win probability, AlphaZero therefore optimizes the expected outcome of the game (a minimal sketch of the corresponding training objective follows this list).
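
The following is a minimal sketch, in Python, of the combined training objective described in the AlphaZero preprint, in which the value head is regressed towards the actual game outcome z ∈ {−1, 0, +1} so that draws are handled naturally. The function and variable names (alphazero_loss, policy_logits, c_l2) are illustrative assumptions, not DeepMind's code.

    # Minimal sketch (NumPy) of the AlphaZero-style training objective
    # l = (z - v)^2 - pi^T log p + c * ||theta||^2, as given in the preprint.
    # All names are illustrative; this is not DeepMind's implementation.
    import numpy as np

    def alphazero_loss(policy_logits, value, target_pi, z, params, c_l2=1e-4):
        # softmax over move logits -> predicted move probabilities p
        p = np.exp(policy_logits - policy_logits.max())
        p /= p.sum()
        value_loss = (z - value) ** 2                          # regression towards the game outcome
        policy_loss = -np.sum(target_pi * np.log(p + 1e-12))   # cross-entropy vs. MCTS visit counts
        l2 = c_l2 * sum(np.sum(w ** 2) for w in params)        # weight regularization
        return value_loss + policy_loss + l2

    # Example: a drawn game (z = 0) still yields a useful learning signal.
    logits = np.array([1.0, 0.5, -0.2])
    pi = np.array([0.6, 0.3, 0.1])        # normalized visit counts from the tree search
    print(alphazero_loss(logits, value=0.3, target_pi=pi, z=0.0, params=[np.ones(4)]))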

AlphaZero versus Stockfish and Elmo

In game theory, the board games chess, Shogi and Go are finite two-person zero-sum games with perfect information and no element of chance. Two opponents take turns making a move on a square board. The strategy games differ in the size of the board, the number of pieces, the complexity of the game, their invariance under rotation or mirroring of the board, and the possible ways a game can end.

Game   | Board size (number of fields) | State-space complexity (log10) | Game-tree complexity (log10) | Average game length (half-moves) | Complexity of a corresponding generalization
Chess  | 8 × 8 = 64                    | 50                             | 123                          | 80                               | EXPTIME-complete
Shogi  | 9 × 9 = 81                    | 71                             | 226                          | 110                              | EXPSPACE-complete
Go     | 19 × 19 = 361                 | 171                            | 360                          | 250                              | EXPSPACE-complete

Classic chess programs such as Stockfish evaluate positions and pieces on the basis of features that are largely defined and weighted by human grandmasters, combined with a powerful alpha-beta search that generates and evaluates a huge search tree using a large number of heuristics and domain-specific adaptations. The AlphaZero algorithm, by contrast, plays only against itself on the basis of the rules of the game, starting from random moves, evaluates the results, and optimizes its moves and strategies by adjusting the weights of its network. Because of the Monte Carlo tree search it uses, the program evaluated only about 80,000 positions per second in chess and 40,000 in Shogi, whereas Stockfish calculated about 70 million and Elmo about 35 million. AlphaZero compensates for the far smaller number of evaluations with a neural network that focuses the search on the more promising variants within the search tree.
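
As a rough illustration of this search procedure, the following Python sketch shows a network-guided Monte Carlo tree search with PUCT selection, simplified from the published description. The game and net interfaces, the constant c_puct and all function names are assumptions for illustration, not DeepMind's actual implementation.

    # Illustrative sketch of a network-guided Monte Carlo tree search (PUCT selection),
    # simplified from the AlphaZero preprint; the game/net interfaces are assumed.
    import math

    class Node:
        def __init__(self, prior):
            self.prior = prior        # P(s, a) from the policy head
            self.visits = 0           # N(s, a)
            self.value_sum = 0.0      # W(s, a)
            self.children = {}        # action -> Node

        def q(self):                  # mean action value Q(s, a)
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node, c_puct=1.5):
        """Pick the child maximizing Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
        total = sum(child.visits for child in node.children.values())
        def score(item):
            _, child = item
            u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
            return child.q() + u
        return max(node.children.items(), key=score)

    def choose_move(root, game, net, num_simulations=800):
        """Run simulations; leaf positions are scored by the value head instead of random rollouts."""
        for _ in range(num_simulations):
            node, state, path = root, game.copy(), [root]
            while node.children:                      # descend to a leaf along PUCT-optimal edges
                action, node = select_child(node)
                state.apply(action)
                path.append(node)
            priors, value = net.evaluate(state)       # policy and value from the neural network
            for action, p in priors.items():          # expand the leaf with the policy priors
                node.children[action] = Node(p)
            for n in reversed(path):                  # back up the value estimate to the root
                n.visits += 1
                n.value_sum += value
                value = -value                        # switch perspective at every ply
        # the move actually played is the most-visited child of the root
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]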

Results

Chess

In December 2016 the chess program Stockfish 8 won the Top Chess Engine Championship (TCEC Season 9), an international, annual computer chess championship. In the games between AlphaZero and Stockfish 8, both programs had one minute of thinking time per move. Out of 100 games from the classic starting position, AlphaZero won 25 games with the white pieces and three with the black pieces, and drew 72. In a series of twelve 100-game matches against Stockfish, each starting from one of the twelve most popular human openings, AlphaZero won 290 games, lost 24 and drew 886. Since AlphaZero had more computing power at its disposal than Stockfish in both cases, no clear conclusions can be drawn about the relative strength of the algorithms.
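
As a rough, purely illustrative calculation (not from the source), the overall score of that 1,200-game series can be converted into an approximate Elo difference using the standard logistic expected-score formula:

    # Rough estimate of the Elo gap implied by +290 -24 =886 over 1,200 games,
    # using the standard expected-score formula; the match conditions described
    # above mean this number should not be read as a general strength difference.
    import math

    wins, losses, draws = 290, 24, 886
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games               # ~0.611
    elo_diff = 400 * math.log10(score / (1 - score))   # ~78 Elo points
    print(f"score {score:.3f} -> about {elo_diff:.0f} Elo")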

Some chess grandmasters, such as Hikaru Nakamura, and Komodo developer Larry Kaufman criticized AlphaZero's victory, arguing that the result would have been much closer if both programs had been allowed to use opening databases, since Stockfish is optimized for their use. In fact, in some games Stockfish made serious mistakes in the opening that an opening book would have avoided.

Tord Romstad, one of the lead developers of Stockfish, posted the following comment on Chess.com:

“The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly). The version of Stockfish used is one year old, was playing with far more search threads than has ever received any significant amount of testing, and had way too small hash tables for the number of threads. I believe the percentage of draws would have been much higher in a match with more normal conditions.”

- Tord Romstad

Shogi

Compared to chess, the Japanese game Shogi is a more complex strategy game in terms of the number of possible moves, since it is played on a larger board with more pieces and since most captured pieces can be dropped back onto almost any square. After twelve hours of self-play training, AlphaZero won 90 out of 100 games against Elmo, lost eight and drew two. Less than two hours of training were needed to reach Elmo's skill level. Within the community of Shogi programmers there was criticism of the match conditions between AlphaZero and Elmo.

Go

After 34 hours of self-play training in Go, AlphaZero won 60 games and lost 40 against a version of AlphaGo Zero that had been trained for three days. AlphaZero reached the skill level of AlphaGo Lee after just eight hours; this is the version of the program that won the match against Lee Sedol 4-1 in March 2016.

Reactions

Several newspapers such as the Frankfurter Allgemeine Zeitung and The Times of London headlined the fact that the chess training took only four hours: “It was managed in little more than the time between breakfast and lunch.” Wired hailed AlphaZero as “the first multi-skilled AI board-game champ”. Joanna Bryson, an expert in artificial intelligence, noted that Google's “knack for good publicity” puts it in a strong position vis-à-vis its competitors:

“It's not only about hiring the best programmers. It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector.”

Danish grandmaster Peter Heine Nielsen said in an interview with the BBC:

“I always wondered how it would be if a superior species landed on earth and showed us how they played chess. Now I know.”

The Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's style as “insane attacking chess” with profound positional play. Former world chess champion Garry Kasparov said:

“It's a remarkable achievement, even if we should have expected it after AlphaGo. We have always assumed that chess required too much empirical knowledge for a machine to play so well from scratch, with no human knowledge added at all.”

The English grandmaster Matthew Sadler analyzed all the available AlphaZero games and, together with Natasha Regan, published the book Game Changer (ISBN 978-90-5691-818-7) in spring 2019, in which he described the program's play as “groundbreaking” and its skill level as “phenomenal”.

The open-source project Leela Chess Zero, Lc0 for short, attempts to implement the design approaches and algorithms published by DeepMind for home PCs and mobile devices and is trained with the help of the community. It is based on the similarly motivated Go engine Leela and already achieved considerable success in computer chess championships in 2018. In May 2019, Lc0 won the Top Chess Engine Championship (TCEC Season 15) for the first time.
