Elo rating


The Elo rating is a number that describes the playing strength of chess and Go players. The concept has since been adapted for various other sports.

Arpad Elo developed an objective rating system for the US Chess Federation (USCF) in 1960. It builds on the Bradley-Terry model, named after R. A. Bradley and M. E. Terry, who presented it in 1952, which in turn goes back to work by Ernst Zermelo from the 1920s. The system was adopted by the World Chess Federation FIDE at its 1970 congress in Siegen. FIDE calls its system the "FIDE rating system"; a rating number is officially a "FIDE rating" but is usually simply called an "Elo rating". Besides FIDE's international rating system there are national rating systems under different names: in Germany the national rating is called DWZ, in Austria national Elo ratings are calculated, and in Switzerland there is a rating list (Führungsliste) with rating numbers (Führungszahlen). These systems cover significantly more local tournaments but also calculate ratings by Arpad Elo's methods, mostly with only minor modifications and different factors.

Calculation

Basic principle

Each player is assigned an Elo rating R. The stronger the player, the higher the number. When players compete against each other, each player's expected score can be determined from the players' Elo ratings. After the match, each player's rating is adjusted according to the result: depending on the gap between the expected and the actual score, a player gains or loses rating points. The system is designed so that rating points are redistributed among the players involved.

If a player does not yet have an Elo rating, for example as a newcomer, his Elo rating is estimated.

Expected value

In a game between two players, a win scores one point, a draw half a point and a loss no points. A player's expected score is therefore the probability of winning plus half the probability of a draw. It is calculated from the ratings as follows:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

$R_A - R_B$   $E_A$   $E_B$
0             0.50    0.50
100           0.64    0.36
200           0.76    0.24
300           0.85    0.15
400           0.91    0.09

$E_A$: expected score for player A.
$R_A$: previous Elo rating of player A.
$R_B$: previous Elo rating of player B.

Because

$$\frac{1}{1 + x} + \frac{1}{1 + 1/x} = 1 \quad \text{with } x = 10^{(R_B - R_A)/400},$$

it follows that $E_A + E_B = 1$, where $E_B$ is the expected score for player B calculated in the same way. Arpad Elo chose the number 400 in the formula so that Elo ratings would be as compatible as possible with the ratings of the system previously used by Kenneth Harkness. In fact, the Harkness model can be viewed as a piecewise linear approximation to the Elo model. If the rating difference exceeds 400 points, the value 400 or −400 is used instead of the actual difference.
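The expected-score calculation can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation: the function name `expected_score` and the `cap` parameter are assumptions made for this example, and the clamping follows the 400-point rule described above.

```python
from typing import Optional


def expected_score(r_a: float, r_b: float, cap: Optional[float] = 400.0) -> float:
    """Expected score of player A against player B (logistic Elo formula)."""
    diff = r_a - r_b
    if cap is not None:
        # Differences beyond +/-cap are replaced by +/-cap, as described in the text.
        diff = max(-cap, min(cap, diff))
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Print the rating difference with the expected scores of both players.
for d in (0, 100, 200, 300, 400):
    e_a = expected_score(1500 + d, 1500)
    print(f"{d:>3}  {e_a:.2f}  {1 - e_a:.2f}")
```

Passing `cap=None` switches the clamping off, which is useful when checking properties of the uncapped formula.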

In Elo's model, a player's expected score as a function of the rating difference follows a logistic function. To avoid a misunderstanding: this does not mean that playing strengths are modeled as logistically distributed random variables. The multiplicativity of the expected values (see below), which is characteristic of Elo's model, cannot be derived from the assumption that playing strengths follow a logistic (or a normal) distribution. (One can of course construct a distribution for which exactly this property holds, but there is no plausible explanation why playing strengths should follow such a random mechanism. It is therefore more sensible to take multiplicativity as the starting point of the model and to dispense with a distributional assumption.)

Multiplicativity of the expected values

The expected values are multiplicative. If, for example, player A is a 3:1 favorite against player B (i.e. A scores 75% of the points in games against B) and B is a 2:1 favorite against C, then it follows from Elo's model that A is a 6:1 favorite against C.

In general: if A is an x:1 favorite against B and B is a y:1 favorite against C, then according to Elo's model A is an xy:1 favorite against C.

This is easy to verify by calculation. Multiplicativity is not, however, a consequence of a normal distribution. Although one often reads that the Elo model is based on a normal distribution, that assumption only roughly approximates the multiplicativity requirement. Multiplicativity is therefore the better starting point for developing the model, especially for calculating the playing strengths of players of earlier eras.
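The 3:1 and 2:1 example can be checked numerically. The sketch below assumes the uncapped logistic formula (no 400-point clamping); the helper names `odds` and `diff_for_odds` and the base rating of 1500 are chosen only for illustration.

```python
import math


def expected_score(r_a: float, r_b: float) -> float:
    """Uncapped expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def odds(r_a: float, r_b: float) -> float:
    """Odds x in 'A is an x:1 favorite against B', i.e. E_A / E_B."""
    e_a = expected_score(r_a, r_b)
    return e_a / (1.0 - e_a)


def diff_for_odds(x: float) -> float:
    """Rating difference that makes the stronger player an x:1 favorite."""
    return 400.0 * math.log10(x)


r_c = 1500.0                    # arbitrary base rating (assumption)
r_b = r_c + diff_for_odds(2.0)  # B is a 2:1 favorite against C
r_a = r_b + diff_for_odds(3.0)  # A is a 3:1 favorite against B

print(round(odds(r_a, r_b), 6), round(odds(r_b, r_c), 6), round(odds(r_a, r_c), 6))
```

Because the odds equal $10^{(R_A - R_B)/400}$, adding rating differences multiplies the odds, which is exactly the multiplicativity property.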

Adjustment of the Elo rating

The new rating of player A results from the previous rating and the current performance, the latter being weighted with a factor $k$. The larger $k$, the greater the impact of newly achieved results.

$$R'_A = R_A + k \cdot (S_A - E_A)$$

$R'_A$: new Elo rating of player A.
$k$: usually 20; 10 for top players (Elo > 2400); 40 for players with fewer than 30 rated games; 40 for youth players (under 18, Elo < 2300).
$S_A$: points actually scored by the player (1 for each win, 0.5 for each draw, 0 for each defeat).
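The adjustment over several rated games might be sketched as follows. The function names are hypothetical, the order in which the `k_factor` conditions are checked is an assumption (the text only lists the cases), and the 400-point clamping is left out for brevity.

```python
from typing import Sequence


def expected_score(r_a: float, r_b: float) -> float:
    """Uncapped expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def k_factor(rating: float, rated_games: int, age: int) -> int:
    """Pick k from the cases listed in the text (precedence is an assumption)."""
    if rated_games < 30:
        return 40
    if age < 18 and rating < 2300:
        return 40
    if rating > 2400:
        return 10
    return 20


def updated_rating(rating: float, opponents: Sequence[float],
                   results: Sequence[float], k: float) -> float:
    """New rating after a series of games rated against the old rating."""
    expected = sum(expected_score(rating, r) for r in opponents)
    actual = sum(results)  # 1 per win, 0.5 per draw, 0 per defeat
    return rating + k * (actual - expected)


# A 1500-rated adult with 100 rated games beats an equally rated opponent.
k = k_factor(1500, rated_games=100, age=30)
print(k, updated_rating(1500, [1500], [1.0], k))
```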

An example

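The chess player Alfred (Elo: 2806) plays against the chess player Berta (Elo: 2577). According to the first formula, Alfred (player A) is expected to score on average 0.789 points per game against Berta (player B):

$$E_A = \frac{1}{1 + 10^{(2577 - 2806)/400}} \approx 0.789$$

Correspondingly, $E_B = 1 - E_A \approx 0.211$. Both players are rated above 2400, so $k = 10$ applies to both. After a game there are three possibilities:

a) Berta wins, so Alfred's score is $S_A = 0$. The new Elo ratings of Alfred and Berta are

$$R'_A = 2806 + 10 \cdot (0 - 0.789) \approx 2798$$
$$R'_B = 2577 + 10 \cdot (1 - 0.211) \approx 2585$$

Alfred loses eight Elo points, while Berta gains eight.

b) Alfred wins, so $S_A = 1$.

$$R'_A = 2806 + 10 \cdot (1 - 0.789) \approx 2808$$
$$R'_B = 2577 + 10 \cdot (0 - 0.211) \approx 2575$$

Alfred gains two Elo points, Berta loses two.

c) Draw, so $S_A = 0.5$.

$$R'_A = 2806 + 10 \cdot (0.5 - 0.789) \approx 2803$$
$$R'_B = 2577 + 10 \cdot (0.5 - 0.211) \approx 2580$$

Alfred loses three Elo points, Berta gains three.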
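The three outcomes of the Alfred and Berta example can be reproduced with a short script. It is a sketch under the assumption that $k = 10$ applies to both players (both are rated above 2400); the function name `play` is made up for this example.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Uncapped expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def play(r_a: float, r_b: float, s_a: float, k: float = 10.0):
    """Return the rounded new ratings of A and B after one game."""
    e_a = expected_score(r_a, r_b)
    e_b = 1.0 - e_a
    s_b = 1.0 - s_a
    return round(r_a + k * (s_a - e_a)), round(r_b + k * (s_b - e_b))


for label, s_a in (("Berta wins", 0.0), ("Alfred wins", 1.0), ("Draw", 0.5)):
    print(label, play(2806, 2577, s_a))
```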

Elo performance

The Elo performance (also called tournament performance) is a player's performance in a single tournament, expressed in Elo points. In contrast to the normal Elo calculation, the player's previous Elo rating does not enter into this figure. Besides its purely sporting informative value, the Elo performance is used as a criterion for awarding special prizes when no other direct comparison of the players' performances is possible, e.g. to determine the best individual player in a team tournament.
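The text does not say how the performance figure is computed. One possible definition, assumed here and not necessarily the one used by FIDE, is the rating at which the expected score against the actual opponents equals the score actually achieved. The sketch below finds that rating by bisection with the uncapped logistic formula; the function name and the search bounds 0 to 4000 are assumptions.

```python
from typing import Sequence


def expected_score(r_a: float, r_b: float) -> float:
    """Uncapped expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def performance_rating(opponents: Sequence[float], score: float,
                       lo: float = 0.0, hi: float = 4000.0,
                       tol: float = 1e-6) -> float:
    """Rating whose total expected score against `opponents` equals `score`."""
    if not 0.0 < score < len(opponents):
        # With a 0% or 100% score no finite rating satisfies the equation.
        raise ValueError("performance is undefined for 0% or 100% scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, r) for r in opponents) < score:
            lo = mid  # expected score too low: the rating must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# 3 points from 4 games against opponents rated 1500.
print(round(performance_rating([1500, 1500, 1500, 1500], 3.0), 1))
```

Note that the player's own previous rating is not an input, in line with the description above.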

Chess

Before the Elo rating was introduced, chess players were classified into nine classes or categories. A difference of one class meant that the better player could expect to score 0.75 points per game. In the Elo system, this difference in playing strength corresponds to a difference of around 200 rating points.

Assignment of titles according to rating
Elo rating   Category (men)                                   Category (women)
≥ 2500       Grandmaster
2400-2499    International Master
2300-2399    FIDE Master                                      Woman Grandmaster (WGM)
2200-2299    Candidate Master or National Master              Woman International Master (WIM)
2100-2199    Master candidate                                 Woman FIDE Master (WFM)
2000-2099    Expert                                           Woman Candidate Master (WCM)
1800-1999    Amateur, class A, very good club player
1600-1799    Amateur, class B, strong recreational player
1400-1599    Amateur, class C, above-average player
1200-1399    Amateur, class D, average hobby player
1000-1199    Casual player
< 1000       Beginner

Note that titles such as Grandmaster (GM) and International Master (IM) are not awarded on the basis of a certain Elo rating alone, but also require the fulfillment of other specified norms. To receive the title after fulfilling all norms, a prospective GM must have reached an Elo rating of at least 2500, a prospective IM at least 2400. The requirements for the women's titles are each 200 Elo points lower than for the corresponding titles for men.

A class spans 200 Elo points. The system is calibrated so that a difference of 200 points corresponds to an expected score of 76% for the stronger player and a difference of 400 points to 92%. The formula $P = 1 / (1 + 10^{-D/400})$, where $D$ is the rating difference between the higher-rated and the lower-rated player, is only an approximation; the comparison is based on statistical methods. With a difference of 600 points, the stronger player wins almost always (98%). For computers, the distribution not only agrees with the 200-point definition but also has a very similar curve shape; among similarly powerful machines, however, playing strength varies more widely across the different phases of the game.

Tournament category

Round-robin tournaments are also divided into categories based on the average rating of the participants. A difference of one category corresponds to 25 Elo points. Category 1 denotes a tournament whose participants average 2251 to 2275 Elo points. The currently strongest tournaments reach category 22, which corresponds to an average of 2776 to 2800 Elo points. Category 23 (with an average rating of 2801) was reached for the first time at the Zurich Chess Challenge in January 2014.
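The category boundaries given above follow a simple arithmetic rule, sketched here. The function name is hypothetical, and the sketch assumes the average rating has already been rounded to a whole number; averages below 2251 are treated as having no category.

```python
from typing import Optional


def tournament_category(average_rating: int) -> Optional[int]:
    """Category for an integer average rating, or None below category 1."""
    if average_rating < 2251:
        return None
    # Category 1 covers 2251-2275; every further 25 points add one category.
    return (average_rating - 2251) // 25 + 1


for avg in (2251, 2275, 2276, 2776, 2800, 2801):
    print(avg, tournament_category(avg))
```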

Statistics

The Elo system uses the rating to divide chess players into nine classes, with the lower limit of the top class at 2600 and the upper limit of the bottom class at 1200. An individual player's ratings are interval-scaled and approximately normally distributed, fluctuating around a mean with a standard deviation of 200. There are many players with playing strengths below 1200, but at this level the Elo system has only limited predictive validity. At amateur level it is particularly important that a player can defend their rating against stronger opponents without having to focus on special factors such as the unconscious psychological weaknesses or poor time management of newcomers. Unrealistically high ratings are corrected quickly, precisely and reliably by defeats. The fairly stable Elo rating is determined using various methods. Some of them assume few games or tournament participants of similar strength; after many games, they all reach very similar equilibria.

The basis of the calculation is the hypothesis that the distribution of the playing strength in the totality of the players corresponds mathematically to the normal distribution (Gaussian bell curve). Based on this hypothesis, it can be statistically predicted for two opponents with what probability one player will win. In the special case of an identical rating, the probabilities are equally high. In a tournament, a player's rating and the average rating of their opponents can predict what score they are likely to get. At the end of the tournament, the actual result is compared with the statistically predicted result and the player's new rating is calculated from the deviation.

Problems with rating systems

Intransitivity of probability relations

If player A is the favorite against player B and B against C, then A has a higher rating than B and B a higher than C. This means that A has a higher rating than C and should be favorite against C.

However, this conclusion is by no means compelling, since probability or preference relations are not necessarily transitive. This problem is not peculiar to the Elo system but is a fundamental problem of all rating systems (cf. Condorcet paradox, "Chinese dice" or "intransitive dice").

However, transitivity is a necessary prerequisite for a meaningful rating system. To guarantee this property, Arpad Elo assumed when developing his rating system that playing strengths can be described by the formula $E_A = 1 / (1 + 10^{(R_B - R_A)/400})$. Besides transitivity, this assumption also yields the multiplicativity of the expected values shown above.

Deflation and inflation

If one wants to compare the playing strengths of players from different eras using Elo ratings (or other ratings; this applies not only to the Elo system), then a rating of, e.g., 1600 from 1970 must correspond to a rating of 1600 from 2000. In particular, since the average playing strength at least does not deteriorate over time thanks to the further development of theory, the average rating should not decrease.

In the Elo system, the winner of a game gains just as many rating points as the loser loses: the average rating of the two remains the same. If the rating pool only includes top players, the following phenomenon can be observed: whenever a player is added to the ratings, he enters with a certain (low) number of points. In the course of his career he improves his playing strength, gains points, and later retires with a (high) number of points. As a result, points are withdrawn from the community and the average rating drops, i.e. the system is deflationary.
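That rating points are only redistributed can be illustrated with a small simulation. It is a sketch under two assumptions: both players in a game use the same factor $k$ (with different factors the gains and losses no longer cancel exactly), and the game results are drawn at random purely for illustration. All names and the starting ratings are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Uncapped expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def rate_game(r_a: float, r_b: float, s_a: float,
              k: float = 20.0) -> Tuple[float, float]:
    """Update both players independently from their own expectations."""
    new_a = r_a + k * (s_a - expected_score(r_a, r_b))
    new_b = r_b + k * ((1.0 - s_a) - expected_score(r_b, r_a))
    return new_a, new_b


def simulate_pool(ratings: Sequence[float], games: int, seed: int = 0) -> List[float]:
    """Play random pairings inside a closed pool and return the final ratings."""
    rng = random.Random(seed)
    pool = list(ratings)
    for _ in range(games):
        i, j = rng.sample(range(len(pool)), 2)
        s_i = rng.choice((0.0, 0.5, 1.0))  # arbitrary result, illustration only
        pool[i], pool[j] = rate_game(pool[i], pool[j], s_i)
    return pool


before = [1500.0, 1650.0, 1800.0, 2100.0]
after = simulate_pool(before, games=1000)
print(sum(before), round(sum(after), 6))
```

In a closed pool the sum of all ratings stays constant; the average can only change when players enter or leave, which is the mechanism behind the deflation and inflation described here.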

If the rating pool is enlarged, the opposite effect occurs: many players leave the rating pool with a lower rating than was assigned to them when they entered - the system is now inflationary.

This was particularly the case in the past, when the World Chess Federation FIDE only included players in the rating list from a rating of 2200. Since FIDE charges a fee for rating tournaments and this is therefore a source of income for it, the threshold has been lowered further and further, most recently to 1200 in July 2009. Nevertheless, it cannot be avoided that many players leave the rating pool with a lower rating than they received on entry. Moderate inflation is in fact desirable, since it reflects the general increase in playing strength over time; in practice, however, the problem is usually that inflation is too high.

So the Elo numbers could always reach new records without actually being a measure of the skill level. About 20 years ago there were only two players with an Elo rating above 2700, and only about 10 to 20 players achieved a rating above 2600. In July 2010, over 200 active players had an Elo rating above 2600, 37 of them at least 2700; three players even had an Elo rating of 2800 or higher, which 20 years ago seemed unthinkable.

The average rating of the first 100 players in the world rankings rose between July 2000 and July 2012 from 2644 to 2703 points, an increase of 59 rating points. Since 2012 the mean value has been between 2700 and 2706 and is therefore fairly constant.

The thousand-game problem

Another phenomenon is the so-called thousand-game problem. Players of the same playing strength often meet again and again. Suppose two players, each with an Elo rating of 2000, play ten games and one of them scores 80% of the points. After the new ratings are calculated, the winner stands at 2080 and the loser at 1920. If, however, the two play 1000 games with the same score ratio without the ratings being updated in between, the winner would receive a new rating higher than that of the reigning world champion. This scenario is purely theoretical, though. By the law of large numbers, one can expect two equally strong players (both had an Elo rating of 2000) to approach the expected 50% after many games. Moreover, in practice 1000 games are never played without a rating update.
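The difference between rating a long series in one batch and updating after every game can be sketched as follows. The factor `K = 80 / 3` is not an official value: it is chosen here only because it reproduces the 2080 quoted above for ten games at 80%. The function names are hypothetical and the 400-point clamping is omitted.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Uncapped expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


K = 80.0 / 3.0  # assumption: makes the ten-game case come out at 2080


def batch_update(r_a: float, r_b: float, games: int,
                 score_fraction: float, k: float = K) -> float:
    """Rate all games at once against the ratings held before the series."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * games * (score_fraction - e_a)


def per_game_update(r_a: float, r_b: float, games: int,
                    score_fraction: float, k: float = K) -> float:
    """Refresh both ratings after every game; A scores score_fraction each time."""
    for _ in range(games):
        delta = k * (score_fraction - expected_score(r_a, r_b))
        r_a, r_b = r_a + delta, r_b - delta
    return r_a


print(round(batch_update(2000, 2000, 10, 0.8)))       # ten games in one batch
print(round(batch_update(2000, 2000, 1000, 0.8)))     # 1000 games in one batch
print(round(per_game_update(2000, 2000, 1000, 0.8)))  # updated after every game
```

With per-game updates the winner's rating levels off where an 80% score is exactly what the formula expects, instead of growing linearly with the number of games.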

The development of ratings is also influenced by the rating period. After a trial phase with irregular publication, a new list was published once a year in January from 1975 to 1980. From July 1981, lists were published every six months, which continued until July 2000. In October 2000, publication switched to every three months. From July 2009 to July 2012 ratings were calculated every two months, and since August 2012 they have been calculated monthly. Since then, the minimum rating has been 1000 points; previously it was 1200. In principle, rating after every tournament would make sense, as fluctuations in players' form could then be evened out better. However, this is not currently planned.

Skills of selected chess players

After the Elo number was introduced as a rating system in 1970, Bobby Fischer's record of 2785 points from July 1972 initially stood for many years. In 1999, the then classical world chess champion Garri Kasparow achieved the rating of 2851 points, which was only exceeded in January 2013 by Magnus Carlsen with 2861 points. Carlsen has since increased the record to 2882 (list from May 2014).

Elo ratings can also be calculated for individual tournaments. Fabiano Caruana achieved an Elo performance of 3103 at the Sinquefield Cup in St. Louis in 2014. The previous highest tournament performance was 3002, achieved by Magnus Carlsen in Nanjing in 2010.

Grandmasters usually have an Elo rating of at least 2500; from 2600 points one can speak of the extended world elite. The following table shows the FIDE rating list of February 2019 with the twenty highest-rated active players, supplemented by the best woman and the best male and female players from Germany, Austria and Switzerland (in brackets: place in the women's ranking):

Rank         Name                       Rating (Feb. 2019)   Country
1            Magnus Carlsen             2845                 Norway
2            Fabiano Caruana            2828                 United States
3            Ding Liren                 2812                 People's Republic of China
4            Anish Giri                 2797                 Netherlands
5            Şəhriyar Məmmədyarov       2790                 Azerbaijan
6            Maxime Vachier-Lagrave     2780                 France
7            Viswanathan Anand          2779                 India
8            Alexander Grishchuk        2771                 Russia
9            Jan Nepomnyashchi          2771                 Russia
10           Levon Aronjan              2767                 Armenia
11           Wesley So                  2765                 United States
12           Yu Yangyi                  2764                 People's Republic of China
13           Teymur Rəcəbov             2756                 Azerbaijan
14           Sergei Karjakin            2753                 Russia
15           Vladimir Kramnik           2753                 Russia
16           Hikaru Nakamura            2749                 United States
17           Wesselin Topalow           2740                 Bulgaria
18           David Navara               2738                 Czech Republic
19           Pyotr Swidler              2737                 Russia
20           Richárd Rapport            2735                 Hungary
...
59           Markus Ragger              2683                 Austria
...
69           Liviu-Dieter Nisipeanu     2670                 Germany
...
86 (1)       Hou Yifan                  2662                 People's Republic of China
...
237          Vadim Milov                2603                 Switzerland
...
1129 (20)    Elisabeth Pähtz            2466                 Germany
...
4554 (171)   Regina Theissl-Pokorná     2311                 Austria
...
6852 (279)   Monika Müller-Seps         2255                 Switzerland

Historic rating in chess

To compare today's top players with grandmasters before the introduction of the Elo number, the so-called historical Elo number is used.

Computer chess

The Elo ratings of chess computers or chess programs cannot be directly compared with those of human players, since they were determined mainly through games between computers and not through participation in official tournaments.

Go

In Go, playing strength is traditionally given in kyū grades for students and dan grades for masters. Within the European Go Federation and on many Go servers on the Internet, it is determined by a system derived from Elo, which maps to kyū and dan grades as follows:

Kyū / Dan                Elo                Playing strength and experience
30k                      -                  understands the rules but has not yet played a game
29k - 28k                -                  has played a few games
27k - 25k                -                  has won some games against beginners
24k - 22k                -                  has won some games against non-beginners
21k - 18k                0-349              hobby player
17k - 14k                350-749            regular hobby player
13k - 10k                750-1149           club player
9k - 5k                  1150-1649          regular club player
4k - 1k                  1650-2049          good club player
1d - 3d                  2050-2349          very good club player
4d - 7d                  from 2350          one of the best players in their country
1p - 9p                  from around 2600   professional Go player (from Japan, Korea or China) who plays stronger than an amateur 6 dan
world's best 9p player   3627               Ke Jie, world's best Go player (as of January 4, 2017)
world's best 9p AI       5185               AlphaGo Zero on a TPU v2 module with 180 TFLOPS

Soccer

The FIFA world rankings for women have been officially determined using an adapted Elo system since 2003. Since 2018, the FIFA world ranking for men has also been converted to an adapted Elo system.

A long-standing unofficial adaptation of the Elo system for men's national football teams is the World Football Elo Ratings. Unofficial Elo ratings are also calculated for football clubs.

Table tennis

Since the 2010/2011 season, Swiss Table Tennis has been using a slightly modified Elo formula to calculate rating points, with the following quantities:

$E_A$: expected score for player A.
$R_A$: player A's previous number of points.
$R_B$: player B's previous number of points.

Expressed as a percentage, the expected value for A is $E_A \cdot 100\%$. Player A's new number of points additionally depends on:

$S_A$: score actually achieved (1 for each win, 0 for each defeat; a draw is not possible in table tennis).

Scrabble

For international English-language Scrabble, an Elo ranking list is kept by the World English-language Scrabble Players' Association (WESPA). The New Zealander Nigel Richards (2258 Elo points, as of August 29, 2016) is in first place in this ranking.

An Elo ranking list has also been maintained for German-speaking Scrabble since 2009 - based on tournaments from 2005 onwards. Among 206 players from 5 countries, the German Ben Berger is in first place with 1754 Elo points (as of February 26, 2017).

League of Legends

In the MOBA League of Legends, a computer strategy game, the Elo system was also used for ranked games. It has since been replaced by a league system, which is still based on the Elo system.

A win earns League Points (LP); a loss costs LP. At 100 LP, a player has to win a best-of-three or best-of-five series to move up a league.
