Arpad Elo developed an objective rating system for the United States Chess Federation (USCF) in 1960. It builds on the Bradley-Terry model (named after R. A. Bradley and M. E. Terry, who presented it in 1952), which in turn goes back to work by Ernst Zermelo from the 1920s. The system was adopted by the world chess federation FIDE at its 1970 congress in Siegen. FIDE calls its system the "FIDE rating system". A rating is officially called a "FIDE rating" but is usually referred to simply as an "Elo rating". In addition to FIDE's international rating system, there are national rating systems with different names: in Germany the national rating is called DWZ, in Austria national Elo numbers are calculated, and in Switzerland there is a Führungsliste (rating list) with Führungszahlen. These systems cover significantly more local tournaments, but they also calculate ratings according to Arpad Elo's methods, mostly with only minor modifications and different factors.
Each player is assigned an Elo number R (for "rating"). The stronger the player, the higher the number. When players compete against each other, the expected score of each player can be determined from the players' Elo numbers. After the games, each player's Elo rating is adjusted to the results: depending on the discrepancy between the expected score and the actual result, a player gains or loses rating points. The system is designed so that rating points are redistributed among the players involved.
If a player does not yet have an Elo rating, for example as a newcomer, the rating is estimated.
In an encounter between two players, a win earns one point, a draw half a point, and a loss no points. A player's expected score is therefore the probability of winning plus half the probability of a draw. This expected score is calculated from the ratings as follows:
$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$
- $E_A$: expected score for player A.
- $R_A$: previous Elo rating of player A.
- $R_B$: previous Elo rating of player B.
$E_A + E_B = 1$ holds, where $E_B$ is the expected score for player B calculated in the same way. Arpad Elo chose the number 400 in the formula so that Elo ratings would be as compatible as possible with the ratings of the system previously used by Kenneth Harkness. In fact, the Harkness model can be viewed as a piecewise linear approximation to the Elo model. If the rating difference is more than 400 points, the value 400 or −400 is used instead of the actual difference.
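The expected-score calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation: the function name `expected_score` and the `cap` parameter are assumptions introduced here, and the 400-point cap follows the rule stated in the text.

```python
def expected_score(rating_a: float, rating_b: float, cap: float = 400.0) -> float:
    """Expected score of player A against player B under the Elo formula."""
    # Clip the rating difference to [-cap, +cap], as described in the text.
    diff = max(-cap, min(cap, rating_a - rating_b))
    # E_A = 1 / (1 + 10^((R_B - R_A) / 400)), written in terms of diff = R_A - R_B.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


if __name__ == "__main__":
    e_a = expected_score(2806, 2577)
    e_b = expected_score(2577, 2806)
    print(f"E_A = {e_a:.3f}, E_B = {e_b:.3f}, sum = {e_a + e_b:.3f}")
```

Because both expected scores are computed from the same clipped difference with opposite sign, they always add up to 1.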
In Elo's model, a player's expected score as a function of the rating difference follows a logistic function. To avoid a misunderstanding: this does not mean that playing strengths are modeled as logistically distributed random variables. The multiplicativity of the expected values (see below), which is characteristic of Elo's model, cannot be derived from the assumption that playing strengths follow a logistic (or a normal) distribution. (One can of course construct a distribution for which exactly this property holds, but there is no plausible explanation why playing strengths should follow such a random mechanism. It is therefore more sensible to take multiplicativity as the starting point of the model and to dispense with a distributional assumption.)
Multiplicativity of the expected values
The expected values are multiplicative. If, for example, player A is a 3:1 favorite against player B (i.e. A scores 75% of the points in games against B) and B is a 2:1 favorite against C, then it follows from Elo's model that A is a 6:1 favorite against C.
In general: if A is an x:1 favorite against B and B is a y:1 favorite against C, then according to Elo's model A is an xy:1 favorite against C.
This is easy to verify by calculation. However, multiplicativity is not a consequence of a normal distribution. Although one often reads that the Elo model is based on a normal distribution, that assumption only very roughly approximates the multiplicativity requirement. Multiplicativity is therefore the better starting point for developing the model, especially for calculating the playing strengths of players from earlier eras.
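The calculation can be sketched as follows, assuming the logistic expected-score formula $E_A = 1/(1 + 10^{(R_B - R_A)/400})$ and ignoring the 400-point cap on the rating difference. The odds of A against B are the ratio of the two expected scores:
$$\frac{E_A}{E_B} = \frac{E_A}{1 - E_A} = 10^{(R_A - R_B)/400}$$
Since the exponents add when the rating differences are chained, the odds multiply:
$$10^{(R_A - R_C)/400} = 10^{(R_A - R_B)/400} \cdot 10^{(R_B - R_C)/400}$$
In the example, odds of 3:1 correspond to a difference of $400 \log_{10} 3 \approx 191$ points and odds of 2:1 to $400 \log_{10} 2 \approx 120$ points. Together this gives about 311 points, which is $400 \log_{10} 6$ and therefore odds of 6:1.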
Adjustment of the Elo rating
The new rating $R'_A$ of player A results from the previous rating $R_A$ and the current performance, the latter being weighted with a factor $k$. The larger $k$, the greater the impact of newly achieved results.
$$R'_A = R_A + k \cdot (S_A - E_A)$$
- $k$: usually 20; 10 for top players (Elo > 2400); 40 for players with fewer than 30 rated games; 40 for youth players (under 18, Elo < 2300)
- $S_A$: points actually scored by the player (1 for each win, 0.5 for each draw, 0 for each loss)
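The chess player Alfred (Elo: 2806) plays against the chess player Berta (Elo: 2577). According to the first formula, Alfred (player A) is expected to score on average $E_A = 0.789$ points per game against Berta (player B):
$$E_A = \frac{1}{1 + 10^{(2577 - 2806)/400}} \approx 0.789$$
After a game there are three possible outcomes. The factor is $k = 10$ for both players, since both are rated above 2400, and Berta's expected score is $E_B = 1 - E_A \approx 0.211$.
a) Berta wins, so $S_A = 0$. The new Elo ratings for Alfred and Berta are
$$R'_A = 2806 + 10 \cdot (0 - 0.789) \approx 2798, \qquad R'_B = 2577 + 10 \cdot (1 - 0.211) \approx 2585$$
Alfred loses eight Elo points, while Berta gains eight Elo points.
b) Alfred wins, so $S_A = 1$.
$$R'_A = 2806 + 10 \cdot (1 - 0.789) \approx 2808, \qquad R'_B = 2577 + 10 \cdot (0 - 0.211) \approx 2575$$
Alfred gains two Elo points, Berta loses two.
c) Draw, so $S_A = 0.5$.
$$R'_A = 2806 + 10 \cdot (0.5 - 0.789) \approx 2803, \qquad R'_B = 2577 + 10 \cdot (0.5 - 0.211) \approx 2580$$
Alfred loses three Elo points, Berta gains three.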
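The Alfred and Berta example can be reproduced with a short Python sketch. The function names `expected_score` and `update` are illustrative choices, and the factor k = 10 is assumed for both players because both are rated above 2400.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (rating difference assumed within 400 points)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating: float, opponent: float, score: float, k: float = 10.0) -> float:
    """New rating after one game: previous rating plus k times (actual - expected)."""
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    alfred, berta = 2806, 2577
    for label, s_a in (("Berta wins", 0.0), ("Alfred wins", 1.0), ("Draw", 0.5)):
        new_a = update(alfred, berta, s_a)
        new_b = update(berta, alfred, 1.0 - s_a)
        print(f"{label}: Alfred {new_a:.0f}, Berta {new_b:.0f}")
```

Because both players use the same factor here, the points one player gains are exactly the points the other loses.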
The Elo performance (also called tournament performance) is a player's performance in a single tournament, expressed in Elo points. In contrast to the normal Elo calculation, the player's previous Elo rating does not enter into this figure. In addition to its purely sporting informative value, the Elo performance is used as a criterion for awarding special prizes when no other direct comparison of the players' results is possible, e.g. to determine the best individual player in a team tournament.
Before the Elo rating was introduced, chess players were classified into nine classes or categories. A difference of one class meant that the better player could expect to score 0.75 points per game. In the Elo system, this difference in playing strength corresponds to a difference of around 200 rating points.
|Elo rating||Category (men)||Category (women)|
|2300-2399||FIDE Master||Woman Grandmaster (WGM)|
|2200-2299||Candidate Master or National Master||Woman International Master (WIM)|
|2100-2199||Master candidate||Woman FIDE Master (WFM)|
|2000-2099||Expert||Woman Candidate Master (WCM)|
|1800-1999||Amateur, class A, very good club player|
|1600-1799||Amateur, class B, strong recreational player|
|1400-1599||Amateur, class C, above-average player|
|1200-1399||Amateur, class D, average hobby player|
Note that the titles of Grandmaster (GM) and International Master (IM) are not awarded solely on the basis of a certain Elo rating; further specified norms must also be fulfilled. To receive the title after fulfilling all norms, a prospective GM must have reached an Elo rating of at least 2500, and a prospective IM at least 2400. The requirements for the women's titles are each 200 Elo points lower than for the corresponding titles for men.
A class spans 200 Elo points. The system is calibrated so that a difference of 200 points corresponds to an expected score of 76% for the stronger player and a difference of 400 points to an expected score of 92%. The formula $P = 1 / (1 + 10^{-D/400})$, where $D$ is the rating difference between the higher-rated and the lower-rated player, is only an approximation to these values. The comparison is based on statistical methods. With a difference of 600 points, the stronger player statistically almost always wins (98%). With computers, the distribution not only matches the 200-point definition but is also very similar in its tail behavior; among similarly strong machines, however, playing strength varies more widely across the different phases of the game.
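The percentages quoted above can be compared with the logistic formula in a short Python sketch. The normal model used here is an assumption for illustration: each player's performance is taken to be normally distributed with a standard deviation of 200 points, so the difference of two performances has a standard deviation of 200·√2. The function names are hypothetical.

```python
import math


def logistic_expectation(diff: float) -> float:
    """Expected score of the stronger player from P = 1 / (1 + 10^(-D/400))."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def normal_expectation(diff: float, sigma: float = 200.0) -> float:
    """Expected score if both performances are normal with standard deviation sigma (assumed)."""
    # The difference of two independent performances has standard deviation sigma * sqrt(2).
    z = diff / (sigma * math.sqrt(2.0))
    # Standard normal cumulative distribution function, expressed via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for diff in (200, 400, 600):
        print(diff, f"{logistic_expectation(diff):.3f}", f"{normal_expectation(diff):.3f}")
```

Under these assumptions the normal model gives roughly 76%, 92% and 98% for differences of 200, 400 and 600 points, while the logistic formula gives roughly 76%, 91% and 97%.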
Round-robin tournaments are also divided into categories based on the average rating of the participants. A difference of one category corresponds to 25 Elo points. Category 1 comprises tournaments whose participants have an average rating of 2251 to 2275 Elo points. The currently strongest tournaments reach category 22, which corresponds to an average of 2776 to 2800 Elo points. Category 23 (with an average rating of 2801) was reached for the first time at the Zurich Chess Challenge in January 2014.
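The category bands described above follow a simple arithmetic pattern, which the following Python sketch illustrates. The function name `tournament_category` is hypothetical, and returning `None` for averages below 2251 is an assumption made here, since the text does not say how such tournaments are classified.

```python
import math
from typing import Optional


def tournament_category(average_rating: float) -> Optional[int]:
    """Category of a tournament from the average rating of its participants."""
    if average_rating < 2251:
        # Below the lower bound of category 1; handling this as None is an assumption.
        return None
    # Each category spans 25 points: 2251-2275 is category 1, 2276-2300 is category 2, and so on.
    return math.ceil((average_rating - 2250) / 25)


if __name__ == "__main__":
    for avg in (2251, 2275, 2776, 2800, 2801):
        print(avg, tournament_category(avg))
```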
The Elo system divides chess players into nine classes by means of a rating number, with the lower limit of the top class at 2600 and the upper limit of the bottom class at 1200. The ratings of an individual player are interval-scaled and approximately normally distributed; they fluctuate around a mean value with a standard deviation of 200. There are many players with playing strengths below 1200, but at this level the predictive power of the Elo system is limited. At the amateur level it is particularly important that a player can defend their rating against stronger opponents without having to allow for special factors such as unconscious psychological weaknesses or the poor time management of newcomers. Unrealistically high values are corrected quickly, precisely and reliably by losses. The fairly stable Elo rating can be determined by various methods. Some of them assume few games or tournament participants of similar strength; after many games, they all reach very similar equilibria.
The basis of the calculation is the hypothesis that the distribution of the playing strength in the totality of the players corresponds mathematically to the normal distribution (Gaussian bell curve). Based on this hypothesis, it can be statistically predicted for two opponents with what probability one player will win. In the special case of an identical rating, the probabilities are equally high. In a tournament, a player's rating and the average rating of their opponents can predict what score they are likely to get. At the end of the tournament, the actual result is compared with the statistically predicted result and the player's new rating is calculated from the deviation.
Problems with rating systems
Intransitivity of probability relations
If player A is the favorite against player B and B against C, then A has a higher rating than B and B a higher than C. This means that A has a higher rating than C and should be favorite against C.
However, this conclusion is by no means compelling, since probability or preference relations are not necessarily transitive. This problem is not peculiar to the Elo system but is a fundamental problem of all rating systems (cf. the Condorcet paradox, "Chinese dice" or "intransitive dice").
However, transitivity is a necessary prerequisite for a meaningful rating system. To ensure this property, Arpad Elo assumed when developing his rating system that playing strengths can be described by the formula $E_A = 1 / (1 + 10^{(R_B - R_A)/400})$. In addition to transitivity, this assumption also yields the multiplicativity of the expected values shown above.
Deflation and inflation
If one wants to compare the playing strengths of players from different eras by means of Elo ratings (or other ratings; this does not apply only to the Elo system), a rating of, say, 1600 from 1970 should correspond to a rating of 1600 from 2000. In particular, since the average playing strength at least does not deteriorate over time thanks to the further development of theory, the average rating should not decrease.
In the Elo system, the winner of a game gains just as many rating points as the loser loses, so the average rating of the two remains the same. If the rating pool includes only top players, the following phenomenon can be observed: whenever a player is added to the rating list, they enter with a certain (low) number of points. In the course of their career they improve, gain points, and later retire with a (high) number of points. As a result, points are withdrawn from the pool and the average rating drops; i.e. the system is deflationary.
If the rating pool is enlarged, the opposite effect occurs: many players leave the rating pool with a lower rating than was assigned to them when they entered - the system is now inflationary.
This was particularly the case earlier, when the world chess federation FIDE included only players rated 2200 or higher in the rating list. Since the Elo evaluation of tournaments is subject to a fee and is therefore a source of income for FIDE, this threshold was lowered further and further, most recently to 1200 in July 2009. Nevertheless, it cannot be avoided that many players leave the rating pool with a lower rating than they received on entry. Moderate inflation is in fact desirable, since it reflects the further development of playing strength over time; in practice, however, the problem that usually arises is excessive inflation.
As a result, Elo ratings could keep reaching new records without actually being a measure of playing strength. Around 1990 there were only two players with an Elo rating above 2700, and only about 10 to 20 players had a rating above 2600. In July 2010, over 200 active players had an Elo rating above 2600, 37 of them at least 2700; three players even had a rating of 2800 or higher, which had seemed unthinkable 20 years earlier.
The average rating of the first 100 players in the world rankings rose between July 2000 and July 2012 from 2644 to 2703 points, an increase of 59 rating points. Since 2012 the mean value has been between 2700 and 2706 and is therefore fairly constant.
The thousand-game problem
Another phenomenon is the so-called thousand-game problem. Players of the same playing strength often meet again and again. Suppose two players, each rated 2000, play ten games, and one of them scores 80% of the points. After the new ratings are calculated, the values are 2080 for the winner and 1920 for the loser. If, however, the two play 1000 games with the same score ratio without the rating being updated in between, the winner receives a new rating higher than that of the reigning world champion. This scenario is only theoretical, though. By the law of large numbers, one can expect the score between two equally strong players (both were rated 2000) to approach the expected 50% after many games. In addition, in practice there will never be 1000 games without a rating update.
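The effect can be illustrated with a small Python sketch that contrasts a single update over the whole series with an update after every game. The function names are hypothetical, and the factor k = 20 is an assumption; with it the ten-game figures come out slightly different from the 2080 and 1920 quoted above, which correspond to a somewhat larger factor. To keep the sketch deterministic, each game is credited with a fractional score of 0.8 rather than a random result.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def batch_update(rating, opponent, games, score_fraction, k=20.0):
    """One update for a whole series, using the pre-series ratings throughout."""
    expected = games * expected_score(rating, opponent)
    actual = games * score_fraction
    return rating + k * (actual - expected)


def sequential_update(rating_a, rating_b, games, score_fraction, k=20.0):
    """Update both ratings after every game; A scores score_fraction per game."""
    for _ in range(games):
        delta = k * (score_fraction - expected_score(rating_a, rating_b))
        rating_a += delta
        rating_b -= delta
    return rating_a, rating_b


if __name__ == "__main__":
    print("batch, 1000 games:", batch_update(2000, 2000, 1000, 0.8))
    print("sequential, 1000 games:", sequential_update(2000.0, 2000.0, 1000, 0.8))
    print("difference at which 80% is expected:", 400 * math.log10(4))
```

With these assumptions the batch update over 1000 games yields a rating of 8000, whereas updating after every game lets the two ratings settle about 241 points apart, the difference at which an 80% score is exactly what the formula expects.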
The development of the ratings is also influenced by the rating period. After a trial phase with irregular publications, a new list was published once a year, in January, from 1975 to 1980. Starting in July 1981, publication became semi-annual, which was maintained until July 2000. In October 2000, the system switched to publication every three months. From July 2009 to July 2012, ratings were calculated every two months. Since August 2012 they have been calculated monthly. Since then, the minimum rating has been 1000 points; previously it was 1200. In principle, rating after each tournament would make sense, since fluctuations in players' form could then be captured better. However, this is not currently planned.
Skills of selected chess players
After the Elo rating was introduced in 1970, Bobby Fischer's record of 2785 points from July 1972 initially stood for many years. In 1999, the then classical world chess champion Garry Kasparov reached a rating of 2851 points, which was surpassed only in January 2013 by Magnus Carlsen with 2861 points. Carlsen has since raised the record to 2882 (list of May 2014).
Elo performances can also be calculated for individual tournaments. Fabiano Caruana achieved an Elo performance of 3103 at the Sinquefield Cup in St. Louis in 2014. The previous highest tournament performance was 3002, achieved by Magnus Carlsen in Nanjing in 2010.
Grandmasters usually have an Elo rating of at least 2500, from 2600 points one can speak of the extended world elite. The status of the FIDE evaluation from February 2019 is shown in the following table with the twenty highest rated active players, supplemented by the best woman and the best male and female players from Germany, Austria and Switzerland (in brackets: place in the women's ranking) :
Historic rating in chess
To compare today's top players with grandmasters before the introduction of the Elo number, the so-called historical Elo number is used.
The Elo ratings of chess computers and chess programs cannot be compared directly with those of human players, since they were determined mainly through games between computers and not through participation in official tournaments.
In Go, playing strength is traditionally given in kyū grades (Japanese 級) for students and dan grades (Japanese 段) for masters. Within the European Go Federation and on many Go servers on the Internet, playing strength is determined with a system derived from Elo, which maps to kyū and dan grades as follows:
|kyu / dan||Elo||Skill level and experience|
|30k||-||understands the rules, but has not played a game yet|
|29k - 28k||-||has played a few games|
|27k - 25k||-||has won some games against beginners|
|24k - 22k||-||has won some games against non-beginners|
|21k - 18k||0-349||hobby player|
|17k - 14k||350-749||regular hobby player|
|13k - 10k||750-1149||club player|
|9k - 5k||1150-1649||regular club player|
|4k - 1k||1650-2049||good club player|
|1d - 3d||2050-2349||very good club player|
|4d - 7d||from 2350||one of the best players in their country|
|1p - 9p||from around 2600||professional Go player (from Japan, Korea or China), stronger than an amateur 6d|
|world's best 9p player||3627||Ke Jie, world's best Go player (as of January 4, 2017)|
|world's best 9p AI||5185||AlphaGo Zero on a TPU v2 module with 180 TFLOPS|
The FIFA world ranking for women has been officially determined using an adapted Elo system since 2003. Since 2018, the FIFA world ranking for men has also been based on an adapted Elo system.
A long-standing unofficial adaptation of the Elo system for men's national football teams is the World Football Elo Ratings. Unofficial Elo ratings are also calculated for football clubs.
Since the 2010/2011 season, Swiss Table Tennis has been using a slightly modified Elo formula to calculate rating points.
- $E_A$: expected score for player A.
- $R_A$: player A's previous number of points
- $R_B$: player B's previous number of points
The expected value for A is now $E_A \cdot 100\%$. Player A's new score is
- $S_A$: score actually achieved (1 for every win, 0 for every defeat; a draw is not possible in table tennis)
For international English-language Scrabble, an Elo ranking list is kept by the World English-language Scrabble Players' Association (WESPA). The New Zealander Nigel Richards is in first place in this ranking with 2258 Elo points (as of August 29, 2016).
An Elo ranking list has also been maintained for German-speaking Scrabble since 2009 - based on tournaments from 2005 onwards. Among 206 players from 5 countries, the German Ben Berger is in first place with 1754 Elo points (as of February 26, 2017).
League of Legends
In the MOBA League of Legends, a computer strategy game, the Elo system was also used for rated games. It has since been replaced by a league system, which is still based on the Elo system.
A win earns League Points (LP); a loss costs LP. At 100 LP, a player must win a best-of-three or best-of-five series to move up a league.