
How Trustworthy is Crafty’s Analysis of Chess Champions?

  1. CGW 2007 How Trustworthy is Crafty’s Analysis of Chess Champions? Matej Guid, Aritz Pérez, and Ivan Bratko, University of Ljubljana, Slovenia

  2. Introduction High-quality chess programs have provided an opportunity for a more objective comparison between Chess Champions. Guid and Bratko carried out a computer analysis of games played by the World Chess Champions (ICGA Journal, Vol. 29, No. 2, June 2006): an attempt at an objective assessment of the chess-playing strength of players of different eras. Crafty’s Analysis of Chess Champions (2006): • the idea was to determine the players’ quality of play (regardless of the game score) • computer analyses of individual moves made by each player • the open-source chess program Crafty was used for the analysis I Wilhelm Steinitz, 1886 - 1894

  3. II Emanuel Lasker, 1894 - 1921

  4. Readers’ feedback A frequent comment by the readers could be summarised as: “A very interesting study, but it has a flaw in that program Crafty, whose rating is only about 2620, was used to analyse the performance of players stronger than this. For this reason the results cannot be useful.” Some readers speculated further that the program would give a better ranking to players of similar strength to the program itself. Reservations by the readers • The three main objections to the methodology used: • the program used for analysis was too weak • the search depth performed by the program was too shallow • the number of analysed positions was too low (at least for some players) III Jose Raul Capablanca, 1921 - 1927

  7. Different search depths -> different rankings? Botvinnik-Tal, World Chess Championship match (game 17, position after White’s 23rd move), Moscow 1961. In the diagram position, Tal played 23…Nc7 and later won the game. The table on the right shows Crafty’s evaluations obtained at different depths of search. As is usual for chess programs, the evaluations vary considerably with depth. IV Alexander Alekhine, 1927 - 1935 and 1937 - 1946

  8. Ranking of the players at different depths In particular, we were interested in observing to what extent the ranking of the players is preserved at different depths of search. Searching deeper • the strength of computer chess programs increases with search depth • K. Thompson (1982): searching just one ply deeper results in a performance more than 200 rating points stronger • the gains in strength diminish with additional search, but are nevertheless significant at search depths of up to 20 plies (J.R. Steenhuisen, 2005) Preservation of the rankings at different search depths would therefore suggest that: • the same rankings would have been obtained by searching deeper • using a stronger chess program would NOT affect the results significantly V Max Euwe, 1935 - 1937
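
One way to make “preserved” concrete is a rank correlation between the rankings obtained at two depths. The sketch below uses Spearman’s rho from SciPy; the measure and the helper names are our illustration, not part of the study itself.

    # Sketch: agreement between player rankings at two search depths,
    # measured with Spearman's rank correlation (an illustrative choice).
    from scipy.stats import spearmanr

    def ranking_agreement(scores_by_depth, d1, d2):
        """scores_by_depth[d] maps player name -> average eval loss at depth d.
        Returns Spearman's rho between the rankings induced by depths d1 and
        d2; rho close to 1 means the ranking is essentially preserved."""
        players = sorted(scores_by_depth[d1])
        xs = [scores_by_depth[d1][p] for p in players]
        ys = [scores_by_depth[d2][p] for p in players]
        rho, _ = spearmanr(xs, ys)
        return rho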

  9. Method • The same methodology as in the original study (Guid and Bratko, 2006): • games for the title of “World Chess Champion” were selected for analysis • each position after move 12 was iteratively searched to depths of 2 to 12 plies • moves where both the move made and the move suggested by the computer had an evaluation outside the interval [-2, 2] were discarded • The only difference in methodology between this and the original study: • the search was not extended to 13 plies in the endgame • The aim of our analysis: • to obtain rankings of the players at the different depths of search (a sketch of the scoring scheme follows below) V Max Euwe, 1935 - 1937
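
A minimal Python sketch of the scoring scheme just described. The engine helpers eval_at_depth and eval_of_move are hypothetical stand-ins for Crafty’s search (the study drove Crafty directly); evaluations are assumed to be in pawn units from the player’s point of view.

    # Sketch of the per-position scoring (helper names are hypothetical):
    #   eval_at_depth(fen, d)      -> (best_move, best_eval) at search depth d
    #   eval_of_move(fen, move, d) -> evaluation of a given move at depth d
    WINDOW = 2.0  # the [-2, 2] evaluation interval from the methodology

    def move_loss(fen, played_move, depth, eval_at_depth, eval_of_move):
        """Eval loss of the played move, or None if the position is discarded."""
        best_move, best_eval = eval_at_depth(fen, depth)
        played_eval = eval_of_move(fen, played_move, depth)
        # Discard moves where BOTH evaluations fall outside [-2, 2]: clearly
        # won or lost positions say little about the quality of play.
        if abs(best_eval) > WINDOW and abs(played_eval) > WINDOW:
            return None
        return best_eval - played_eval  # zero when the played move is best

    def average_loss(positions, depth, eval_at_depth, eval_of_move):
        """A player's score at one depth: mean loss over admissible positions."""
        losses = [move_loss(fen, mv, depth, eval_at_depth, eval_of_move)
                  for fen, mv in positions]
        kept = [l for l in losses if l is not None]
        return sum(kept) / len(kept)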

  10. The results of the original study • In this study we concentrate on the basic criterion of the original study: • the average difference between the evaluations of the moves played and the best-evaluated moves VI Mikhail Botvinnik, 1948 - 1957, 1958 - 1960, and 1961 - 1963
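
Written out (in our notation, not the slide’s own), this criterion for player M at search depth d is:

    % Our reconstruction of the basic criterion: the average difference
    % between the evaluation of Crafty's best move and the move played,
    % over the player's admissible positions P_M; e_d is the evaluation
    % at depth d, and lower scores indicate better play.
    \[
      S_M^d \;=\; \frac{1}{|P_M|} \sum_{p \in P_M}
          \Bigl( e_d(\mathrm{best}_p) - e_d(\mathrm{played}_p) \Bigr)
    \]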

  11. Players’ scores at different search depths The players whose scores clearly deviate from the rest are: Capablanca, Kramnik, Euwe, and Steinitz. Average scores (deviations between the evaluations of played moves and best-evaluated moves according to the computer) of each player at different depths of search. VII Vasily Smyslov, 1957 - 1958

  12. Players’ ranks at different search depths The players whose scores clearly deviate from the rest are: Capablanca, Kramnik, Euwe, and Steinitz. Ranks of the players at different depths of search. VII Vasily Smyslov, 1957 - 1958

  13. Stability of the rankings In order to check the reliability of the program as a tool for ranking chess players, our goal was to determine the stability of the obtained rankings: • in different subsets of the analysed positions, • with increasing search depth. Subsets of analysed positions For each player: • 100 subsets were generated from the original dataset • each subset contained 500 randomly chosen positions (without replacement) VIII Mikhail Tal, 1960 - 1961
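
A minimal sketch of this subset generation, assuming the per-position losses of one player have already been computed (function and parameter names are ours):

    import random

    def subset_means(losses, n_subsets=100, subset_size=500, seed=0):
        """100 random subsets of 500 positions each, drawn without
        replacement within a subset; returns each subset's average score."""
        rng = random.Random(seed)
        return [sum(rng.sample(losses, subset_size)) / subset_size
                for _ in range(n_subsets)]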

  14. The size of the subsets The number of available positions varies between players, since they were involved in different numbers of matches: • the fewest positions, about 600, were available for Fischer • for both Botvinnik and Karpov, more than 5000 positions were available Experiments with subsets of different sizes For each player, average scores and standard deviations of the scores were computed for 1000 subsets of different sizes. A size of 500 already seems sufficient for reliable results. These experiments also show that the number of positions available from World Chess Championship matches can be sufficient for obtaining reliable results. VIII Mikhail Tal, 1960 - 1961
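
A companion sketch for this subset-size experiment (the sizes and names are illustrative): the spread of the subset means should shrink roughly as 1/sqrt(size), which is why 500 positions already give stable scores.

    import random
    import statistics

    def spread_by_size(losses, sizes=(100, 250, 500, 1000),
                       n_subsets=1000, seed=0):
        """Standard deviation of the subset mean for several subset sizes
        (1000 subsets per size, as on this slide)."""
        rng = random.Random(seed)
        spread = {}
        for size in sizes:
            means = [statistics.fmean(rng.sample(losses, size))
                     for _ in range(n_subsets)]
            spread[size] = statistics.stdev(means)
        return spread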

  15. 100 subsets: average scores The players whose average scores (across all the 100 subsets) clearly deviate from the rest are: Capablanca, Kramnik, Euwe, and Steinitz. ADDITIONAL RESULTS: looking at the average scores of the players across all depths for each subset separately, Capablanca had the best such score in 96% of all the subsets! IX Tigran Petrosian, 1963 - 1969

  16. Average standard deviations of the scores The average standard deviations of the players’ scores (obtained in the 100 subsets) show that the scores are only slightly less variable at higher depths. They could be considered practically constant at depths greater than 7. All the standard deviations are quite low, considering the average difference between players whose scores differ significantly. The scale is adjusted for easier comparison with the graph of the scores. X Boris Spassky, 1969 - 1972

  17. 100 subsets: average ranks The rankings are preserved for Capablanca, Kramnik, Euwe, and Steinitz, whose average scores (across all the 100 subsets) differ significantly from those of the rest of the players. XI Robert James Fischer, 1972 - 1975

  18. Average standard deviations of the ranks The average standard deviations of the players’ ranks (obtained in the 100 subsets) are practically equal for most of the depths, which speaks for the stability of the obtained results. XII Anatoly Karpov, 1975 - 1985

  19. Standard deviations of average ranks across all depths Standard deviations of the average ranks (obtained in the 100 subsets) were calculated across the different depths for each player separately. The rankings of most of the players are, on average, well preserved across different depths of search. XII Anatoly Karpov, 1975 - 1985

  20. Different strengths of the program, yet such similar rankings!? Crafty’s chess strength at very shallow depths is much weaker than that of an ordinary club player, yet the obtained rankings are so similar at all depths... What are the conditions under which the program gives reliable rankings? Is a stronger program necessarily a better one for ranking chess players? The obtained results already speak for themselves... ... however, we also provide: • a simple probabilistic model of ranking by an imperfect referee • a more sophisticated mathematical explanation XIII Garry Kasparov, 1985 - 2000

  21. A Simple Probabilistic Model of Ranking by an Imperfect Referee... ... shows the following: • To obtain a sensible ranking of players, it is not necessary to use a computer that is stronger than the players themselves. • The (fallible) computer will NOT exhibit a preference for players of similar strength to the computer. (see the paper for details) XIII Garry Kasparov, 1985 - 2000
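
The paper spells out the model; purely as an illustration (the sketch below is ours, not the paper’s), a small Monte Carlo experiment with an imperfect referee already exhibits both points: a referee that misjudges individual moves with a fixed probability still recovers the true order of the players, and the noise contains no mechanism that would favour players of any particular strength.

    import random

    def ranking_recovery(true_rates=(0.05, 0.10, 0.15), n_moves=500,
                         referee_noise=0.3, n_trials=1000, seed=0):
        """Fraction of trials in which a noisy referee recovers the true
        ranking of players with the given per-move error rates."""
        rng = random.Random(seed)
        true_order = sorted(range(len(true_rates)),
                            key=lambda i: true_rates[i])
        hits = 0
        for _ in range(n_trials):
            observed = []
            for rate in true_rates:
                errors = 0
                for _ in range(n_moves):
                    is_error = rng.random() < rate    # the move's true quality
                    if rng.random() < referee_noise:  # the referee misjudges
                        is_error = not is_error
                    errors += is_error
                observed.append(errors / n_moves)
            hits += sorted(range(len(observed)),
                           key=lambda i: observed[i]) == true_order
        return hits / n_trials

    # In expectation the noise maps a true rate r to noise + (1 - 2*noise)*r,
    # which preserves the ordering for any noise level below 0.5 -- so even
    # a referee that misjudges 30% of the moves ranks the players correctly
    # in the vast majority of trials.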

  22. A More Sophisticated Mathematical Explanation Suppose we have an estimator A that measures the goodness of an individual M in a concrete task by assigning this individual a score $S_M^A$, based on some examples. The estimator assigns different scores to the respective individuals and therefore has an associated variance $(\sigma_M^A)^2$. The estimator gives an approximation ($S_M^A$) of the real score ($S_M$) of the individual, which results in a bias: $b_M^A = E[S_M^A] - S_M$. XIII Garry Kasparov, 1985 - 2000

  23. A More Sophisticated Mathematical Explanation The probability of an error in a comparison of two individuals, M and N, using the estimator A depends only on the biases and the variances (and not on the accuracy of the estimator). Given two different estimators, A and B: IF • their scores are equally biased towards each individual: $b_M^A = b_M^B$ and $b_N^A = b_N^B$, and • the variances of the scores of both estimators are equal for each respective individual: $(\sigma_M^A)^2 = (\sigma_M^B)^2$ and $(\sigma_N^A)^2 = (\sigma_N^B)^2$ THEN • both estimators have the same probability of committing an error XIII Garry Kasparov, 1985 - 2000
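
A sketch of why only the biases and variances matter, under a normality assumption that the slide itself does not state: if the estimated scores are independent and approximately normal, the probability of mis-ranking two individuals is

    % Our reconstruction: assume S_M^A ~ N(S_M + b_M^A, (sigma_M^A)^2) and
    % S_N^A ~ N(S_N + b_N^A, (sigma_N^A)^2), independent, with S_M < S_N
    % (lower score = better play). A comparison errs when S_M^A > S_N^A:
    \[
      P(\text{error}) \;=\; P\bigl(S_M^A > S_N^A\bigr)
      \;=\; 1 - \Phi\!\left(
          \frac{(S_N + b_N^A) - (S_M + b_M^A)}
               {\sqrt{(\sigma_M^A)^2 + (\sigma_N^A)^2}}
      \right)
    \]
    % For fixed true scores, two estimators with equal biases and equal
    % variances therefore have the same error probability, regardless of
    % how accurate either of them is in absolute terms.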

  24. An additional experiment: Crafty, biases, and variances • In our study the subscript and the superscript of $S_M^A$ refer to: • M - a player • A - a depth of search The real score $S_M$ could not be determined. For each player, the average score at depth 12, obtained from all available positions of that player, served as the best possible approximation of $S_M$. The biases and the variances for each player were observed at each search depth up to 11. Once again the 100 subsets were used. XIV Vladimir Kramnik, 2000 -
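
A minimal sketch of the bias and variance estimates just described (the helper names are ours); the depth-12 score over all positions stands in for the unobservable real score:

    import statistics

    def bias_and_spread(subset_means, full_depth12_score):
        """subset_means: the 100 subset scores of one player at one depth;
        full_depth12_score: that player's depth-12 score over ALL positions,
        used as the best available proxy for the real score S_M.
        Returns (estimated bias b_M^A, standard deviation of the score)."""
        bias = statistics.fmean(subset_means) - full_depth12_score
        return bias, statistics.stdev(subset_means)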

  25. Crafty, biases, and variances The standard deviation of the bias over all players is very low at each search depth => the bias $b_M^A$ is approximately equal for all players M. The probability of an error in the comparisons performed by Crafty is practically the same at different levels of search, and only slightly diminishes with increasing search depth. The program did not show any particular bias at any depth towards Capablanca, nor towards any other player. The standard deviations of the scores are also very low, and approximately equal at all depths => the condition on the variances also holds. XIV Vladimir Kramnik, 2000 -

  26. Conclusion In this study we analysed how trustworthy the rankings of the Chess Champions produced by computer analysis with the program Crafty are. • Our study shows that it is not very likely that the ranking of at least those players whose scores differ enough from the others’ would change if: • a stronger chess program were used, • the program searched deeper, or • larger sets of positions were available for the analysis. • The rankings are surprisingly stable: • over a large interval of search depths, and • over a large variation of sample positions. • It is particularly surprising that even an extremely shallow search of just two or three plies enables reasonable rankings. XIV Vladimir Kramnik, 2000 -

  27. ?
