Distribution of Elo.
I have been thinking about Elo ratings recently, after analyzing my tactics ratings. I have a lot of questions about Elo: is it really predictive of performance? why don't we calibrate Elo to a quantitative strategy? can we really compare players across different eras? why not use an extended Kalman Filter instead of Elo? etc. One question I had which I consider here is, "what is the standard error of Elo?"
Consider two players. Let the difference in true abilities between them be denoted \(\Delta a\), and let the difference in their Elo ratings be \(\Delta r\). The difference in abilities is such that the odds that the first player wins a match between them is \(10^{\Delta a / 400}\). Note that the raw abilities and ratings will not be used here, only the differences, since they are only defined up to an arbitrary additive offset.
When the two play a game, both their scores are updated according to the outcome. Let \(z\) be the outcome of the match from the point of view of the first player. That is \(z=1\) if the first player wins, \(0\) if they lose, and \(1/2\) in the case of a draw. We update their Elo ratings by
where \(k\) is the \(k\)-factor (typically between 10 and 40), and \(g\) gives the expected value of the outcome based on the difference in ratings, with
Because we add and subtract the same update to both players' ratings, the difference between them gets twice that update, thus the \(2\).
Let \(\epsilon\) be the error in the ratings: \(\Delta r = \Delta a + \epsilon\). Then the error updates as