Distribution of Elo.

Sat 15 April 2017 by Steven E. Pav

I have been thinking about Elo ratings recently, after analyzing my tactics ratings. I have a lot of questions about Elo: is it really predictive of performance? why don't we calibrate Elo to a quantitative strategy? can we really compare players across different eras? why not use an extended Kalman Filter instead of Elo? etc. One question, which I consider here, is "what is the standard error of Elo?"

Consider two players. Let the difference in true abilities between them be denoted \(\Delta a\), and let the difference in their Elo ratings be \(\Delta r\). The difference in abilities is such that the odds that the first player wins a match between them are \(10^{\Delta a / 400}\). Note that the raw abilities and ratings will not be used here, only the differences, since they are only defined up to an arbitrary additive offset.

When the two play a game, both their ratings are updated according to the outcome. Let \(z\) be the outcome of the match from the point of view of the first player. That is, \(z=1\) if the first player wins, \(0\) if they lose, and \(1/2\) in the case of a draw. We update the difference in their Elo ratings by

$$ \Delta r \Leftarrow \Delta r + 2 k \left(z - g\left(\Delta r\right) \right), $$

where \(k\) is the \(k\)-factor (typically between 10 and 40), and \(g\) gives the expected value of the outcome based on the difference in ratings, with

$$ g(x) = \frac{10^{x/400}}{1 + 10^{x/400}}. $$

Because we add and subtract the same update to both players' ratings, the difference between them gets twice that update, thus the \(2\).
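
The update above can be sketched in a few lines of R. This is only an illustration of the formulas as written; the function names `elo_expected` and `update_diff` and the default `k = 20` are my own choices for the example, not anything from a library.

```r
# Expected score of the first player given a rating difference x,
# matching g(x) in the text.
elo_expected <- function(x) {
  10^(x / 400) / (1 + 10^(x / 400))
}

# One update of the rating *difference* after a game with outcome z
# (1 = win, 0 = loss, 0.5 = draw, from the first player's point of view).
# k = 20 is an illustrative value inside the usual 10 to 40 range.
update_diff <- function(delta_r, z, k = 20) {
  delta_r + 2 * k * (z - elo_expected(delta_r))
}

# With equal ratings the expected score is one half,
# so a win moves the difference by exactly k.
update_diff(0, z = 1)

# A 400 point favourite has odds of 10 to 1, an expected score of 10/11.
elo_expected(400)
```

Note that the factor of 2 lives inside `update_diff`, since it tracks the difference between the two ratings rather than either rating alone.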

Let \(\epsilon\) be the error in the ratings: \(\Delta r = \Delta a + \epsilon\). Then the error updates as

$$ \epsilon …
read more

Chess Tactics.

Thu 30 March 2017 by Steven E. Pav

I have become more interested in chess in the last year, though I'm still pretty much crap at it. Rather than play games, I am practicing tactics at chesstempo. Basically you are presented with a chess puzzle, which is selected based on your estimated tactical 'Elo' rating, and your rating (and the puzzle's) is adjusted based on whether you solve it correctly. (Without time limit for standard problems, though I believe one can also train in 'blitz' mode.) I decided to look at the data.

I have a few reasons for this exercise:

  1. To see if I could do it. You cannot easily download your stats from the site unless you pay for a gold membership. (I skimped and bought a silver.) I wanted to practice my web scraping skills, which I have not exercised in a while.
  2. To see if the site's rating system made sense as a logistic regression, and was consistent with the 'standard' definition of Elo rating.
  3. To see if I was getting better.
  4. To see if there was anything simple I could do to improve, like take longer for problems, or practice certain kinds of problems.
  5. To look for a 'hot hands' phenomenon, which would translate into autocorrelated residuals.
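
As a rough sketch of what items 2 and 5 might look like in R, the following fits a logistic regression to simulated outcomes and checks the lag-one autocorrelation of the residuals. The data here are entirely made up (the names `rating_diff` and `solved` and the simulation settings are assumptions for illustration); it is not the analysis of the scraped data.

```r
set.seed(1234)
n <- 5000

# Hypothetical data: solver rating minus problem rating, simulated here,
# with outcomes drawn from the standard Elo win probability.
rating_diff <- rnorm(n, mean = 0, sd = 200)
p_solve <- 1 / (1 + 10^(-rating_diff / 400))
solved <- rbinom(n, size = 1, prob = p_solve)

# Logistic regression of the outcome on the rating difference.
fit <- glm(solved ~ rating_diff, family = binomial(link = "logit"))
slope <- unname(coef(fit)["rating_diff"])

# Under the standard definition of Elo the slope should be near log(10) / 400.
c(fitted = slope, elo = log(10) / 400)

# A crude 'hot hands' check: lag-one autocorrelation of the Pearson residuals.
# The simulated outcomes are independent, so this should be near zero.
res <- residuals(fit, type = "pearson")
lag1 <- acf(res, lag.max = 1, plot = FALSE)$acf[2]
lag1
```

On real data, a slope far from `log(10) / 400` would suggest the site's ratings are not on the standard Elo scale, and a clearly positive `lag1` would hint at streakiness.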

The bad and the ugly

Scraping my statistics into a CSV turned out to be fairly straightforward. The statistics page will look uninteresting if you are not a member. Even if you are, the data themselves are served via JavaScript, not in raw HTML. While this could in theory be solved via, say, phantomJS, I opted to work with the developer console in Chrome directly.

First go to your statistics page in Chrome. Then conjure the developer console by pressing <CTRL>-<SHIFT>-I. A frame should appear. Click on the 'Console' tab, then type in it: copy(document.body …

read more

Another Thalesians Talk

Tue 14 March 2017 by Steven

Matt Dixon 40th Birthday Talk

read more

Lego Pricing.

Mon 06 March 2017 by Steven E. Pav

It is time to get kiddo a new Lego set, as he's been on a bender this week, building everything he can get his hands on. I wanted to optimize play time per dollar spent, so I set out to look for Lego pricing data.

Not surprisingly, there are a number of good sources for this data. The best I found was at brickset. Sign up for an account, then go to their query builder. I built a query requesting all sets from 2011 onwards, then selected the CSV option, copied the data into my clipboard, then dumped it via xclip -o > brickset_db.csv. The brickset data is updated over time, so there's no reason to prefer my file to one you download yourself.

First I load the data in R, filter based on availability of Piece and Price data, then remove certain themes (Books, Duplo, and so on). I then subselect themes based on having a large range of prices and of number of pieces:

library(dplyr)
indat <- readr::read_csv('../data/brickset_db.csv')
## Parsed with column specification:
## cols(
##   SetID = col_double(),
##   Number = col_character(),
##   Variant = col_double(),
##   Theme = col_character(),
##   Subtheme = col_character(),
##   Year = col_double(),
##   Name = col_character(),
##   Minifigs = col_double(),
##   Pieces = col_double(),
##   UKPrice = col_double(),
##   USPrice = col_double(),
##   CAPrice = col_double(),
##   EUPrice = col_double(),
##   ImageURL = col_character(),
##   Owned = col_character(),
##   Wanted = col_character(),
##   QtyOwned = col_double(),
##   Rating = col_logical()
## )
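# keep sets with known piece counts and US prices, dropping non-brick themes
subdat <- indat %>%
    filter(!is.na(Pieces), Pieces >= 10,
           !is.na(USPrice), USPrice > 1,
           !grepl('^(Books|Mindstorms|Duplo|.+Minifigures|Power Func|Games|Education|Serious)', Theme))

# flag themes with enough sets and a wide spread of piece counts and prices
subok <- subdat %>%
    group_by(Theme) %>%
    summarize(many_sets=(sum(!is.na(USPrice)) >= 10),
              piece_spread=((max(Pieces) / min(Pieces)) >= 5),
              price_spread=((max(USPrice) / min(USPrice)) >= 4)) %>%
    ungroup() %>%
    filter(many_sets & piece_spread & price_spread)

# keep only sets from those themes (this step was truncated in the
# original; the semi_join is an assumed reconstruction)
subdat <- subdat %>%
    semi_join(subok, by='Theme')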

subdat %>% sample_n(10) %>% 
                caption='Random …
read more