Gilgamath



Atomic Openings

Fri 07 May 2021 by Steven E. Pav

I've started playing a variant of chess called Atomic. The pieces move as in traditional chess, and start in the same position. In this variant, however, when a piece takes another piece, both are removed from the board, as well as any non-pawn pieces on the (up to eight) adjacent squares. As a consequence of this one change, the game can end if your King is 'blown up' by your opponent's capture. As another consequence, Kings cannot capture, and may occupy adjacent squares.

For example, from the following position White's Knight can capture the pawn at either d7 or f7, blowing up the Black King and ending the game.

[Figure: board position in which White's Knight can capture on d7 or f7]

I looked around for some resources on Atomic chess, but I have never had much luck with traditional chess studies. Instead, I decided to learn about Atomic statistically.

As it happens, Lichess (which is truly a great site) publishes its game data, which includes over 9 million Atomic games. I wrote some code that will download and parse this data, turning it into a CSV file. You can download v1 of this file, but Lichess is the ultimate copyright holder.

First steps

The games in the dataset end in one of three conditions: Normal (checkmate or what passes for it in Atomic), Time forfeit, and Abandoned (game terminated before it began). The last category is very rare, and I omit these from my processing. The majority of games end in the Normal way, as tabulated here:

termination           n
Normal          8426052
Time forfeit    1257295

The game data includes Elo scores for players, as computed prior to the game. As a first check, I wanted to see if Elo is properly calibrated. To do this, I compute the empirical win rate of White over Black, grouped by bins of the difference in their Elo …
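A calibration check of this kind can be sketched as follows. This is only an illustration on simulated games, not the processing used for the Lichess data: the column names (`white_elo`, `black_elo`, `white_wins`), the 50-point bin width, and the use of the standard Elo logistic curve to generate outcomes are all assumptions made for the example.

```r
# Illustrative sketch on simulated games; column names are hypothetical
# and are not those of the actual CSV file.
set.seed(1234)
n_games <- 50000
games <- data.frame(
  white_elo = round(rnorm(n_games, mean = 1600, sd = 200)),
  black_elo = round(rnorm(n_games, mean = 1600, sd = 200))
)
games$delta_elo <- games$white_elo - games$black_elo

# Win probability implied by the standard Elo curve (draws ignored).
elo_expected <- function(delta) 1 / (1 + 10^(-delta / 400))
games$predicted <- elo_expected(games$delta_elo)

# Simulate decisive outcomes from that curve (an assumption of this sketch).
games$white_wins <- rbinom(n_games, size = 1, prob = games$predicted)

# Bin the Elo difference into 50-point buckets, then compare the
# empirical win rate of White to the rate Elo predicts in each bin.
games$bin <- cut(games$delta_elo, breaks = seq(-2000, 2000, by = 50))
calib <- aggregate(cbind(white_wins, predicted) ~ bin, data = games, FUN = mean)
head(calib)
```

If Elo is well calibrated, the `white_wins` and `predicted` columns should roughly agree in every bin that contains enough games.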


Spy vs Spy vs Wald Wolfowitz.

Tue 05 September 2017 by Steven E. Pav

I turned my kids on to the great Spy vs Spy cartoon from Mad Magazine. This strip is pure gold for two young boys: Rube Goldberg plus explosions with not much dialog (one child is still too young to read). I became curious whether one Spy had the upper hand, whether Prohias worked to keep the score 'even', and so on.

Not finding any data out there, I collected the data to the best of my ability from the Spy vs Spy Omnibus, which collects all 248 strips that appeared in Mad Magazine (plus two special issues). I think there are more strips out there by Prohias that appeared only in collected books, but I have not collected them yet. I entered the data into a Google spreadsheet, then converted it into CSV, then into an R data package. Now you can play along at home.

On to the simplest form of my question: did Prohias alternate between Black and White Spy victories, or did he choose at random? Up until 1968 it was common for two strips to appear in one issue of Mad, with one victory per Spy. In some cases three strips appeared per issue, with the Grey Spy appearing in the third; the Black and White Spies always receive a comeuppance when she appears, and so the balance of power was maintained. After 1972, it seems that only a single strip appeared per issue, and we can examine the time series of victories.
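One way to frame "alternate or random" is the Wald-Wolfowitz runs test named in the title: count the runs of consecutive identical winners and compare that count to what a random ordering would give. The following is a minimal base-R sketch using the normal approximation. The function name `runs_test` and the example sequence are hypothetical and are not drawn from the `svs` data.

```r
# Wald-Wolfowitz runs test for a two-valued sequence (normal approximation).
runs_test <- function(x) {
  x <- as.character(x)
  stopifnot(length(unique(x)) == 2)
  n1 <- sum(x == unique(x)[1])
  n2 <- length(x) - n1
  n <- n1 + n2
  # A new run starts wherever an element differs from its predecessor.
  n_runs <- 1 + sum(x[-1] != x[-length(x)])
  # Mean and variance of the number of runs under random ordering.
  mu <- 2 * n1 * n2 / n + 1
  sigma2 <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (n_runs - mu) / sqrt(sigma2)
  list(runs = n_runs, expected = mu, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Hypothetical sequence of winners, for illustration only (not the real data):
# strict alternation between the Black (B) and White (W) Spy.
strict_alternation <- rep(c("B", "W"), times = 10)
res <- runs_test(strict_alternation)
res$runs      # 20 runs observed
res$expected  # 11 runs expected under random ordering
```

Many more runs than expected (a large positive `z`) points toward deliberate alternation, while many fewer runs points toward streaks.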

library(SPYvsSPY)
library(dplyr)
library(knitr)  # provides kable()
data(svs)

# show that there can be multiple strips per issue
svs %>%
    group_by(Mad_no, yrmo) %>%
    summarize(nstrips = n(),
              net_victories = sum(as.numeric(white_comeuppance) - as.numeric(black_comeuppance))) %>%
    ungroup() %>%
    select(yrmo, nstrips, net_victories) %>%
    head(n = 20) %>%
    kable()
## `summarise()` has grouped output by 'Mad_no'. You can override using the `.groups` argument.
yrmo nstrips net_victories
1961-01 …

Elo and Draws.

Thu 04 May 2017 by Steven E. Pav

I still had some nagging thoughts after my recent examination of the distribution of Elo. In that blog post, I recognized that a higher probability of a draw would lead to tighter standard error around the true 'ability' of a player, as estimated by an Elo ranking. Without any data, I punted on what that probability should be. So I decided to look at some real data.

I started working in a risk role about a year ago. Compared to my previous gig, there is a much greater focus on discrete event modeling than on continuous outcomes. Logistic regression and survival analysis are the tools of the trade. However, financial risk modeling is more complex than the textbook presentation of these methods. As is chess. A loan holder might go bankrupt, stop paying, die, etc. Similarly, a chess player might win, lose or draw.

There are two main ways of approaching multiple outcome discrete models that leverage the simpler binary models: the competing hazards view, and the sequential hazards view. Briefly, risk under competing hazards would be like traversing the Fire Swamp: at any time, the spurting flames, the lightning sand or the rodents of unusual size might harm you. The risks all come at you at once. An example of a sequential hazard is undergoing surgery: you might die in surgery, and if you survive you might incur an infection and die of complications; the risks present themselves conditional on surviving other risks. (Both of these views are mostly just conveniences, and real risks are never so neatly defined.)

Returning to chess, I will consider sequential hazards. Assume two players, and let the difference in true abilities between them be denoted \(\Delta a\). As with Elo, we want the difference in abilities to be such that the odds that the …


Distribution of Elo.

Sat 15 April 2017 by Steven E. Pav

I have been thinking about Elo ratings recently, after analyzing my tactics ratings. I have a lot of questions about Elo: is it really predictive of performance? why don't we calibrate Elo to a quantitative strategy? can we really compare players across different eras? why not use an extended Kalman Filter instead of Elo? etc. One question I had which I consider here is, "what is the standard error of Elo?"

Consider two players. Let the difference in true abilities between them be denoted \(\Delta a\), and let the difference in their Elo ratings be \(\Delta r\). The difference in abilities is such that the odds that the first player wins a match between them are \(10^{\Delta a / 400}\). Note that the raw abilities and ratings will not be used here, only the differences, since they are only defined up to an arbitrary additive offset.

When the two play a game, both their ratings are updated according to the outcome. Let \(z\) be the outcome of the match from the point of view of the first player. That is, \(z=1\) if the first player wins, \(0\) if they lose, and \(1/2\) in the case of a draw. We update their Elo ratings by

$$ \Delta r \Leftarrow \Delta r + 2 k \left(z - g\left(\Delta r\right) \right), $$

where \(k\) is the \(k\)-factor (typically between 10 and 40), and \(g\) gives the expected value of the outcome based on the difference in ratings, with

$$ g(x) = \frac{10^{x/400}}{1 + 10^{x/400}}. $$

Because we add and subtract the same update to both players' ratings, the difference between them gets twice that update, thus the \(2\).
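The update rule can be written out in a few lines of R. This is a sketch of the two formulas above only. The function names are made up for the example, and the default \(k = 20\) is an arbitrary choice from the typical range.

```r
# Expected outcome for the first player, given the rating difference x.
g <- function(x) 10^(x / 400) / (1 + 10^(x / 400))

# One update of the rating *difference* after a game with outcome z
# (1 = win, 0 = loss, 1/2 = draw, from the first player's point of view).
# k = 20 is an assumed k-factor, chosen for illustration.
update_delta_r <- function(delta_r, z, k = 20) {
  delta_r + 2 * k * (z - g(delta_r))
}

# The same update applied to each player's rating separately:
# the first player gains k * (z - g), the second loses the same amount.
update_ratings <- function(r1, r2, z, k = 20) {
  step <- k * (z - g(r1 - r2))
  c(r1 + step, r2 - step)
}

update_delta_r(0, z = 1)     # equal ratings, first player wins: difference becomes 20
update_delta_r(0, z = 0.5)   # a draw between equals leaves the difference at 0
diff(rev(update_ratings(1500, 1500, z = 1)))  # also 20: the per-player updates agree
```

Updating the two ratings separately and then differencing gives the same result as updating the difference directly, which is where the factor of 2 comes from.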

Let \(\epsilon\) be the error in the ratings: \(\Delta r = \Delta a + \epsilon\). Then the error updates as

$$ \epsilon …