I've started playing a variant of chess called Atomic. The pieces move like traditional chess, and start in the same position. In this variant, however, when a piece takes another piece, both are removed from the board, as well as any non-pawn pieces on the (up to eight) adjacent squares. As a consequence of this one change, the game can end if your King is 'blown up' by your opponent's capture. As another consequence, Kings cannot capture, and may occupy adjacent squares.

For example, from the following position White's Knight can blow up the pawns at either d7 or f7, blowing up the Black King and ending the game.

plot of chunk blowup

I looked around for some resources on Atomic chess, but have never had luck with traditional chess studies. Instead I decided to learn about Atomic statistically.

As it happens, Lichess (which is truly a great site) publishes their game data which includes over 9 million Atomic games played. I wrote some code that will download and parse this data, turning it into a CSV file. You can download v1 of this file, but Lichess is the ultimate copyright holder.

First steps

The games in the dataset end in one of three conditions: Normal (checkmate or what passes for it in Atomic), Time forfeit, and Abandoned (game terminated before it began). The last category is very rare, and I omit these from my processing. The majority of games end in the Normal way, as tabulated here:

termination n
Normal 8426052
Time forfeit 1257295

The game data includes Elo scores for players, as computed prior to the game. As a first check, I wanted to see if Elo is properly calibrated. To do this, I compute the empirical win rate of White over Black, grouped by bins of the difference in their Elo (again expressed from White's point of view). To avoid cases where the player is new to the system and their Elo has not yet converged to a good estimate, I select only matches where each player has already recorded at least 50 games in the database when the game is played. I also select for Normal terminations, since these are unambiguous victories. The remaining set contains 'only' 4692631 games.

Here is a plot of empirical log odds, versus Elo difference, in bins 25 points wide. This is in natural log odds space, so we expect the line to have slope of \(\operatorname{log}(10) / 400\) which is around 0.006. While the empirical data is nearly linear in Elo difference, it rides somewhat above the theoretical line. I believe this is due to a tempo advantage to White. If you want a boost, play White.

plot of chunk calibration

In competitive traditional chess, draws are quite common. In fact, the majority of championship games end in a draw. In a previous blog post I looked at calibrating the absolute Elo to the probability of a draw. This would prevent drift over time, meaning one could compare players from different generations. I was curious if a similar effect is seen in Atomic. Here is the ratio of odds of a tie, as a function of the average Elo of the two players. The log odds are mostly linear in average Elo, except for a curious dip around 2200. Perhaps many of these games end in a Time forfeit, which I exclude here.

plot of chunk tie_prob

Openings

My goal is to find good openings. However, there is a confounding variable of player ability. That is, a certain opening might look good only because good players tend to play it. (Perhaps the defining characteristic of good players is that they play good moves!) In traditional chess, a grand master could defeat me from any starting position, but in competition amongst themselves they most certainly avoid certain openings. So I wish to control for player ability, defined as the difference in Elo prior to the game.

First let's look at the distribution of White's opening move. Here are bar plots of the most frequent moves, grouped by White's Elo.
Here I am only filtering on White's Elo and number of games in the database, and consider all games, whether they terminate normally or by time forfeit. We see that indeed, there are differences in choice of opening: higher Elo players play g1f3 and g1h3 more often and e2e4 and e2e3 less often than lower Elo players. That said, these appear to be the four most common openings across all players.

plot of chunk openings_plot

The Elo score is defined so that the log odds of victory from White's perspective is 10 to the Elo difference (White's minus Black's) divided by 400. The 400 is so that the units are palatable to humans. So if \(p\) is the probability that White will win a game, we should have

$$ \left(\frac{p}{1-p}\right) = 10^\frac{\Delta e}{400}. $$

A statistician would view this as a logistic regression. Taking the natural log of both sides we have

$$ \operatorname{log}\left(\frac{p}{1-p}\right) = \frac{\operatorname{log}(10)}{400}\Delta e. $$

To the right hand side we are going to add terms. First as a warm-up, we estimate White's tempo advantage by fitting to the specification

$$ \operatorname{log}\left(\frac{p}{1-p}\right) = \frac{\operatorname{log}(10)}{400}\Delta e + b, $$

for some unknown \(b\). If instead we write this as

$$ \operatorname{log}\left(\frac{p}{1-p}\right) = \frac{\operatorname{log}(10)}{400}\Delta e + \frac{\operatorname{log}(10)}{400}c, $$

then the constant \(c\) is in 'units' of Elo.

I perform this regression on our data, removing Ties, players with few games and low Elo. Here is a table of the regression coefficients from the logistic regression, along with standard errors, the Wald statistics and p-values. I also convert the coefficient estimate into Elo equivalent units. White's advantage is equivalent to around 74 extra Elo points!

term estimate std.error statistic p.value Elo equivalent
(Intercept) 0.425 0.002 270 0 74
Elo 0.006 0.000 675 0 1

Logistic analysis of opening moves

Of course, this is only if White does not squander their tempo. Some opening moves are likely to result in a higher boost to White than this average value, and some lower. I will now fit a model of the form

$$ \operatorname{log}\left(\frac{p}{1-p}\right) = \frac{\operatorname{log}(10)}{400}\left[\Delta e + c_{g1f3} 1_{g1f3} + c_{e2e3} 1_{e2e3} + c_{g1h3} 1_{g1h3} + \ldots \right], $$

where \(1_{g1f3}\) is a 0/1 variable for whether White played g1f3 as their first move, and \(c_{g1f3}\) is the coefficient which we will fit by logistic regression. By rescaling by the magic number \(\operatorname{log}(10) / 400\), the coefficients \(c\) are denominated in 'Elo units'. Here is a table from that fit:

term estimate std.error statistic p.value Elo equivalent
Elo 0.006 0.000 671.6 0 1
g1h3 0.518 0.004 116.0 0 90
g1f3 0.458 0.002 214.2 0 80
e2e3 0.361 0.004 100.9 0 63
e2e4 0.355 0.007 54.4 0 62
d2d4 0.352 0.009 40.3 0 61
b1c3 0.346 0.008 45.5 0 60
Other 0.092 0.010 8.8 0 16

I also perform a multiple comparisons procedure on the opening move coefficients using Tukey's procedure to put them into equivalent groups. Here is a plot of the estimated coefficients, with error bars, colored by group. Again the coefficients are in Elo units. We see that g1h3 has a coefficient of around 90, while g1f3 is around 80. The four other openings considered are nearly equivalent, and all other moves are far inferior.

plot of chunk reg_two_plot

Second moves

This seems odd, however, because g1f3 is played much more often than g1h3, even by high Elo players. Moreover, just as White's apparent tempo advantage is the average value over a bunch of possible moves, some giving better results and some worse, so too is the boost from playing g1h3. Perhaps Black has a good refutation of this move, but it is not well known, and the effect we see here is an average over all of Black's replies.

So now I perform an analysis of Black's replies, conditional on White's opening. I filter for White's move, then perform a logistic regression, only considering the most frequent replies. Here is a table of the regression coefficients for the three best looking opening moves for White. Because the equations are still from White's point of view, Black would choose to play the move with the lowest Elo equivalent. Below we plot the coefficient estimates of Black's replies, again grouping statistically non-distinguishable coefficient estimates.

move1 move2 estimate std.error statistic p.value Elo equivalent
e2e3 Elo 0.006 0.000 286.54 0 1.0
e2e3 d7d6 0.311 0.021 14.55 0 54.0
e2e3 e7e6 0.317 0.004 74.83 0 55.0
e2e3 d7d5 0.334 0.022 15.01 0 58.0
e2e3 g8f6 0.432 0.011 40.12 0 75.0
e2e3 e7e5 0.592 0.022 26.55 0 100.0
e2e3 Other 0.763 0.016 48.37 0 130.0
g1f3 Elo 0.006 0.000 490.63 0 1.0
g1f3 e7e5 0.135 0.007 19.25 0 24.0
g1f3 d7d6 0.145 0.017 8.46 0 25.0
g1f3 f7f6 0.483 0.002 211.38 0 84.0
g1f3 g8f6 0.834 0.048 17.50 0 140.0
g1f3 e7e6 1.458 0.047 30.81 0 250.0
g1f3 Other 2.069 0.042 48.84 0 360.0
g1h3 Elo 0.006 0.000 231.93 0 1.1
g1h3 e7e6 0.092 0.022 4.23 0 16.0
g1h3 e7e5 0.155 0.019 8.21 0 27.0
g1h3 f7f6 0.503 0.010 52.54 0 87.0
g1h3 h7h6 0.536 0.006 93.12 0 93.0
g1h3 Other 0.554 0.024 22.75 0 96.0
g1h3 g7g5 1.837 0.037 49.63 0 320.0

plot of chunk reg_three_plot

We note that the apparent best reply to g1f3 is e7e5, which yields an advantage to White of 24 Elo points; the best reply to e2e3 is d7d6, which is worth 54 Elo points to White; and the best reply to g1h3 is e7e6, which yields 16 Elo points. While our analysis of White's openings indicated that g1h3 is the best opening, this appears to be due to suboptimal replies from Black. To maximize the minimum reply from Black, White should apparently play e2e3.

I would take this analysis with a grain of salt. The response d7d6 to White's g1f3 looks very weak, with White taking Black's Queen and leaving Black's King very exposed. This suggests something very fishy with potential for big differences in optimal and average play in the responses.

Of course, there is also a horizon problem here. Just as the coefficient fits average over suboptimal play in Black's reply, our two level analysis ignores White's second move. In a future blog post I hope to perform a proper minimax analysis of openings, truncating the analysis when the data no longer supports different coefficient fits among the various possible moves, then minimaxing the values up. I also plan on performing a material value analysis, which is quite simple from the data available here. My early exploration suggests that the usual 1:3:3:5:9 point value system of Pawn, Knight, Bishop, Rook, Queen in traditional chess is not well calibrated to Atomic chess. Finally, I hope to catalog the many fool's mates and their frequency.