## Spy vs Spy vs Wald Wolfowitz.

Tue 05 September 2017 by Steven E. Pav

I turned my kids on to the great Spy vs Spy cartoon from Mad Magazine.
This strip is pure gold for two young boys: Rube Goldberg plus
explosions with not much dialog (one child is still too young to read).
I became curious whether the one Spy had the upper hand, whether
Prohias worked to keep the score 'even', and so on.

Not finding any data out there, I collected the data to the best
of my ability from the Spy vs Spy Omnibus, which collects all
248 strips that appeared in Mad Magazine (plus two special issues).
I think there are more strips out there by Prohias that appeared
only in collected books, but I have not collected them yet.
I entered the data into a Google spreadsheet, converted it to
CSV, then packaged it as an R data package.
Now you can play along at home.

On to the simplest form of my question: did Prohias alternate between
Black and White Spy victories? or did he choose at random?
Up until 1968 it was common for two strips to appear in one issue
of Mad, with one victory per Spy. In some cases *three* strips
appeared per issue, with the Grey Spy appearing in the third;
the Black and White Spies always receive a comeuppance when she
appears, and so the balance of power was maintained.
After 1972, it seems that only a single strip appeared per issue,
and we can examine the time series of victories.

```r
library(SPYvsSPY)
library(dplyr)
library(knitr)  # for kable()
data(svs)
# show that there are multiple strips per issue
svs %>%
  group_by(Mad_no, yrmo) %>%
  summarize(nstrips = n(),
            net_victories = sum(as.numeric(white_comeuppance) - as.numeric(black_comeuppance))) %>%
  ungroup() %>%
  select(yrmo, nstrips, net_victories) %>%
  head(n = 20) %>%
  kable()
```

| yrmo | nstrips | net_victories |
|---------|---------|---------------|
| 1961-01 | 3 | -1 |
| 1961-03 | 2 | 0 |
| 1961-04 | 2 | 0 |
| 1961-06 | 2 | 0 |
| 1961-07 | 2 | … |
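
The question of alternation versus random choice is what a Wald-Wolfowitz runs test addresses. What follows is only a minimal sketch of that test using the usual normal approximation. The helper name `runs_test` and the toy sequence `toy` are hypothetical, made up for illustration; they are not part of the `SPYvsSPY` package and not real strip outcomes.

```r
# Wald-Wolfowitz runs test for a two-valued sequence (normal approximation).
# NOTE: runs_test is a hypothetical helper, not part of the SPYvsSPY package.
runs_test <- function(x) {
  x <- as.character(x)
  lv <- unique(x)
  stopifnot(length(lv) == 2)
  n1 <- sum(x == lv[1])
  n2 <- sum(x == lv[2])
  n <- n1 + n2
  # a new run starts wherever two consecutive outcomes differ
  runs <- 1 + sum(x[-1] != x[-n])
  # mean and variance of the number of runs under random ordering
  mu <- 1 + 2 * n1 * n2 / n
  sigma2 <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (runs - mu) / sqrt(sigma2)
  list(runs = runs, expected = mu, z = z, p.value = 2 * pnorm(-abs(z)))
}

# made-up, perfectly alternating sequence of winners (NOT real data)
toy <- rep(c("B", "W"), times = 4)
res <- runs_test(toy)
res$runs      # observed number of runs
res$expected  # expected number of runs under randomness
```

A large positive `z` means more runs than expected by chance, which is what deliberate alternation would produce; a large negative `z` would suggest streaks. The normal approximation is rough for short sequences.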

## Elo and Draws.

Thu 04 May 2017 by Steven E. Pav

I still had some nagging thoughts after my recent
examination of the distribution of Elo. In that
blog post, I recognized that a higher probability of a draw would lead
to tighter standard error around the true 'ability' of a player, as
estimated by an Elo ranking. Without any data, I punted on what that
probability should be. So I decided to look at some real data.

I started working in a risk role about a year ago. Compared to my
previous gig, there is a much greater focus on discrete event
modeling than on continuous outcomes. Logistic regression and
survival analysis are the tools of the trade. However,
financial risk modeling is more complex than the textbook
presentation of these methods. As is chess. A loan holder might
go bankrupt, stop paying, die, *etc.* Similarly, a chess player
might win, lose or draw.

There are two main ways of approaching multiple outcome discrete
models that leverage the simpler binary models: the *competing hazards*
view, and the *sequential hazards* view. Briefly, risk under
competing hazards would be like traversing the Fire Swamp: at any time,
the spurting flames, the lightning sand or the rodents of unusual
size might harm you. The risks all come at you at once.
An example of a sequential hazard is undergoing
surgery: you might die in surgery, and if you survive you might incur
an infection and die of complications; the risks present themselves
conditional on surviving other risks. (Both of these
views are mostly just conveniences, and real risks are never so
neatly defined.)
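
To make the sequential view concrete, here is a small sketch that converts per-stage conditional hazards into unconditional outcome probabilities. The function name `sequential_probs` and the hazard values are assumptions chosen purely for illustration; they are not estimates of any real risk.

```r
# Sequential hazards: each hazard is the probability of the event at that
# stage, conditional on having survived all earlier stages.
# NOTE: sequential_probs and the numbers below are hypothetical.
sequential_probs <- function(hazards) {
  surv <- cumprod(1 - hazards)                  # P(surviving stages 1..i)
  reach <- c(1, unname(surv)[-length(surv)])    # P(reaching stage i)
  out <- hazards * reach                        # P(event occurs at stage i)
  c(out, survive_all = unname(surv[length(surv)]))
}

p <- sequential_probs(c(surgery = 0.02, infection = 0.05))
p
sum(p)  # the outcomes are exhaustive, so the probabilities sum to one
```

Each stage is a plain binary model, which is the appeal: a multi-outcome problem is fit as a chain of simpler binary ones.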

Returning to chess, I will consider sequential hazards.
Assume two players, and let the difference in true abilities between
them be denoted \(\Delta a\).
As with Elo, we want the difference in abilities to be such that
the odds that the …

## Distribution of Elo.

Sat 15 April 2017 by Steven E. Pav

I have been thinking about Elo ratings recently, after
analyzing my tactics ratings. I have a lot of
questions about Elo: is it really predictive of performance? why don't we
calibrate Elo to a quantitative strategy? can we really compare players
across different eras? why not use an extended Kalman Filter instead of
Elo? *etc.* One question I had which I consider here is, "what is the
standard error of Elo?"

Consider two players. Let the difference in true abilities between
them be denoted \(\Delta a\), and let the difference in their
Elo ratings be \(\Delta r\). The difference in abilities is such that
the odds that the first player wins a match between them
is \(10^{\Delta a / 400}\). Note that the raw abilities and ratings
will not be used here, only the differences, since they are only
defined up to an arbitrary additive offset.

When the two play a game, both their scores are updated according
to the outcome. Let \(z\) be the outcome of the match from the
point of view of the first player. That is \(z=1\) if the first player
wins, \(0\) if they lose, and \(1/2\) in the case of a draw. We update
their Elo ratings by

$$
\Delta r \Leftarrow \Delta r + 2 k \left(z - g\left(\Delta r\right) \right),
$$

where \(k\) is the \(k\)-factor (typically between 10 and 40), and \(g\)
gives the expected value of the outcome based on the difference in
ratings, with

$$
g(x) = \frac{10^{x/400}}{1 + 10^{x/400}}.
$$

Because we add and subtract the same update to both players' ratings, the
difference between them gets twice that update, thus the \(2\).
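
The update rule is easy to play with in code. This is a minimal sketch, not the original analysis: the function names `elo_expected` and `update_diff`, the choice \(k = 20\), the true ability difference of 100, and the decision to ignore draws in the simulation are all assumptions made for illustration.

```r
# g(x): expected outcome for the first player given a rating difference x.
# Algebraically equal to 10^(x/400) / (1 + 10^(x/400)).
elo_expected <- function(x) 1 / (1 + 10^(-x / 400))

# One update of the rating *difference*; z is 1 (win), 0 (loss) or 1/2 (draw).
update_diff <- function(dr, z, k = 20) dr + 2 * k * (z - elo_expected(dr))

# Simulate a series of games with an assumed true ability difference.
# Draws are ignored here, which is a simplifying assumption.
set.seed(1234)
delta_a <- 100   # hypothetical true difference in abilities
delta_r <- 0     # start both players at the same rating
for (i in seq_len(500)) {
  z <- rbinom(1, size = 1, prob = elo_expected(delta_a))
  delta_r <- update_diff(delta_r, z, k = 20)
}
delta_r - delta_a  # the rating error after 500 games
```

Rerunning with different seeds gives a feel for how much the rating difference wanders around the true ability difference.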

Let \(\epsilon\) be the error in the ratings: \(\Delta r = \Delta a + \epsilon\).
Then the error updates as

$$
\epsilon …

## Chess Tactics.

Thu 30 March 2017 by Steven E. Pav

I have become more interested in chess in the last year, though I'm still
pretty much crap at it. Rather than play games, I am practicing tactics
at chesstempo. Basically you are presented
with a chess puzzle, which is selected based on your estimated tactical
'Elo' rating, and your rating (and the puzzle's) is adjusted based
on whether you solve it correctly. (Without time limit for standard
problems, though I believe one can also train in 'blitz' mode.)
I decided to look at the data.

I have a few reasons for this exercise:

- To see if I could do it. You cannot easily download your stats
from the site unless you pay for a gold membership. (I skimped and
bought a silver.) I wanted to practice my web scraping skills,
which I have not exercised in a while.
- To see if the site's rating system made sense as a logistic
regression, and was consistent with the 'standard' definition
of Elo rating.
- To see if I was getting better.
- To see if there was anything simple I could do to improve, like
take longer for problems, or practice certain kinds of problems.
- To look for a 'hot hands' phenomenon, which would translate into
autocorrelated residuals.
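
The logistic regression check in the list above can be sketched on simulated data. Everything here is an assumption for illustration: the variable names (`user_rating`, `problem_rating`, `solved`) and the simulated ratings are hypothetical stand-ins for whatever the scraped CSV contains. If the ratings follow the standard Elo definition, regressing the outcome on the rating difference, scaled by \(\log(10)/400\), should give an intercept near 0 and a slope near 1.

```r
set.seed(2017)
n <- 5000
# hypothetical ratings; real values would come from the scraped CSV
user_rating <- 1500 + cumsum(rnorm(n, sd = 2))
problem_rating <- user_rating + rnorm(n, sd = 200)

# under standard Elo, the log odds of solving are (difference) * log(10) / 400
x <- (user_rating - problem_rating) * log(10) / 400
solved <- rbinom(n, size = 1, prob = plogis(x))

fit <- glm(solved ~ x, family = binomial())
coef(fit)  # intercept near 0 and slope near 1 are consistent with Elo
```

On real data, a slope well away from 1 would suggest the site's ratings are on a different scale than standard Elo.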

## The bad and the ugly

Scraping my statistics into a CSV turned out to be fairly straightforward.
The statistics page will
look uninteresting if you are not a member. Even if you are, the data
themselves are served via JavaScript, not in raw HTML. While this
could in theory be solved via, say, phantomJS,
I opted to work with the developer console in Chrome directly.

First go to your statistics page in Chrome.
Then conjure the developer console by pressing `<CTRL>-<SHIFT>-I`.
A frame should appear.
Click on the 'Console' tab, then type in it:
`copy(document.body …`
