I turned my kids on to the great Spy vs Spy cartoon from Mad Magazine. This strip is pure gold for two young boys: Rube Goldberg plus explosions with not much dialog (one child is still too young to read). I became curious whether one Spy had the upper hand, whether Prohias worked to keep the score 'even', and so on.

Not finding any data out there, I collected it to the best of my ability from the Spy vs Spy Omnibus, which collects all 248 strips that appeared in Mad Magazine (plus two special issues). I think there are more strips out there by Prohias that appeared only in collected books, but I have not collected them yet. I entered the data into a Google spreadsheet, converted it to CSV, then packaged it as an R data package. Now you can play along at home.

On to the simplest form of my question: did Prohias alternate between
Black and White Spy victories, or did he choose at random?
Up until 1968 it was common for two strips to appear in one issue
of Mad, with one victory per Spy. In some cases *three* strips
appeared per issue, with the Grey Spy appearing in the third;
the Black and White Spies both receive a comeuppance whenever she
appears, and so the balance of power was maintained.
After 1972, it seems that only a single strip appeared per issue,
and we can examine the time series of victories.

```
library(SPYvsSPY)
library(dplyr)
library(knitr)  # provides kable()
data(svs)
# show that some issues contain multiple strips
svs %>%
  group_by(Mad_no, yrmo) %>%
  summarize(nstrips = n(),
            net_victories = sum(as.numeric(white_comeuppance) - as.numeric(black_comeuppance))) %>%
  ungroup() %>%
  select(yrmo, nstrips, net_victories) %>%
  head(n = 20) %>%
  kable()
```

| yrmo | nstrips | net_victories |
|---|---|---|
| 1961-01 | 3 | -1 |
| 1961-03 | 2 | 0 |
| 1961-04 | 2 | 0 |
| 1961-06 | 2 | 0 |
| 1961-07 | 2 | 0 |
| 1961-09 | 2 | 0 |
| 1961-12 | 1 | 0 |
| 1962-03 | 2 | 0 |
| 1962-04 | 2 | 0 |
| 1962-06 | 2 | 0 |
| 1962-07 | 2 | 0 |
| 1962-09 | 2 | -1 |
| 1962-10 | 2 | -1 |
| 1962-12 | 2 | 1 |
| 1963-03 | 2 | -1 |
| 1963-04 | 2 | -1 |
| 1963-06 | 3 | -1 |
| 1963-09 | 2 | 1 |
| 1963-10 | 2 | 1 |
| 1963-12 | 3 | 0 |

Here I plot the 'net black score', the cumulative sum of White comeuppances minus Black comeuppances. Note that when the Grey Spy appears (or in the rare cases where neither Spy seems to suffer, of which I found two), there is no net movement of the score. It seems that the Black Spy was the net loser, suffering most in the 1970s, with a comeback in the 1980s.

```
library(ggplot2)
ph <- svs %>%
  # treat each issue as dated on the first of its month
  mutate(snapdate = as.Date(paste0(yrmo, '-01'), format = '%Y-%m-%d'),
         black_victory = as.numeric(white_comeuppance) - as.numeric(black_comeuppance)) %>%
  # net result per issue, then a running total across issues
  group_by(Mad_no, snapdate) %>%
  summarize(black_victory = sum(black_victory)) %>%
  ungroup() %>%
  mutate(black_score = cumsum(black_victory)) %>%
  ggplot(aes(snapdate, black_score)) +
  geom_line() + geom_point(alpha = 0.5) +
  labs(title = 'Spy vs Spy tally',
       y = 'net Black Spy victories',
       x = 'issue date')
ph
```
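The scoring convention behind the tally can be seen on a toy example. The five outcomes below are made up, and I am assuming the comeuppance columns are logical, as the `as.numeric` calls suggest; this is only a sketch of the bookkeeping, not real data from the package.

```r
# Made-up outcomes for five strips; TRUE means that Spy got his comeuppance.
toy <- data.frame(
  white_comeuppance = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  black_comeuppance = c(FALSE, TRUE, TRUE, FALSE, FALSE)
)

# +1 when only White suffers, -1 when only Black suffers,
# 0 when both suffer (a Grey Spy strip) or neither does.
toy$black_victory <- as.numeric(toy$white_comeuppance) -
  as.numeric(toy$black_comeuppance)

# The running total is the 'net black score'.
toy$black_score <- cumsum(toy$black_victory)
print(toy)
```

The third and fifth strips leave the score unchanged, which is why Grey Spy strips show up as flat steps in the plot.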

## Wald-Wolfowitz

The Wald-Wolfowitz test is a non-parametric test for the presence of serial correlation that is appropriate for binary series like this. The test is performed by counting the number of 'runs', which is to say the number of clusters of consecutive victories by one of the Spies. When the test statistic is too high (compared to what would be observed if the data were serially independent), the data are too 'flippy', reversing too often. This would be the case if Prohias tried to keep the score balanced by always reversing the previous outcome. If the test statistic is too low, the data are too 'sticky', with long periods of one Spy prevailing over the other. This could happen if Prohias got moody and picked favorites, perhaps.
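To make 'runs' concrete, here is a minimal sketch that counts them with base R's `rle()`. The helper `count_runs` and the two toy series are my own illustrations, not part of the data package.

```r
# Hypothetical helper: the number of runs is the number of blocks of
# identical consecutive values, which rle() returns directly.
count_runs <- function(x) length(rle(x)$lengths)

# Toy series of winners: +1 for a Black Spy victory, -1 for a White Spy victory.
flippy <- c(1, -1, 1, -1, 1, -1, 1, -1)  # the winner reverses every strip
sticky <- c(1, 1, 1, 1, -1, -1, -1, -1)  # one Spy dominates, then the other

count_runs(flippy)  # 8 runs, the most possible for 8 strips
count_runs(sticky)  # 2 runs, the fewest possible when both Spies win at least once
```

Both series have four victories apiece, so the balance of wins alone says nothing about serial dependence; only the ordering does.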

The test is easy enough to run in R:

```
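library(randtests)
subdata <- svs %>%
  # keep only issues after Mad no. 152
  filter(Mad_no > 152) %>%
  mutate(black_victory = as.numeric(white_comeuppance) - as.numeric(black_comeuppance)) %>%
  # drop strips with no net winner (Grey Spy appears, or neither Spy suffers)
  filter(abs(black_victory) > 0.5)
set.seed(1234)
# threshold = 0 splits the series into Black (+1) and White (-1) victories
resu <- randtests::runs.test(subdata$black_victory, threshold = 0)
print(resu)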
```

```
Runs Test
data: subdata$black_victory
statistic = 0.8532, runs = 50, n1 = 44, n2 = 46, n = 90, p-value =
0.394
alternative hypothesis: nonrandomness
```

We get back a test statistic of 0.85, indicating slightly more reversals than would be expected at random. However, this is not statistically significantly different from the expected value of 0, with a p-value of 0.39.
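As a sanity check, the reported statistic can be reproduced by hand from the counts in the output above. This sketch assumes that `runs.test` uses the usual normal approximation to the distribution of the number of runs, which I believe is its default; the variable names are mine.

```r
# Counts taken from the runs.test output above.
runs <- 50
n1 <- 44
n2 <- 46
n <- n1 + n2

# Mean and standard deviation of the number of runs under serial independence.
mu <- 2 * n1 * n2 / n + 1
sigma <- sqrt((mu - 1) * (mu - 2) / (n - 1))

# Standardized statistic and two-sided p-value from the normal approximation.
z <- (runs - mu) / sigma
p_value <- 2 * pnorm(-abs(z))

round(z, 4)        # 0.8532
round(p_value, 3)  # 0.394
```

Under independence we would expect about 46 runs in 90 strips, so the observed 50 is only about four more than expected, less than one standard deviation away.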

In conclusion, we have no evidence that Prohias kept a running tally of Black and White Spy victories, and the data are consistent with the victor being chosen independently of previous victories.