## geom cloud.

Thu 21 September 2017 by Steven E. Pav

I wanted a drop-in replacement for `geom_errorbar` in `ggplot2` that would
plot a density cloud of uncertainty.
The idea is that typically (well, where I work), the `ymin` and `ymax`
of an errorbar are plotted at plus and minus
one standard deviation. A 'cloud' where the alpha is proportional to a normal
density with the same standard deviation could show the same information
on a plot with a little less clutter. I found out how to do this with
a very ugly function, but wanted to do it the 'right' way by spawning my
own geom. Hence `geom_cloud`.

After looking at a bunch of other `ggplot2` extensions, and some amount of
tinkering and hair-pulling, we have the following code. The first part
just computes standard deviations which are equally spaced in normal density.
These are then used to create a list of `geom_ribbon` layers with equal alpha, but
the right size. A little trickery is used to get the scales right. There
are three parameters. `steps` controls how many ribbons are drawn.
The default value is a little conservative. A larger value, like 15, gives
very smooth clouds. `se_mult` is the number of standard deviations at which
`ymax` and `ymin` are plotted, defaulting to 1 here. If you plot
your errorbars at 2 standard errors, change this to 2. `max_alpha` is the
alpha at the maximal density, *i.e.* around `y`.

    # get points equally spaced in density
    equal_ses <- function(steps) {
      xend <- c(0,4)
      endpnts <- dnorm(xend)
      # perhaps use ppoints instead?
      deql <- seq(from=endpnts[1],to=endpnts[2],length.out=steps+1)
      davg <- (deql[-1] + deql[-length(deql)])/2
      # invert
      xeql <- unlist(lapply(davg,function(d) {
        uniroot(f=function(x) { dnorm(x) - d },interval=xend)$root
      }))
      xeql
    }
    library(ggplot2)
    library(grid)
    geom_cloud <- function(mapping …
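The ribbon-stacking idea described above can be sketched without a custom geom. This is only an illustration under assumptions, not the post's `geom_cloud`: the helper `cloud_ribbons`, the toy data, and the rule for picking a per-ribbon alpha are made up here, and `equal_ses` is repeated from the post so that the block runs on its own.

```r
# equal_ses, repeated from the post so this block is self-contained
equal_ses <- function(steps) {
  xend <- c(0, 4)
  endpnts <- dnorm(xend)
  deql <- seq(from = endpnts[1], to = endpnts[2], length.out = steps + 1)
  davg <- (deql[-1] + deql[-length(deql)]) / 2
  unlist(lapply(davg, function(d) {
    uniroot(f = function(x) { dnorm(x) - d }, interval = xend)$root
  }))
}

# hypothetical helper: one row per (x, ribbon) holding that ribbon's bounds
cloud_ribbons <- function(df, steps = 5, se_mult = 1) {
  # recover one standard deviation from the errorbar half-width
  sd_hat <- (df$ymax - df$ymin) / (2 * se_mult)
  mults <- equal_ses(steps)
  do.call(rbind, lapply(seq_along(mults), function(k) {
    data.frame(x = df$x, ribbon = k,
               ymin = df$y - mults[k] * sd_hat,
               ymax = df$y + mults[k] * sd_hat)
  }))
}

# made-up data with errorbars at plus and minus one standard deviation
toy <- data.frame(x = 1:10, y = sin(1:10))
toy$ymin <- toy$y - 0.5
toy$ymax <- toy$y + 0.5
steps <- 5
ribbons <- cloud_ribbons(toy, steps = steps, se_mult = 1)

# assumption: k overlapping layers of alpha a composite to 1 - (1 - a)^k,
# so choose a such that all `steps` layers together reach max_alpha
max_alpha <- 0.8
per_ribbon_alpha <- 1 - (1 - max_alpha)^(1 / steps)

# draw only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ph <- ggplot(ribbons, aes(x = x, ymin = ymin, ymax = ymax, group = ribbon)) +
    geom_ribbon(alpha = per_ribbon_alpha, fill = "steelblue") +
    geom_line(data = toy, aes(x = x, y = y), inherit.aes = FALSE)
}
```

Points near `y` sit inside every ribbon and so look darkest, while points far out are covered only by the widest ribbons. Printing `ph` shows the result.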

## Spy vs Spy vs Wald Wolfowitz.

Tue 05 September 2017 by Steven E. Pav

I turned my kids on to the great Spy vs Spy cartoon from Mad Magazine.
This strip is pure gold for two young boys: Rube Goldberg plus
explosions with not much dialog (one child is still too young to read).
I became curious whether the one Spy had the upper hand, whether
Prohias worked to keep the score 'even', and so on.

Not finding any data out there, I collected the data to the best
of my ability from the Spy vs Spy Omnibus, which collects all
248 strips that appeared in Mad Magazine (plus two special issues).
I think there are more strips out there by Prohias that appeared
only in collected books, but I have not collected them yet.
I entered the data into a google spreadsheet, then converted into
CSV, then into an R data package.
Now you can play along at home.

On to the simplest form of my question: did Prohias alternate between
Black and White Spy victories, or did he choose at random?
Up until 1968 it was common for two strips to appear in one issue
of Mad, with one victory per Spy. In some cases *three* strips
appeared per issue, with the Grey Spy appearing in the third;
the Black and White Spies always receive a comeuppance when she
appears, and so the balance of power was maintained.
After 1972, it seems that only a single strip appeared per issue,
and we can examine the time series of victories.
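One way to ask 'alternating or random?' of such a series is the Wald-Wolfowitz runs test named in the title: count the runs of identical outcomes and compare the count with what a random ordering would give. Here is a minimal sketch in base R using the normal approximation. `runs_test` is a hypothetical helper and the sequence is made up, not taken from the `svs` data.

```r
# hypothetical helper: two-sided Wald-Wolfowitz runs test, normal approximation
runs_test <- function(x) {
  x <- as.character(x)
  lev <- unique(x)
  stopifnot(length(lev) == 2)            # only defined for two outcomes
  n1 <- sum(x == lev[1])
  n2 <- sum(x == lev[2])
  n <- n1 + n2
  # a new run starts wherever an outcome differs from the previous one
  runs <- 1 + sum(x[-1] != x[-n])
  # mean and variance of the number of runs under a random ordering
  mu <- 2 * n1 * n2 / n + 1
  sigma2 <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (runs - mu) / sqrt(sigma2)
  list(runs = runs, expected = mu, z = z, p.value = 2 * pnorm(-abs(z)))
}

# made-up, perfectly alternating sequence of winners
winners <- rep(c("Black", "White"), times = 10)
res <- runs_test(winners)
print(res)
```

Many more runs than expected (a large positive `z`) points towards alternation, while far fewer runs points towards streaks.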

    library(SPYvsSPY)
    library(dplyr)
    library(knitr)  # for kable()
    data(svs)
    # show that there are multiple strips per issue
    svs %>%
      group_by(Mad_no,yrmo) %>%
      summarize(nstrips=n(),
                net_victories=sum(as.numeric(white_comeuppance) - as.numeric(black_comeuppance))) %>%
      ungroup() %>%
      select(yrmo,nstrips,net_victories) %>%
      head(n=20) %>%
      kable()

    ## `summarise()` has grouped output by 'Mad_no'. You can override using the `.groups` argument.

| yrmo | nstrips | net_victories |
|------|---------|---------------|
| 1961-01 … | | |

## R in Finance 2017

Fri 19 May 2017 by Steven

Review of R in Finance 2017 conference

## Calendar plots in ggplot2.

Thu 18 May 2017 by Steven E. Pav

I like the calendar 'heatmap' plots of commits you can see on
GitHub user pages, and wanted to play around with some.
Of course, if I just wanted to make some plots, I could have just googled around, and then
followed this recipe,
or maybe used the rChartsCalmap package.
Instead I set out, as an exercise, to make my own using ggplot2.

For data, I am using the daily GHCND observations data for station `USC00047880`, which is
located in the San Rafael, CA, Civic Center. I downloaded this data as part of a project
to join weather data to campground data (yes, it's been done before), directly from
the NOAA FTP site, then read the fixed width
file. I then processed the data, subselected to 2016 and beyond, and converted the units.
I am left with a dataframe of dates, the element name, and the value, which is a temperature
in Celsius. The first ten values I show here:

| date | element | value |
|------------|------|------|
| 2016-01-01 | TMAX | 9.4 |
| 2016-01-01 | TMIN | 0.0 |
| 2016-01-02 | TMAX | 10.0 |
| 2016-01-02 | TMIN | 3.9 |
| 2016-01-03 | TMAX | 11.7 |
| 2016-01-03 | TMIN | 6.7 |
| 2016-01-04 | TMAX | 12.8 |
| 2016-01-04 | TMIN | 6.7 |
| 2016-01-05 | TMAX | 12.8 |
| 2016-01-05 | TMIN | 8.3 |

Here is the code to produce the heatmap itself. I first use the `date` field
to compute the x axis labels and locations: the dates are converted essentially
to 'Julian' days since January 4, 1970 (a Sunday), then divided by seven to
get a 'Julian' week number. The week number containing the tenth of the month is
then set as the location of the month name in the x axis labels. I add years to
the January labels.
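The week arithmetic just described can be sketched in base R. This is an illustration under assumptions rather than the post's own code: the variable names and the date range are made up, and month abbreviations come from the built-in `month.abb` to avoid locale-dependent output.

```r
# all days of 2016 (made-up range for illustration)
dates <- seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day")

# January 4, 1970 was a Sunday, so days counted from it start each week on Sunday
origin <- as.Date("1970-01-04")
days_since <- as.numeric(dates) - as.numeric(origin)
week_no <- days_since %/% 7      # 'Julian' week number
day_of_week <- days_since %% 7   # 0 = Sunday, ..., 6 = Saturday

# the tenth of each month fixes where that month's label goes on the x axis
tenths <- seq(as.Date("2016-01-10"), as.Date("2016-12-10"), by = "month")
label_week <- (as.numeric(tenths) - as.numeric(origin)) %/% 7
month_idx <- as.integer(format(tenths, "%m"))
labels <- month.abb[month_idx]

# add the year to the January labels only
is_jan <- month_idx == 1
labels[is_jan] <- paste(labels[is_jan], format(tenths[is_jan], "%Y"))

print(data.frame(label_week, labels))
```

The `label_week` and `labels` vectors are the kind of input one would hand to `scale_x_continuous(breaks = ..., labels = ...)` in `ggplot2`.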

I then compute the Julian week number and day number of the week. I create a variable
which alternates between …
