I wanted a drop-in replacement for geom_errorbar in ggplot2 that would plot a density cloud of uncertainty. The idea is that typically (well, where I work), the ymin and ymax of an errorbar are plotted at plus and minus one standard deviation. A 'cloud' where the alpha is proportional to a normal density with the same standard deviations could show the same information on a plot with a little less clutter. I found out how to do this with a very ugly function, but wanted to do it the 'right' way by spawning my own geom. So the geom_cloud.

After looking at a bunch of other ggplot2 extensions, some amount of tinkering and hair-pulling, and we have the following code. The first part just computes standard deviations which are equally spaced in normal density. This is then used to create a list of geom_ribbon with equal alpha, but the right size. A little trickery is used to get the scales right. There are three parameters: the steps, which control how many ribbons are drawn. The default value is a little conservative. A larger value, like 15, gives very smooth clouds. The se_mult is the number of standard deviations that the ymax and ymin are plotted at, defaulting to 1 here. If you plot your errorbars at 2 standard errors, change this to 2. The max_alpha is the alpha at the maximal density, i.e. around y.

# get points equally spaced in density 
equal_ses <- function(steps) {
    xend <- c(0,4)
    endpnts <- dnorm(xend)
# perhaps use ppoints instead?
    deql <- seq(from=endpnts[1],to=endpnts[2],length.out=steps+1)
    davg <- (deql[-1] + deql[-length(deql)])/2
# invert
    xeql <- unlist(lapply(davg,function(d) {
                     uniroot(f=function(x) { dnorm(x) - d },interval=xend)$root
    }))
    xeql
}

library(ggplot2)
library(grid)

geom_cloud <- function(mapping = NULL, data = NULL, ...,
                                             na.rm = TRUE,
                                             steps = 7, se_mult=1, max_alpha=1,
                                             inherit.aes = TRUE) {

  layer(
    data = data,
    mapping = mapping,
    stat = "identity",
    geom = GeomCloud,
    position = "identity",
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
            steps = steps,
            se_mult = se_mult,
            max_alpha = max_alpha,
      ...
    )
  )
}


GeomCloud <- ggproto("GeomCloud", Geom,
  required_aes = c("x", "y", "ymin", "ymax"),
  non_missing_aes = c("fill"),
  default_aes = aes(
    colour = NA, fill = NA, alpha = 1, size=1, linetype=1
  ),
    setup_data = function(data,params) {
        if (params$na.rm) {
            ok_row <- !(is.na(data$x) | is.na(data$y) | (is.na(data$ymin) & is.na(data$ymax)))
            data <- data[ok_row,]
        }
        ses <- equal_ses(params$steps)
        data$up_se <- (1/params$se_mult) * (data$ymax - data$y)
        data$dn_se <- (1/params$se_mult) * (data$y - data$ymin)
        # a trick to get positions ok: puff up the ymax and ymin for now
        maxse <- max(ses)
        data$ymax  <- data$y + maxse * data$up_se
        data$ymin  <- data$y - maxse * data$dn_se
        data
    },
  draw_group = function(data, panel_scales, coord,
                                                na.rm = TRUE,
                                                steps = 5, se_mult=1, max_alpha=1) {

        data$alpha <- max_alpha / steps
        # 2FIX: use the coordinate transform? or just forget it?
        ses <- equal_ses(steps)
        grobs <- Map(function(myse) {
                                     this_data <- data
                                     this_data$ymax <- this_data$y + myse * this_data$up_se
                                     this_data$ymin <- this_data$y - myse * this_data$dn_se
                                     ggplot2::GeomRibbon$draw_group(this_data, panel_scales, coord, na.rm=na.rm)
                                                },ses)
        do.call("gList",grobs)
  },
    draw_key = draw_key_polygon
)

Ok, now I use it. I construct some data in multiple groups, splitting into column and row facets by some random variable. Within each facet there are two groups. The standard error is proportional to square root x. I then plot the clouds. I put these in square root space to convince myself I had not goofed up scale transformations. (Well, still not sure, really.)

library(dplyr)

nobs <- 1000

set.seed(2134)
mydat <- data.frame(grp=sample(c(0,1),nobs,replace=TRUE),
                                        colfac=sample(letters[1:2],nobs,replace=TRUE),
                                        rowfac=sample(letters[10 + (1:3)],nobs,replace=TRUE)) %>%
    mutate(x=seq(0,1,length.out=nobs) + 0.33 * grp) %>%
    mutate(y=0.25*rnorm(nobs) + 2*grp) %>%
    mutate(grp=factor(grp)) %>%
    mutate(se=sqrt(x)) %>%
    mutate(ymin=y-se,ymax=y+se)

offs <- 2
ph <- mydat %>%
    mutate(y=y+offs,ymin=ymin+offs,ymax=ymax+offs) %>%
    ggplot(aes(x=x,y=y,ymin=ymin,ymax=ymax,color=grp,fill=grp)) + 
    facet_grid(rowfac ~ colfac) +
    scale_y_sqrt() + geom_line() + 
    geom_cloud(aes(fill=grp),steps=15,max_alpha=0.85,color=NA) +   # important!  set color to NA!
    labs(title='geom cloud')
print(ph)

plot of chunk first_plot

At some point I will put this in a package or try to push it off into one of the fine ggplot2 extension packages. There are still some bugs to be worked out: I would like the fill to default to the color if the fill is not given. I want to force the color to be NA, otherwise geom_ribbon adds really ugly lines, and so on.