I attended some parts of UseR 2016 at Stanford in late June. I have not been to a large conference such as this in a long time and had forgotten how much I dislike them: parallel sessions in far-flung locations with somewhat inconsistent lineups, 'overflow' areas where you can watch a speaker on broadcast, and a general lack of cohesion where you share very little experience with others. In addition my work schedule prevented me from attending either the tutorials or the last (half) day. My review will be pretty spotty. Luckily you can somehow watch many of the talks thanks to, uh, Microsoft.
Richard Becker's talk on forty years of S was a great oral history of the project. Richard is well-spoken, the history is interesting (to me at least), and the low-quality scans of personnel pictures added to the throwback feeling of it.
One of the main themes of the talks I attended is that there is a large handful of ways to scale R up to the next level of data via parallelization of some kind. By 'next level', I mean data too large to fit in memory on your typical dev machine, say bigger than 64GB, but not so large that you really should have a professional helping you, say smaller than 1PB. Maybe this is officially 'medium data'? I find myself working in this range these days and am really lost. Getting some leads on this problem was one of my main goals at the conference, and unfortunately, I walked away empty-handed.
The ddR talk seemed somewhat compelling. I am not sure how I could use it, however, given the highly constrained IT environment that I work within. Though I suppose that proviso applies to all the solutions discussed, except the 'enterprise grade' entries, whose cost justifies their worth. I even had a delusional dream that I could call (distributed) SAS as a front-end to a …
The flashR talk presented another somewhat odd solution to parallelization. Here the idea, I think, is to use lots of flash storage as a quasi-memory to expand R's computing capability. Since this is a hardware solution as well as a software overlay, there is no way I was going to be able to use it, and I might have dozed off.
I should note that Palo Alto is somewhat warmer than my home in San Francisco, only 30 miles to the north and 15 degrees Fahrenheit cooler. On day 1, the afternoon snack included yogurt pretzels. A bowl of these in the sun formed a soupy mess.
There was another parallelization talk which was an all-hardware solution, again not really relevant to me. One of the big reveals at the conference from RStudio is sparklyr (or is it …), a frontend for Apache Spark. This is probably the closest to satisfying my current processing needs, except it relies on Spark, which I am not going to be able to install and run in my current environment. Sad.
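For the curious, driving Spark from R via sparklyr looks roughly like the following. This is a sketch only: it assumes a local Spark installation (sparklyr can fetch one via `spark_install()`) and the dplyr package, and I have not been able to run it in my own environment.

```r
library(sparklyr)
library(dplyr)

# connect to a local Spark instance
sc <- spark_connect(master = "local")

# copy a data frame into Spark; dplyr verbs on the resulting table
# are translated to Spark SQL and executed remotely
mtcars_tbl <- copy_to(sc, mtcars)
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg)) %>%
  collect()

spark_disconnect(sc)
```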
I sat through much of a session on regressions: I attended the mumm and TMB talk. My takeaway from this is that there is an interesting model 'template builder' in C++ which I do not properly understand, and which can assist with fast maximum likelihood estimation, which sounds like something I need. The bigKRLS talk mostly served as a warning to me not to use KRLS, since it apparently has terrible memory needs. The talk on fast additive quantile regression is now a bit of a blur to me. Something about 'pinball loss.'
Don Knuth's talk on literate programming was a real joy. I have to admit, I was star-struck. Don reinforced that TeX is a beauty to read, perhaps more so than to write. In the Q and A, someone asked if he had used Markdown, which is pretty gutsy. On the other hand, I was surprised to find that he apparently writes HTML by hand instead of having written his own system of rendering to HTML, say.
I attended Pete Baker's talk on using Make with R, a topic relevant to my interests. I had hoped there would be more advanced usage tips in the talk, though, like how to wrangle around non-file dependencies and work on remote machines.
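The basic pattern, which I was hoping to get beyond, looks roughly like this Makefile. A sketch only: the file names and targets here are hypothetical, not from the talk.

```make
# hypothetical pipeline: raw data -> cleaned data -> rendered report
all: report.pdf

# re-clean only when the script or the raw data changes
cleaned.csv: clean.R raw.csv
	Rscript clean.R raw.csv cleaned.csv

# re-render only when the report source or cleaned data changes
report.pdf: report.Rmd cleaned.csv
	Rscript -e 'rmarkdown::render("report.Rmd")'

.PHONY: all
```

The appeal is that `make` rebuilds only the stale targets; the open questions for me remain dependencies that are not files and steps that run on remote machines.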
I reprised my talk (20 minutes in or so) on madness, the multivariate automatic differentiation package I wrote. The lightning talks were on autopilot, with 20 seconds per slide, which is about as unpleasant as it sounds. I think I got through to one or two people, so it was worth it.
I kind of crashed Gabor Csardi's talk on tools for packages. The takeaways from this talk are: don't receive desktop notifications from Twitter, and use the rcmdcheck package to CRAN check your package like a boss.
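In case it is useful, invoking rcmdcheck from an R session is roughly the following; a sketch, and the package documentation covers the full set of arguments.

```r
# install.packages("rcmdcheck")
library(rcmdcheck)

# build the package in the given directory and run R CMD check on it,
# capturing errors, warnings, and notes as an object you can inspect
res <- rcmdcheck(path = ".", args = "--as-cran")
print(res)
```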
The talk that everyone else attended, but I missed, was Jan Vitek's talk on what R could learn from Julia.