CRAN check like a bot with docker.
Tue 08 March 2016
If you're like me, you just blindly check boxes when submitting packages to CRAN. (The
'submit' button should be labeled 'yolo' as far as I'm concerned.) After getting
burned yet again for not actually checking my package with the development build
of R, I decided to be slightly less stupid in the future. Rather than install
R-devel, I made a docker base image
for CRAN checking.
As an example, to check my sadists package,
I made essentially the following Dockerfile:
MAINTAINER Steven E. Pav, firstname.lastname@example.org
# tweak this to force re-install
ENV DOCKER_INSTALL_NONCE 97c22800_9f88_4830_806a_2614e06600f2
# install some R packages from CRAN...
RUN /usr/local/bin/install2.r PDQutils hypergeo orthopolynom shiny testthat ggplot2 xtable knitr
The image is built FROM the crancheck image on Docker Hub. The general recipe is to install any
system packages via
apt-get, then any CRAN packages via
install2.r, then any GitHub packages via
/usr/local/bin/installGithub.r. The base image 'does the right thing' with respect to the
entrypoint, so you give the package file as the command.
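The general recipe might be sketched as the Dockerfile below. This is an illustration under assumptions, not the file used for sadists: the base image name is passed in as a build argument (BASE_IMAGE is a name made up here), libxml2-dev and someuser/somepkg are hypothetical placeholders, and the sketch assumes the base image runs build steps as root so that apt-get works.

```dockerfile
# Sketch only: the base image is supplied at build time via --build-arg,
# because the exact name of the crancheck image is not spelled out here.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# 1. system packages via apt-get (libxml2-dev is a hypothetical example)
RUN apt-get update \
 && apt-get install -y --no-install-recommends libxml2-dev \
 && rm -rf /var/lib/apt/lists/*

# 2. CRAN packages via install2.r
RUN /usr/local/bin/install2.r testthat knitr

# 3. GitHub packages via installGithub.r (someuser/somepkg is a placeholder)
RUN /usr/local/bin/installGithub.r someuser/somepkg

# No ENTRYPOINT or CMD here: the base image's entrypoint runs the check,
# and the package file is given as the command at run time.
```

An ARG placed before FROM needs a reasonably recent Docker; with an older Docker, write the image name directly in the FROM line instead.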
I built it via:
docker build --rm -t shabbychef/sadists-crancheck docker/
Once the image is built, checking a package is as 'simple' as attaching the local directory to
/srv in the container via a volume, and giving the name of the package file. (That is,
when the command to the container is
sadists_0.2.2.5000.tar.gz, it will try to check, as CRAN,
/srv/sadists_0.2.2.5000.tar.gz. You had better make sure it is available there,
so attach the directory containing the package to
/srv in the container.)
In summary, run it like this:
docker run -it --rm --volume $(pwd):/srv:ro shabbychef/sadists-crancheck sadists_0.2.2.5000.tar.gz
You get output as follows:
Sun 21 February 2016
I recently stumbled on asciinema, which is a way of screencasting
execution of code on the command line. Since the recording is apparently stored as a JSON file,
the storage and transmission requirements of the screencap are fairly minimal. Moreover,
while watching playback, one can pause, then select and copy text from the screencap.
I feel like this is an amazing technology which has been missing all my life, but to
be honest I am not sure how I should use it yet. For now, I am playing the recursive
gambit of screen-capping the writing and publication of this blog post.
Here's the embed:
Hiring a Data Scientist
Sun 21 February 2016
I recently found myself hiring for the position of data scientist. While I had interviewed
candidates at previous jobs, I am now in a considerably smaller group with a greater role
in the hiring process. Here are a few of my thoughts on the process:
we are reading this.
We read all the resumes sent to us, and all the cover letters (of which there were not enough).
In fact, nearly all the resumes were read by two of us. Perhaps this is not the case at larger
firms who receive hundreds of resumes for a job (or is that a myth?), but we were eagerly
looking for the right candidate, which meant actively researching candidates. Unfortunately
some people treat job applications like lottery tickets: an attempt to net a low probability
large payoff with minimal investment. Like the lottery, you probably have to apply scattershot
to hundreds of jobs to win.
This kind of lottery-ticket application is easy to spot, as no perceptible effort has been
applied. A job application without a cover letter, even one of just a few sentences, feels wrong. It's like
sitting at a bar when someone tries to pick you up by showing you their car keys and class ring
without talking to you. While the cover letter is nominally your chance to personalize your
application, it should be sincere, even at the cost of brevity. Continuing the analogy, it
shouldn't sound like a pickup line.
blah blah Ginger blah blah blah Ginger
One of the candidates tailored their resume for us, emboldening those skills which we requested in
the job posting: Python blah blah blah, MySQL blah blah.
I felt a tiny bit manipulated when I realized they had done this, but it made
it so easy to see that they matched the minimum qualifications in …
Sat 02 January 2016
by Steven E. Pav
I recently released a package to CRAN called
madness. The eponymous object
supports 'multivariate' automatic
differentiation by forward accumulation. By 'multivariate', I mean it tracks
(and automatically computes) the derivative of a scalar, or
vector, or matrix, or multidimensional array with respect to a scalar, vector,
matrix or multidimensional array.
The primary use case in mind is the
multivariate delta method,
where one has an estimate of a population quantity and the variance-covariance
of the same, and wants to perform inference on some transform of that
population quantity. With the values stored in a
madness object, one merely
performs the transforms directly on the estimate, and the derivatives are
computed automatically. A secondary use case would be for the automatic
computation of gradients when optimizing some complex function, e.g. in the
computation of the MLE of some quantity.
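For reference, the delta method that this automates can be sketched as follows. The notation is assumed here and is not taken from the package: \(\hat{\theta}\) is the estimate of a population quantity \(\theta\) holding \(n\) values, \(\Sigma\) is its \(n \times n\) variance-covariance matrix, \(f\) is a differentiable transform returning \(p\) values, and \(J\) is the \(p \times n\) derivative of \(f\), evaluated in practice at \(\hat{\theta}\).

```latex
\[
f(\hat{\theta}) \approx f(\theta) + J\,(\hat{\theta} - \theta),
\qquad
J = \frac{\partial f}{\partial \theta},
\qquad
\operatorname{Var}\!\left(f(\hat{\theta})\right) \approx J\,\Sigma\,J^{\top}.
\]
```

The tedious part by hand is working out \(J\) for a complicated \(f\); that is the piece which forward accumulation computes alongside the value.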
A madness object contains a value,
val, as well as the derivative of
val with respect to some \(X\), called
dvdx. The derivative is stored
as a matrix in 'numerator layout' convention: if
val holds \(m\) values, and \(X\) holds \(n\) values, then
dvdx is an \(m \times n\) matrix.
This unfortunately means that a gradient is stored as a row vector.
Numerator layout feels more natural (to me, at least) when propagating
derivatives via the chain rule.
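To see why, here is a sketch of the chain rule in numerator layout, using symbols assumed for illustration: \(v\) is the stored value with \(m\) entries, \(X\) has \(n\) entries, and \(g\) is a further transform of \(v\) returning \(p\) entries.

```latex
\[
\underbrace{\frac{\partial g(v)}{\partial X}}_{p \times n}
=
\underbrace{\frac{\partial g}{\partial v}}_{p \times m}
\;
\underbrace{\frac{\partial v}{\partial X}}_{m \times n}.
\]
```

Each new operation left-multiplies the stored derivative by its own Jacobian, so the matrix factors appear in the same order as the function composition and the inner dimensions always line up.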
For convenience, one can also store the 'tags' of the value and \(X\), in
vtag and xtag, respectively. The
vtag will be modified when computations
are performed, which can be useful for debugging. One can also store
the variance-covariance matrix of \(X\) in
Here is an example session showing the use of a
madness object. Note that by
default if one does not feed in
dvdx, the object constructor assumes that
the value is equal to \(X\), and so …