## Asciinema!

Sun 21 February 2016
by Steven

I recently stumbled on asciinema, which is a way of screencasting the
execution of code on the command line. Since the file format is apparently JSON,
the storage and transmission requirements of a screencast are fairly minimal. Moreover,
while watching playback, one can pause, then select and copy text from the screencast.
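
The following sketch parses a tiny hand-written recording. It shows why such files are small and why their text can be copied: the output is stored as plain text, not as video frames. It assumes the asciicast version 1 layout, where `stdout` is a list of `[delay, text]` pairs and each delay is measured from the previous chunk. The recording and the `replay_text` helper are hypothetical, and nothing here uses asciinema's own tooling.

```python
import json

# A tiny hand-written recording, assuming the asciicast v1 layout.
recording = json.loads("""
{
  "version": 1,
  "width": 80,
  "height": 24,
  "duration": 1.2,
  "command": "/bin/bash",
  "title": "demo",
  "env": {"TERM": "xterm-256color", "SHELL": "/bin/bash"},
  "stdout": [[0.5, "$ echo hello\\r\\n"], [0.7, "hello\\r\\n"]]
}
""")


def replay_text(rec):
    """Concatenate the recorded output chunks, ignoring the delays.

    Returns the raw terminal text; escape sequences are not interpreted.
    """
    return "".join(chunk for _delay, chunk in rec["stdout"])


text = replay_text(recording)
print(repr(text))
```

A real player would sleep for each delay before writing the chunk to a terminal emulator. Because the chunks are text, though, selecting and copying them is straightforward.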

## Hiring a Data Scientist

Sun 21 February 2016
by Steven

I recently found myself hiring for the position of data scientist. While I had interviewed
candidates at previous jobs, I am now in a considerably smaller group with a greater role
in the hiring process. Here are a few of my thoughts on the process:

## It's Madness!

Sat 02 January 2016
by Steven E. Pav

I recently released a package to CRAN called
madness. The eponymous object
supports 'multivariate' automatic
differentiation by forward accumulation. By 'multivariate', I mean that it
automatically computes and tracks the derivative of a scalar, vector, matrix,
or multidimensional array with respect to a scalar, vector, matrix, or
multidimensional array.

The primary intended use case is the
multivariate delta method,
where one has an estimate of a population quantity and the variance-covariance
matrix of that estimate, and wants to perform inference on some transform of that
population quantity. With the values stored in a `madness` object, one merely
performs the transforms directly on the estimate, and the derivatives are
computed automatically. A secondary use case is the automatic
computation of gradients when optimizing some complex function, *e.g.* in the
computation of the MLE of some quantity.
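
Here is the delta method done by hand, without `madness`. If \(\hat\theta\) has covariance \(\Sigma\) and \(g\) is differentiable, then \(\operatorname{Var}(g(\hat\theta)) \approx J \Sigma J^{\top}\), where \(J\) is the Jacobian of \(g\) at \(\hat\theta\). The sketch is plain Python. The helper names and the numbers are made up for illustration. The point of `madness` is that \(J\) would be computed automatically rather than typed in.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def delta_method(jacobian, sigma):
    """Approximate covariance of g(theta_hat) as J Sigma J^T."""
    return matmul(matmul(jacobian, sigma), transpose(jacobian))


# Hypothetical example: theta = (mu, sd) has been estimated, and we want
# inference on the ratio g(theta) = mu / sd (a signal-noise ratio).
mu_hat, sd_hat = 0.5, 2.0
cov_theta = [[0.04, 0.0],
             [0.0, 0.01]]
# Jacobian of g in numerator layout: a 1 x 2 row vector.
J = [[1.0 / sd_hat, -mu_hat / sd_hat ** 2]]
g_hat = mu_hat / sd_hat
var_g = delta_method(J, cov_theta)[0][0]
print(g_hat, var_g)
```

With these inputs, \(J = (0.5, -0.125)\). The approximate variance is therefore \(0.5^2 \cdot 0.04 + 0.125^2 \cdot 0.01 = 0.01015625\).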

A `madness` object contains a value, `val`, as well as the derivative of
`val` with respect to some \(X\), called `dvdx`. The derivative is stored
as a *matrix* in 'numerator layout' convention: if `val` holds
\(m\) values, and \(X\) holds \(n\) values, then `dvdx` is an \(m \times n\) matrix.
This unfortunately means that a gradient is stored as a *row* vector.
Numerator layout feels more natural (to me, at least) when propagating
derivatives via the chain rule.
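
Numerator layout suits the chain rule because the Jacobians multiply in the same order as the composition. If \(y = f(u)\) and \(u = g(x)\), then \(\partial y / \partial x = (\partial y / \partial u)(\partial u / \partial x)\). The functions below are hypothetical and written in plain Python, not the `madness` API. They take \(x \in \mathbb{R}^2\) to a scalar, so the result is a \(1 \times 2\) row: the gradient stored as a row vector.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def jac_g(x):
    # u = g(x) = (x0 * x1, x0 + x1); du/dx is 2 x 2 (rows index u, columns index x).
    return [[x[1], x[0]],
            [1.0, 1.0]]


def jac_f(u):
    # y = f(u) = u0**2 + u1; dy/du is 1 x 2, i.e. a row vector.
    return [[2.0 * u[0], 1.0]]


x = [3.0, 2.0]
u = [x[0] * x[1], x[0] + x[1]]       # (6, 5)
dydx = matmul(jac_f(u), jac_g(x))    # (1 x 2)(2 x 2) -> 1 x 2 row
print(dydx)
```

Differentiating \(y = (x_0 x_1)^2 + x_0 + x_1\) directly gives \((2 x_0 x_1^2 + 1,\ 2 x_0^2 x_1 + 1) = (25, 37)\) at \(x = (3, 2)\), which matches the product of Jacobians.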

For convenience, one can also store the 'tags' of the value and \(X\), in
`vtag` and `xtag`, respectively. The `vtag` will be modified when computations
are performed, which can be useful for debugging. One can also store
the variance-covariance matrix of \(X\) in `varx`.
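
To make the fields concrete, here is a toy forward-accumulation object in Python. It carries `val`, `dvdx` (in numerator layout), `vtag`, `xtag` and `varx`. It is a hypothetical sketch loosely modeled on the description above, not the R package's implementation or API. As assumptions of this sketch, values are flattened to a list, only three operations are supported, and the tag format is invented. When `dvdx` is omitted, the sketch defaults it to the identity, which treats the value as \(X\) itself.

```python
import math


class Mad:
    """Toy forward-mode AD object (hypothetical; not the madness API).

    val is a flat list of m numbers; dvdx is an m x n list of rows, so
    row i holds the derivative of val[i] with respect to each entry of X.
    """

    def __init__(self, val, dvdx=None, vtag="val", xtag="X", varx=None):
        self.val = list(val)
        if dvdx is None:
            # No derivative supplied: treat the value as X itself, so dvdx = I.
            n = len(self.val)
            dvdx = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.dvdx = dvdx
        self.vtag = vtag
        self.xtag = xtag
        self.varx = varx

    def _derived(self, val, dvdx, vtag):
        # X is unchanged by computations, so xtag and varx are carried along.
        return Mad(val, dvdx, vtag, self.xtag, self.varx)

    def exp(self):
        # Elementwise exp: row i of dvdx is scaled by exp(val[i]).
        new_val = [math.exp(v) for v in self.val]
        new_d = [[e * d for d in row] for e, row in zip(new_val, self.dvdx)]
        return self._derived(new_val, new_d, "exp(" + self.vtag + ")")

    def scale(self, c):
        # Multiplying by a constant scales every derivative by that constant.
        new_val = [c * v for v in self.val]
        new_d = [[c * d for d in row] for row in self.dvdx]
        return self._derived(new_val, new_d, f"({c} * {self.vtag})")

    def sum(self):
        # The derivative of a sum is the column-sum of dvdx: a single 1 x n
        # row, i.e. the gradient stored as a row vector.
        new_d = [[sum(col) for col in zip(*self.dvdx)]]
        return self._derived([sum(self.val)], new_d, "sum(" + self.vtag + ")")


x = Mad([0.0, 1.0], vtag="x", xtag="x")
y = x.exp().scale(2.0).sum()
print(y.vtag)   # sum((2.0 * exp(x)))
print(y.val, y.dvdx)
```

The modified `vtag` records the chain of operations, which is the debugging aid mentioned above. Each method applies the chain rule locally by left-multiplying `dvdx` by the operation's own Jacobian.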

Here is an example session showing the use of a `madness` object. Note that by
default if one does not feed in `dvdx`, the object constructor assumes that
the value *is equal to* \(X\), and so ...

## Inference on Sorts

Wed 30 December 2015
by Steven E. Pav

Previously, I described a
model for taste preference appropriate
for some experiments in cocktail design I conducted years ago.
I noted that this model was so elegant and simple that it must have been
discovered previously and have a rich theory around it. In the two weeks since then,
I discovered a new paper on arXiv about
inference on ranks from comparisons. It reviews a model much like the one I
outlined, calling it the
Bradley-Terry-Luce
model. (Hey, look, there is indeed a
package on CRAN
for this with a vignette!)
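
The Bradley-Terry-Luce model gives each contestant a positive strength \(w_i\) and sets \(P(i \text{ beats } j) = w_i / (w_i + w_j)\). Below is a minimal sketch of fitting it by maximum likelihood with the classical Zermelo (minorization-maximization) iteration \(w_i \leftarrow W_i / \sum_{j \ne i} n_{ij} / (w_i + w_j)\). Here \(W_i\) is the number of wins of contestant \(i\) and \(n_{ij}\) is the number of contests between \(i\) and \(j\). This is a swapped-in textbook method, not necessarily what the paper or the CRAN package does, and `fit_btl` is a hypothetical name. The sketch assumes the data are connected enough for a finite MLE to exist; in particular, every contestant needs at least one win and one loss.

```python
from collections import defaultdict


def fit_btl(comparisons, n_players, iters=500):
    """Fit Bradley-Terry-Luce strengths by the Zermelo / MM iteration.

    comparisons: list of (winner, loser) index pairs in range(n_players).
    Returns strengths normalized to sum to 1 (the likelihood is scale-free).
    """
    wins = [0] * n_players
    n_games = defaultdict(int)  # keyed by (i, j) with i < j
    for w, l in comparisons:
        wins[w] += 1
        n_games[(min(w, l), max(w, l))] += 1

    strength = [1.0] * n_players
    for _ in range(iters):
        # denom[i] = sum over opponents j of n_ij / (w_i + w_j)
        denom = [0.0] * n_players
        for (i, j), n in n_games.items():
            t = n / (strength[i] + strength[j])
            denom[i] += t
            denom[j] += t
        strength = [wins[i] / denom[i] for i in range(n_players)]
        total = sum(strength)
        strength = [s / total for s in strength]
    return strength


# Hypothetical data: each pair meets 4 times; the favorite wins 3 of 4.
games = ([(0, 1)] * 3 + [(1, 0)] +
         [(1, 2)] * 3 + [(2, 1)] +
         [(0, 2)] * 3 + [(2, 0)])
strengths = fit_btl(games, 3)
print(strengths)
```

At convergence, each contestant's expected number of wins under the fitted model equals the observed number. The same likelihood extends naturally to strengths that depend on covariates, which is the flexibility noted below.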

The paper by Shah and Wainwright outlines a very simple method for estimating
the top \(k\) of \(n\) participants when each contest includes exactly two
participants. If I am reading it correctly, you take the average number
of observed wins for each contestant, then grab the top \(k\). They prove that
this algorithm is optimal under certain conditions. This seems to me
like an ideal outcome for a research result: the algorithm is dead simple,
and people have likely been using it for years, while the proof is somewhat
intricate. Unfortunately, it does not seem straightforward to generalize
the algorithm to the case where there are covariates, or 'features', of
the various contestants, nor necessarily to the case of multiple contestants
in a given contest. The Bradley-Terry model, on the other hand, is readily
adaptable to these modifications.
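
The counting method, as read above, can be sketched as follows: score each contestant by the fraction of its contests it won, then take the top \(k\). `top_k_by_win_rate` is a hypothetical helper written for illustration, not code from the paper. It assumes each contest is recorded as a `(winner, loser)` pair. When every contestant plays the same number of contests, ranking by win fraction and ranking by raw win count coincide.

```python
from collections import Counter


def top_k_by_win_rate(comparisons, k):
    """Return the k contestants with the highest fraction of contests won.

    comparisons: iterable of (winner, loser) pairs.
    Ties are broken by contestant name so the result is deterministic.
    """
    wins = Counter()
    played = Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    rate = {p: wins[p] / played[p] for p in played}
    ranked = sorted(rate, key=lambda p: (-rate[p], p))
    return ranked[:k]


# Hypothetical round robin among four contestants.
games = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d")]
print(top_k_by_win_rate(games, 2))
```

Nothing in this scoring has a place for contestant covariates or for contests with more than two participants. That is the limitation noted above, and it is where a likelihood model such as Bradley-Terry is easier to extend.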
