gilgamath


Hiring a Data Scientist

Sun 21 February 2016 by Steven

I recently found myself hiring for the position of data scientist. While I had interviewed candidates at previous jobs, I am now in a considerably smaller group with a greater role in the hiring process. Here are a few of my thoughts on the process:

we are reading this.

We read all the resumes sent to us, and all the cover letters (of which there were not enough). In fact, nearly all the resumes were read by two of us. Perhaps this is not the case at larger firms that receive hundreds of resumes for a job (or is that a myth?), but we were eagerly looking for the right candidate, which meant actively researching candidates. Unfortunately, some people treat job applications like lottery tickets: an attempt to net a low-probability, large payoff with minimal investment. As with the lottery, you probably have to apply scattershot to hundreds of jobs to win.

This kind of lottery-ticket application is easy to spot, as no perceptible effort has gone into it. A job application without a cover letter, even one of a few sentences, feels wrong. It's like sitting at a bar when someone tries to pick you up by showing you their car keys and class ring without talking to you. While the cover letter is nominally your chance to personalize your application, it should be sincere, even at the cost of brevity. Continuing the analogy, it shouldn't sound like a pickup line.

blah blah Ginger blah blah blah Ginger

One of the candidates tailored their resume for us, setting in bold those skills which we requested in the job posting: Python blah blah blah, MySQL blah blah. I felt a tiny bit manipulated when I realized they had done this, but it made it so easy to see that they matched the minimum qualifications in …


It's Madness!

Sat 02 January 2016 by Steven E. Pav

I recently released a package to CRAN called madness. The eponymous object supports 'multivariate' automatic differentiation by forward accumulation. By 'multivariate', I mean that it tracks, and automatically computes, the derivative of a scalar, vector, matrix, or multidimensional array with respect to a scalar, vector, matrix, or multidimensional array.

The primary use case in mind is the multivariate delta method, where one has an estimate of a population quantity and the variance-covariance of the same, and wants to perform inference on some transform of that population quantity. With the values stored in a madness object, one merely performs the transforms directly on the estimate, and the derivatives are computed automatically. A secondary use case would be for the automatic computation of gradients when optimizing some complex function, e.g. in the computation of the MLE of some quantity.
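To make the delta method concrete, here is a minimal sketch in Python with NumPy (not R, and not the madness package). The estimate, its variance-covariance, and the ratio transform are all made-up values for illustration, and the Jacobian is derived by hand, which is the step automatic differentiation would take over.

```python
import numpy as np

# Hypothetical estimate of (mean, standard deviation) and its variance-covariance.
theta_hat = np.array([0.5, 2.0])
sigma_hat = np.array([[0.04, 0.0],
                      [0.0, 0.02]])

def transform(theta):
    # The quantity of interest: the ratio mean / sd.
    return np.array([theta[0] / theta[1]])

def jacobian(theta):
    # Hand-derived derivative of transform, as a 1 x 2 matrix.
    return np.array([[1.0 / theta[1], -theta[0] / theta[1] ** 2]])

J = jacobian(theta_hat)
est = transform(theta_hat)
# Delta method: Var(f(theta_hat)) is approximately J Sigma J^T.
var_est = J @ sigma_hat @ J.T
std_err = np.sqrt(var_est[0, 0])
```

The approximate standard error in `std_err` can then be used for a Wald-style interval around `est`.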

A madness object contains a value, val, as well as the derivative of val with respect to some \(X\), called dvdx. The derivative is stored as a matrix in 'numerator layout' convention: if val holds \(m\) values, and \(X\) holds \(n\) values, then dvdx is an \(m \times n\) matrix. This unfortunately means that a gradient is stored as a row vector. Numerator layout feels more natural (to me, at least) when propagating derivatives via the chain rule.
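The layout can be illustrated with a toy forward-accumulation class. This is a sketch in Python with NumPy, not the madness API: the class name `Dual` and its methods are hypothetical, and only the field names val and dvdx are borrowed from the description above. Values are flattened to vectors to keep the bookkeeping short.

```python
import numpy as np

class Dual:
    """Hypothetical value/derivative pair; a stand-in, not the madness object."""

    def __init__(self, val, dvdx=None):
        # m values, flattened to a vector.
        self.val = np.asarray(val, dtype=float).reshape(-1)
        # With no derivative supplied, treat val as X itself: dvdx is the identity.
        if dvdx is None:
            dvdx = np.eye(self.val.size)
        self.dvdx = np.asarray(dvdx, dtype=float)  # m x n, numerator layout

    def __add__(self, other):
        return Dual(self.val + other.val, self.dvdx + other.dvdx)

    def __mul__(self, other):
        # Elementwise product rule: d(u * v) = diag(v) du + diag(u) dv.
        return Dual(self.val * other.val,
                    other.val[:, None] * self.dvdx + self.val[:, None] * other.dvdx)

    def sum(self):
        # Summing m values collapses the m rows of dvdx into one row.
        return Dual([self.val.sum()], self.dvdx.sum(axis=0, keepdims=True))

x = Dual([1.0, 2.0, 3.0])
y = (x * x).sum()  # the scalar function sum of x_i squared
```

Here `y.dvdx` has shape 1 by 3: the gradient of a scalar comes out as a row vector, as noted above.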

For convenience, one can also store the 'tags' of the value and \(X\), in vtag and xtag, respectively. The vtag will be modified when computations are performed, which can be useful for debugging. One can also store the variance-covariance matrix of \(X\) in varx.

Here is an example session showing the use of a madness object. Note that by default if one does not feed in dvdx, the object constructor assumes that the value is equal to \(X\), and so …


Inference on Sorts

Wed 30 December 2015 by Steven E. Pav

Previously, I described a model for taste preference appropriate for some experiments in cocktail design I conducted years ago. I noted that this model was so elegant and simple that it must have been discovered previously, and must have a rich theory around it. In the two weeks since then, I discovered a new paper on arXiv about inference on ranks from comparisons. The authors review a model much like the one I outlined, calling it the Bradley-Terry-Luce model. (Hey, look, there is indeed a package on CRAN for this with a vignette!)

The paper by Shah and Wainwright outlines a very simple method for estimating the top \(k\) of \(n\) participants when the contests include exactly two participants each. If I am reading it correctly, you take the average number of observed wins for each contestant, then grab the top \(k\). They prove that this algorithm is optimal under certain conditions. This seems to me like an ideal outcome for a research result: the algorithm is dead simple, and people have likely been using it for years, while the proof is somewhat intricate. Unfortunately, it does not seem straightforward to generalize the algorithm to the case where there are covariates, or 'features' about the various contestants, nor necessarily to the case of multiple contestants in a given contest. The Bradley-Terry model, on the other hand, is readily adaptable to these modifications.
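Taking that description literally, the counting procedure might be sketched in Python as below. This is my reading of the summary above, not code from the paper: the function name, the (winner, loser) input format, and the alphabetical tie-breaking are all assumptions.

```python
from collections import defaultdict

def top_k_by_win_rate(contests, k):
    """Return the k contestants with the highest fraction of contests won.

    contests is an iterable of (winner, loser) pairs (an assumed format).
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in contests:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    # Average number of wins per contest, for every contestant who played.
    rate = {c: wins[c] / played[c] for c in played}
    # Sort by descending win rate; break ties by name (an arbitrary choice).
    ranked = sorted(rate, key=lambda c: (-rate[c], str(c)))
    return ranked[:k]

# Hypothetical round robin among four contestants.
contests = [("a", "b"), ("a", "c"), ("b", "c"),
            ("c", "d"), ("b", "d"), ("a", "d")]
top = top_k_by_win_rate(contests, 2)
```

Nothing here uses covariates or handles contests with more than two participants, which is the limitation noted above.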


Using vim as an IDE

Tue 29 December 2015 by Steven E. Pav

For a number of years now, I have been using vim as a lightweight IDE. The ecosystem of vim addons is rich. There are numerous plugins for creating tags to navigate a project, browsing files in directories, highlighting syntax and so on. What really makes it an IDE is the ability to execute code within the context of vim. I realize this probably sounds 'charming' to disciples of that other text editor, but it might seem like an unnatural urge to my vim coreligionists. The piece that glues it all together is vim-conque. The easiest way to get conque on Ubuntu is via apt as follows:

sudo apt-get install vim-addon-manager vim-conque
sudo vim-addons -w install conqueterm

The skinny on using conque is that you can visual-select code that you are editing, hit <F9>, and it will be transferred to the execution window, newlines and all. So you can test out code while you are writing it. You can also work the other way: test out code in a REPL, then, when it is working as expected, escape insert mode in the REPL, yank the working code to a register, and paste it into the file you are working on.

Dockerfile or it didn't happen!

This kind of advice is a bit abstract, so I put a working example on GitHub and Docker Hub. You can run it yourself via docker:

# this might take a little while to download
docker pull shabbychef/vim-conque
docker run --rm -it shabbychef/vim-conque

This will feel a bit odd: when you run the last command, you are in vim, but you are in vim in a docker container. When you terminate, your changes will not be saved (this is the --rm flag). Directions are given in the file on how to start conque with a screen …
