It's Madness!
Sat 02 January 2016
by Steven E. Pav
I recently released a package to CRAN called
madness. The eponymous object
supports 'multivariate' automatic
differentiation by forward accumulation. By 'multivariate', I mean it allows
you to track (and automatically computes) the derivative of a scalar, or
vector, or matrix, or multidimensional array with respect to a scalar, vector,
matrix or multidimensional array.
The primary use case in mind is the
multivariate delta method,
where one has an estimate of a population quantity and the variance-covariance
of the same, and wants to perform inference on some transform of that
population quantity. With the values stored in a madness object, one merely
performs the transforms directly on the estimate, and the derivatives are
computed automatically. A secondary use case would be for the automatic
computation of gradients when optimizing some complex function, e.g. in the
computation of the MLE of some quantity.
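As a rough illustration of the delta method itself, here is a minimal base R sketch that does by hand what the paragraph above describes. It does not use the madness package: the Jacobian is written out manually, and the estimate `theta_hat`, its variance-covariance `Sigma`, and the ratio transform `f` are all made-up values chosen only for illustration.

```r
# Hypothetical estimate of a population quantity (two components) and its
# variance-covariance matrix; the numbers are invented for illustration.
theta_hat <- c(2.0, 4.0)
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), nrow = 2, byrow = TRUE)

# Transform of interest: the ratio of the two components.
f <- function(theta) theta[1] / theta[2]

# Jacobian of f at theta_hat, derived by hand and stored as a 1 x 2 matrix
# (one row per output value, one column per input value).
J <- matrix(c(1 / theta_hat[2], -theta_hat[1] / theta_hat[2]^2), nrow = 1)

# Delta method: the approximate variance of f(theta_hat) is J Sigma J'.
est <- f(theta_hat)
var_est <- J %*% Sigma %*% t(J)
se_est <- sqrt(drop(var_est))
```

The tedious part is deriving `J` by hand for each new transform; that is the step automatic differentiation is meant to remove.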
A madness object contains a value, val, as well as the derivative of
val with respect to some \(X\), called dvdx. The derivative is stored
as a matrix in 'numerator layout' convention: if val holds
\(m\) values, and \(X\) holds \(n\) values, then dvdx is an \(m \times n\) matrix.
This unfortunately means that a gradient is stored as a row vector.
Numerator layout feels more natural (to me, at least) when propagating
derivatives via the chain rule.
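To make the shape bookkeeping concrete, here is a sketch of the chain rule in numerator layout. Write \(v\) for the stored value, and let \(f\) be some further transform producing \(p\) values; the names \(v\), \(f\) and \(p\) are introduced here only for illustration.

\[
\underbrace{\frac{\mathrm{d} f(v)}{\mathrm{d} X}}_{p \times n}
=
\underbrace{\frac{\mathrm{d} f(v)}{\mathrm{d} v}}_{p \times m}
\,
\underbrace{\frac{\mathrm{d} v}{\mathrm{d} X}}_{m \times n}
\]

The inner dimensions match with no transposes, so propagating a derivative through a new operation is a single left-multiplication of the stored \(m \times n\) matrix.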
For convenience, one can also store the 'tags' of the value and \(X\), in
vtag and xtag, respectively. The vtag will be modified when computations
are performed, which can be useful for debugging. One can also store
the variance-covariance matrix of \(X\) in varx.
Here is an example session showing the use of a madness object. Note that by
default if one does not feed in dvdx, the object constructor assumes that
the value is equal to \(X\), and so …
Inference on Sorts
Wed 30 December 2015
by Steven E. Pav
Previously, I described a
model for taste preference appropriate
for some experiments in cocktail design I conducted years ago.
I noted that this model was so elegant and simple, it must have been discovered
previously, and have a rich theory around it. In the two weeks since then,
I discovered a new paper on arXiv about
inference on ranks from comparisons. They review a model much like the one I
outlined, calling it the
Bradley-Terry-Luce
model. (Hey, look, there is indeed a
package on CRAN
for this with a vignette!)
The paper by Shah and Wainwright outlines a very simple method for estimating
the top \(k\) of \(n\) participants when the contests include exactly two
participants each. If I am reading it correctly, you take the average number
of observed wins for each contestant, then grab the top \(k\). They prove that
this algorithm is optimal under certain conditions. This seems to me
like an ideal outcome for a research result: the algorithm is dead simple,
and people have likely been using it for years, while the proof is somewhat
intricate. Unfortunately, it does not seem straightforward to generalize
the algorithm to the case where there are covariates, or 'features' about
the various contestants, nor necessarily to the case of multiple contestants
in a given contest. The Bradley-Terry model, on the other hand, is readily
adaptable to these modifications.
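The counting rule described above can be sketched in a few lines of base R. This follows the reading given in this post (average wins per contest played, then keep the top \(k\)) and is not taken from the paper; the `contests` data and the helper name `top_k_by_wins` are hypothetical.

```r
# Hypothetical contest results: each row is one two-player contest.
contests <- data.frame(
  winner = c("a", "a", "b", "c", "a", "b"),
  loser  = c("b", "c", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

top_k_by_wins <- function(contests, k) {
  players <- sort(unique(c(contests$winner, contests$loser)))
  # Count wins and appearances for every player, including winless ones.
  wins <- table(factor(contests$winner, levels = players))
  played <- table(factor(c(contests$winner, contests$loser), levels = players))
  # Average number of wins per contest played.
  win_rate <- as.numeric(wins) / as.numeric(played)
  names(win_rate) <- players
  # Keep the k players with the highest win rate.
  head(sort(win_rate, decreasing = TRUE), k)
}

top2 <- top_k_by_wins(contests, k = 2)
```

Nothing in this sketch can use covariates about the contestants, which is the limitation noted above.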
Using vim as an IDE
Tue 29 December 2015
by Steven E. Pav
For a number of years now, I have been using vim as a lightweight IDE. The
ecosystem of vim addons is rich. There are numerous plugins for creating tags
to navigate a project, browse files in directories, highlight syntax and so on.
What really makes it an IDE is the ability to execute code within the context
of vim.
I realize this probably sounds 'charming' to disciples of that other
text editor, but it might seem like an unnatural urge to my vim
coreligionists. The piece that glues it all together is vim-conque. The
easiest way to get conque in Ubuntu is via apt as follows:
sudo apt-get install vim-addon-manager vim-conque
sudo vim-addons -w install conqueterm
The skinny on using conque is that you can visual-select code that you are
editing, hit <F9>, and it will be transferred to the execution window,
newlines and all. So you can test out code while you are writing it. You
can also work the other way, testing out code in a REPL, then, when it is
working as expected, escape insert mode in the REPL, yank the working code to a
register, and copy it into the file you are working on.
Dockerfile or it didn't happen!
This kind of advice is a bit abstract, so I put a working example on
GitHub and Docker Hub. You can run it yourself via docker:
# this might take a little while to download
docker pull shabbychef/vim-conque
docker run --rm -it shabbychef/vim-conque
This will feel a bit odd: when you run the last command, you are in vim, but
you are in vim in a docker container. When you terminate, your changes will
not be saved (this is the --rm flag). Directions are given in the file
on how to start conque with a screen …
You Deserve Expensive Champagne ... If You Buy It.
Sat 26 December 2015
by Steven E. Pav
I received some taster ratings from the champagne party we
attended last week.
I joined the raw ratings with the bottle information to
create a single aggregated dataset.
This is a 'non-normal' form, but simplest to distribute. Here is a taste:
library(dplyr)
library(readr)
library(knitr)
champ <- read_csv('../data/champagne_ratings.csv')
champ %>%
  select(winery, purchase_price_per_liter, raternum, rating) %>%
  head(8) %>%
  kable(format='markdown')
| winery | purchase_price_per_liter | raternum | rating |
|:--|--:|--:|--:|
| Barons de Rothschild | 80.00000 | 1 | 10 |
| Onward Petillant Naturel 2014 Malavasia Bianca | 33.33333 | 1 | 4 |
| Chandon Rose Method Traditionnelle | 18.66667 | 1 | 8 |
| Martini Prosecco from Italy | 21.32000 | 1 | 8 |
| Roederer Estate Brut | 33.33333 | 1 | 8 |
| Kirkland Asolo Prosecco Superiore | 9.32000 | 1 | 7 |
| Champagne Tattinger Brute La Francaise | 46.66667 | 1 | 6 |
| Schramsberg Reserver 2001 | 132.00000 | 1 | 6 |
Recall that the rules of the contest dictate that the average rating of each
bottle was computed, then divided by 25 dollars more than the
price (presumably for a 750ml bottle). Depending on whether the average
ratings were compressed around the high end of the zero to ten scale,
or around the low end, one would wager on either the cheapest bottles, or more
moderately priced offerings. (Based on my
previous analysis, I brought the
Menage a Trois Prosecco, rated at 91 points, but available at Safeway for
10 dollars.) It is easy to compute the raw averages using dplyr:
avrat <- champ %>%
  group_by(winery, bottle_num, purchase_price_per_liter) %>%
  summarize(avg_rating=mean(rating)) %>%
  ungroup() %>%
  arrange(desc(avg_rating))
avrat %>% head(8) %>% kable(format='markdown')
| winery | bottle_num | purchase_price_per_liter | avg_rating |
|:--|--:|--:|--:|
| Desuderi Jeio | 4 | 22.66667 | 6.750000 |
| Gloria Ferrer Sonoma Brut | 19 | 20.00000 | 6.750000 |
| Roederer Estate Brut | 12 | 34.66667 | 6.642857 |
| Charles Collin Rose | 34 | 33.33333 | 6.636364 |
| Roederer Estate Brut | 13 | 33.33333 | 6.500000 |
| Gloria Ferrer Sonoma Brut | 11 … |
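The contest rule recalled earlier (average rating divided by 25 dollars more than the bottle price) can be sketched on a few of the rows shown above. This is only an illustration: it assumes, as the text presumes, that the price in the rule is for a 750ml bottle, and the names `avrat_small`, `bottle_price` and `score` are hypothetical.

```r
library(dplyr)

# Three rows copied from the averaged ratings shown above.
avrat_small <- data.frame(
  winery = c("Desuderi Jeio", "Gloria Ferrer Sonoma Brut", "Roederer Estate Brut"),
  bottle_num = c(4, 19, 12),
  purchase_price_per_liter = c(22.66667, 20.00000, 34.66667),
  avg_rating = c(6.750000, 6.750000, 6.642857),
  stringsAsFactors = FALSE
)

scored <- avrat_small %>%
  # Assumed conversion: a 750ml bottle costs 0.75 times the per-liter price.
  mutate(bottle_price = 0.75 * purchase_price_per_liter,
         # Contest score: average rating over (25 dollars + bottle price).
         score = avg_rating / (25 + bottle_price)) %>%
  arrange(desc(score))
```

With equal average ratings, the cheaper bottle wins, so on these three rows the Gloria Ferrer comes out ahead of the Desuderi Jeio.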