No Accounting for Taste
Sat 19 December 2015
by Steven E. Pav
Many years ago, before I had kids, I was afflicted with a mania for
Italian bitters. A particularly ugly chapter of this time included
participating (at least in my own mind) in a contest to design a
cocktail containing [obnoxious brand] Amaro. I was determined to win
this contest, and win it with science.
After weeks of scattershot development (with permanent damage to our livers),
the field of potential candidates was winnowed down to 12 or so.
I then planned a 'party' with a few dozen friend-tasters to determine
the final entrant into the contest.
As I had no experience with market research or experimental design, I was
nervous about making rookie mistakes.
I was careful, or so I thought, about the
experimental design--assigning raters to cocktails in a balanced design,
assigning random one-time codes to the cocktails, adding control cocktails,
double blinding the tastings, and so on. The part that I was completely
fanatical about was that tasters should not assign numerical
ratings to the cocktails. I reasoned that the intra-rater and inter-rater
reliability would be far too poor. Instead, each rater would be presented with
two cocktails, and state their preference. While in some situations,
with experienced raters using the same rating scale, this might result in a
loss of power, in my situation, with a gaggle of half-drunk friends,
it solved the problem of inconsistent application of numerical ratings.
The remaining issue was how to interpret the data to select a winner.
Tell me what you really think.
You can consider my cocktail party as a series of 'elections', each with two
or more 'candidates' (in my case exactly two every time), and a single winner
for each election. For each candidate in each election, you have some
'covariates', or 'features' as the ML people would call …
Thu 17 December 2015
by Steven E. Pav
We have been invited to a champagne tasting party and competition.
The rules of the contest are as follows: partygoers bring a bottle
of champagne to share. They taste, then rate the different
champagnes on offer, with ratings on a scale of 1 through 10.
The average rating is computed for each bottle, then
divided by the price (plus some offset) to arrive at an
adjusted quality score. The champagne with the highest score
nets a prize, and considerable bragging rights, for its owner.
Presumably the offset is introduced to prevent small denominators
from dominating the rating, and is advertised to have a value of
around $25. The 'price' is, one infers, for a standard 750 ml bottle.
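The scoring rule above is easy to sketch in R. This is only an illustration of the arithmetic as described: the function name `adjusted_score`, the ratings, and the prices are all made up, and the offset of 25 is the advertised approximate value.

```r
# Adjusted quality score as described in the contest rules:
# mean rating divided by (price + offset).
# The function name and all example numbers are hypothetical.
adjusted_score <- function(ratings, price, offset = 25) {
  mean(ratings) / (price + offset)
}

# Two made-up bottles: a cheap one with middling ratings,
# and an expensive one with excellent ratings.
cheap  <- adjusted_score(ratings = c(6, 7, 5, 6), price = 15)
pricey <- adjusted_score(ratings = c(9, 8, 9, 10), price = 65)

print(c(cheap = cheap, pricey = pricey))
```

In this made-up case the cheap bottle scores 6 / 40 = 0.15 and the expensive one 9 / 90 = 0.10, so under this rule a modest bottle can beat a much better-rated one if it is cheap enough.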
I decided to do my homework for a change, rather than SWAG it.
I have been doing a lot of web scraping lately, so it was pretty
simple to gather some data on champagnes
from wine dot com. This file includes the advertised and sale prices,
as well as advertised ratings from Wine Spectator (WS), Wine Enthusiast
(WE), and so on. Some of the bottles are odd sizes, so I compute the
cost per liter as well. (By the way, many people would consider the data
collection the hard part of the problem.
rvest made it pretty easy, though.)
Here's a taste:
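library(dplyr)  # provides %>% and arrange()
library(knitr)  # provides kable()
champ <- read.csv('../data/champagne.csv')
champ %>% arrange(price_per_liter) %>% head(10) %>% kable(format='markdown')  # ten cheapest per liter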
|Pol Clement Rose Sec
|Freixenet Carta Nevada Brut
|Wolf Blass Yellow Label Brut
A supposedly simple exercise I hope to never repeat
Wed 16 December 2015
by Steven E. Pav
Years ago, when I was writing the code to support my thesis, our research
group was using the functional programming language SML-NJ. As the saying
goes, it's pretty indie, you might not have heard of it.
You can view SML-NJ as an early ancestor of Haskell, but without the rich ecosystem of
Monadic tutorials and proselytizers. Our CS colleagues were very enthusiastic
about the language, and rightly so: compared to, say, Java, functional
languages offered (and continue to offer) a tantalizing reward of
automagic parallelization. As a non-negligible bonus, SML-NJ was (and probably
still is) a completely green field. There were no publicly available libraries
as far as I knew, meaning the CS guys could start at year zero and code
with purity. They began with monoids and worked their way up to vector
spaces, matrices, and so on. These libraries were very elegant. Because
of the close binding of math and code, they were 'obviously' correct by
My meshing code needed the user (just me, really) to enter line segments
and facets which should be respected by the mesh. It became apparent
that asking the user (again, just me) to enter the equations defining these
was too onerous. The code should just compute the equations when given
the locations of points known to be in these features. The best way to do this,
I reasoned, was via a singular value decomposition: find the dimensions
which explain the most variation in the coordinates of the points.
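The idea can be sketched in a few lines of R, using the built-in `svd` rather than anything resembling the original SML/NJ code. The helper name `fit_feature` and the sample points are hypothetical. The points are centered on their centroid, and the right singular vectors then give the directions of variation, strongest first.

```r
# Fit a line or plane to points via the SVD of the centered coordinates.
# The helper name and the sample points are hypothetical.
fit_feature <- function(pts) {
  centroid <- colMeans(pts)
  centered <- sweep(pts, 2, centroid)  # subtract the centroid from each row
  dec <- svd(centered)
  list(centroid = centroid,
       directions = dec$v,             # columns ordered by decreasing singular value
       singular_values = dec$d)
}

# Five points lying exactly on the line through (1, 2, 3) with direction (1, 1, 0).
s <- c(-2, -1, 0, 1, 2)
pts <- cbind(1 + s, 2 + s, 3 + 0 * s)
fit <- fit_feature(pts)

line_dir <- fit$directions[, 1]  # dominant direction: the line's direction, up to sign
print(fit$centroid)
print(line_dir)
print(fit$singular_values)
```

For a line segment, the first column of `directions` is the direction of the line. For a facet, the last column (the direction with the least variation) would serve as the plane's normal. Singular values near zero indicate how well the points actually lie on the feature.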
Without any extant packages in SML/NJ, I set out to write the SVD code myself.
I spent two days holed up in my office with a copy of
Golub & Van Loan's book.
This is 'the' book for guiding you through this process, or so I reasoned,
and I was a mildly competent …
Proof of Useful Work
Sat 12 December 2015
by Steven E. Pav
I recently caught the flu double header. As appropriate for someone
in my condition, I spent a good many hours riding around on city buses,
mumbling to myself and reading about bitcoin on my phone.
If you are looking for a decent semi-technical introduction, Michael Nielsen's
The part of bitcoin that strikes me as bizarre is what the proof-of-work
exercise entails. Essentially, to sustain an agreed-upon but decentralized
public record of transactions, participants are madly trying to solve a
useless reverse-hashing puzzle.
Basically, "guess some bits such that when you append them to
this fixed long string of bits, the hash starts with at least 5 (or whatever)
zeroes." By making the puzzle hard to solve and easy to verify, and rewarding
those who solve it, the system has accountability and resilience, and is
robust against takeover.
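A toy version of the puzzle can be sketched in R. This is a simplification under stated assumptions, not bitcoin's actual scheme: it assumes the `digest` package is installed, counts leading hexadecimal zeros of a single SHA-256 hash rather than bits, and the helper names `find_nonce` and `verify` are made up.

```r
library(digest)  # assumed installed; provides digest()

# Toy proof of work: search for a nonce such that the SHA-256 hash of
# the payload followed by the nonce starts with n_zeros hex zeros.
# Helper names and the payload are hypothetical.
find_nonce <- function(payload, n_zeros = 3, max_tries = 1e6) {
  target <- strrep("0", n_zeros)
  for (nonce in seq_len(max_tries)) {
    h <- digest(sprintf("%s%d", payload, nonce), algo = "sha256", serialize = FALSE)
    if (startsWith(h, target)) return(list(nonce = nonce, hash = h))
  }
  NULL  # no solution found within max_tries
}

# Verification is a single hash: cheap compared to the search.
verify <- function(payload, nonce, n_zeros = 3) {
  h <- digest(sprintf("%s%d", payload, nonce), algo = "sha256", serialize = FALSE)
  startsWith(h, strrep("0", n_zeros))
}

sol <- find_nonce("this fixed long string of bits", n_zeros = 3)
print(sol)
print(verify("this fixed long string of bits", sol$nonce, n_zeros = 3))
```

Each extra required hex zero multiplies the expected search effort by 16, while verification stays at one hash.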
However, it is hard not to see the hashing puzzle as a satire of
contemporary work culture: participants are paid to use their computers to
solve numeric puzzles which are of no interest to anyone. (Never mind
the potential environmental impact if cryptocurrencies see greater adoption.)
You know who else liked Ansatz?
The hashing puzzle did remind me of something, though, in my feverish state.
In the first weeks of differential equations classes, it is customary
to pose a differential equation, then present the solution, deus ex machina,
and confirm it is the solution. Hard to solve, easy to verify.
Partial differential equations have the same nature. For example,
the solution to the heat equation involves
some drudgery, but confirmation of the solution is pretty simple. In fact,
computers can even verify solutions of differential equations because
symbolic differentiation is relatively simple.
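As a minimal sketch of that verification step, base R's `D` can differentiate a candidate solution symbolically. The candidate u(x, t) = exp(-k t) sin(x) for the one-dimensional heat equation u_t = k u_xx is chosen here purely for illustration, as are the grid and the value of `k`. Since `D` does not simplify its output, the residual is checked numerically on a grid rather than symbolically.

```r
# Candidate solution of the 1-D heat equation u_t = k * u_xx.
# The candidate, the grid, and the value of k are illustrative choices.
u <- quote(exp(-k * t) * sin(x))

u_t  <- D(u, "t")          # symbolic first derivative in t
u_xx <- D(D(u, "x"), "x")  # symbolic second derivative in x

# Evaluate the residual u_t - k * u_xx on a grid of (x, t) values.
grid <- expand.grid(x = seq(0, pi, length.out = 11),
                    t = seq(0, 2, length.out = 5))
vals <- list(x = grid$x, t = grid$t, k = 0.7)
residual <- eval(u_t, vals) - vals$k * eval(u_xx, vals)

print(u_t)
print(u_xx)
print(max(abs(residual)))
```

A residual at the level of floating-point rounding confirms the candidate on the grid. Finding the candidate in the first place is the hard part.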
So what if we could make an altcoin where the proof of work involved the
solution to a real-world …