You Deserve Expensive Champagne ... If You Buy It.

Sat 26 December 2015 by Steven E. Pav

I received some taster ratings from the champagne party we attended last week. I joined the raw ratings with the bottle information to create a single aggregated dataset. This is a 'non-normal' form, since the bottle information is repeated on every rating row, but it is the simplest to distribute. Here is a taste:

library(dplyr)   # provides %>% and select
library(knitr)   # provides kable
champ <- read.csv('../data/champagne_ratings.csv',stringsAsFactors=FALSE)
champ %>% select(winery,purchase_price_per_liter,raternum,rating) %>%
    head(8) %>% kable(format='markdown')
|winery                                         | purchase_price_per_liter| raternum| rating|
|:----------------------------------------------|------------------------:|--------:|------:|
|Barons de Rothschild                           |                 80.00000|        1|     10|
|Onward Petillant Naturel 2014 Malavasia Bianca |                 33.33333|        1|      4|
|Chandon Rose Method Traditionnelle             |                 18.66667|        1|      8|
|Martini Prosecco from Italy                    |                 21.32000|        1|      8|
|Roederer Estate Brut                           |                 33.33333|        1|      8|
|Kirkland Asolo Prosecco Superiore              |                  9.32000|        1|      7|
|Champagne Tattinger Brute La Francaise         |                 46.66667|        1|      6|
|Schramsberg Reserver 2001                      |                132.00000|        1|      6|

Recall the rules of the contest: the average rating of each bottle is computed, then divided by 25 dollars more than the price (presumably for a 750 ml bottle). Depending on whether the average ratings were compressed around the high end of the zero to ten scale, or around the low end, one would wager on either the cheapest bottles or more moderately priced offerings. (Based on my previous analysis, I brought the Menage a Trois Prosecco, rated at 91 points, but available at Safeway for 10 dollars.) It is easy to compute the raw averages using dplyr:

avrat <- champ %>% 
    group_by(winery,bottle_num,purchase_price_per_liter) %>%
    summarize(avg_rating=mean(rating)) %>%
    ungroup() %>%
    arrange(desc(avg_rating))  # sort order inferred from the output below
avrat %>% head(8) %>% kable(format='markdown')
|winery                    | bottle_num| purchase_price_per_liter| avg_rating|
|:-------------------------|----------:|------------------------:|----------:|
|Desuderi Jeio             |          4|                 22.66667|   6.750000|
|Gloria Ferrer Sonoma Brut |         19|                 20.00000|   6.750000|
|Roederer Estate Brut      |         12|                 34.66667|   6.642857|
|Charles Collin Rose       |         34|                 33.33333|   6.636364|
|Roederer Estate Brut      |         13|                 33.33333|   6.500000|
|Gloria Ferrer Sonoma Brut |         11| 21 …

No Accounting for Taste

Sat 19 December 2015 by Steven E. Pav

Many years ago, before I had kids, I was afflicted with a mania for Italian bitters. A particularly ugly chapter of this time included participating (at least in my own mind) in a contest to design a cocktail containing [obnoxious brand] Amaro. I was determined to win this contest, and win it with science. After weeks of scattershot development (with permanent damage to our livers), the field of potential candidates was winnowed down to around 12. I then planned a 'party' with a few dozen friend-tasters to determine the final entrant into the contest.

As I had no experience with market research or experimental design, I was nervous about making rookie mistakes. I was careful, or so I thought, about the experimental design: assigning raters to cocktails in a balanced design, assigning random one-time codes to the cocktails, adding control cocktails, double-blinding the tastings, and so on. The part that I was completely fanatical about was that tasters should not assign numerical ratings to the cocktails. I reasoned that the intra-rater and inter-rater reliability were far too poor. Instead, each rater would be presented with two cocktails, and state their preference. While in some situations, with experienced raters using the same rating scale, this might result in a loss of power, in my situation, with a gaggle of half-drunk friends, it solved the problem of inconsistent application of numerical ratings. The remaining issue was how to interpret the data to select a winner.
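
One common way to turn such pairwise preferences into a ranking is a Bradley-Terry model fit by logistic regression. The sketch below is not necessarily the analysis used for the party; the cocktail names, the 'true' appeal values, and the number of tastings are all made up for illustration.

```r
# Hypothetical setup: 4 cocktails with made-up 'true' appeal,
# and 300 simulated two-cocktail tastings.
set.seed(1234)
cocktails <- c('A', 'B', 'C', 'D')
appeal <- c(A = 0, B = 0.5, C = 1.0, D = 1.5)
n_taste <- 300

# each tasting presents two distinct cocktails
pairs <- t(replicate(n_taste, sample(cocktails, 2)))
first <- pairs[, 1]
second <- pairs[, 2]

# the first cocktail is preferred with a logistic probability
# in the gap between the two appeals
p_first <- plogis(appeal[first] - appeal[second])
first_wins <- rbinom(n_taste, size = 1, prob = p_first)

# design matrix: +1 if the cocktail was presented first, -1 if second.
# cocktail A is dropped as the reference (its appeal is fixed at zero).
X <- sapply(cocktails[-1], function(ck) (first == ck) - (second == ck))

# Bradley-Terry as a no-intercept logistic regression
fit <- glm(first_wins ~ X - 1, family = binomial())
est <- c(A = 0, setNames(coef(fit), cocktails[-1]))

# the estimated winner is the cocktail with the largest fitted appeal
winner <- names(which.max(est))
print(round(est, 2))
print(winner)
```

Only differences in appeal are identified from preferences, which is why one cocktail has to be pinned at zero.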

Tell me what you really think.

You can consider my cocktail party as a series of 'elections', each with two or more 'candidates' (in my case exactly two every time), and a single winner for each election. For each candidate in each election, you have some 'covariates', or 'features' as the ML people would call …


Champagne Party

Thu 17 December 2015 by Steven E. Pav

We have been invited to a champagne tasting party and competition. The rules of the contest are as follows: partygoers bring a bottle of champagne to share. They taste, then rate the different champagnes on offer, with ratings on a scale of 1 through 10. The average rating is computed for each bottle, then divided by the price (plus some offset) to arrive at an adjusted quality score. The champagne with the highest score nets a prize, and considerable bragging rights, for its owner. Presumably the offset is introduced to prevent small denominators from dominating the rating, and is advertised to have a value of around $25. The 'price' is, one infers, for a standard 750 ml bottle.
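
The scoring rule can be sketched in a few lines of R. The three bottles below are invented for illustration, and the code assumes the offset is exactly 25 dollars and that the price is for a 750 ml bottle, as inferred above.

```r
# Hypothetical bottles: names, prices and ratings are made up.
bottles <- data.frame(
    winery = c('Bargain Brut', 'Midrange Cuvee', 'Prestige Reserve'),
    price_per_liter = c(12, 40, 120),
    avg_rating = c(5.5, 6.5, 7.5),
    stringsAsFactors = FALSE)

offset <- 25           # advertised offset in dollars (assumed exact)
bottle_liters <- 0.75  # assumes a standard 750 ml bottle

# price of one bottle, then the adjusted quality score
bottles$bottle_price <- bottles$price_per_liter * bottle_liters
bottles$adj_score <- bottles$avg_rating / (bottles$bottle_price + offset)

# highest adjusted score first
ranked <- bottles[order(-bottles$adj_score), ]
print(ranked)
```

With these made-up numbers the cheapest bottle wins, since ratings that differ by only a couple of points cannot make up for a denominator that more than triples.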

I decided to do my homework for a change, rather than SWAG it. I have been doing a lot of web scraping lately, so it was pretty simple to gather some data on champagnes from wine dot com. This file includes the advertised and sale prices, as well as advertised ratings from Wine Spectator (WS), Wine Enthusiast (WE), and so on. Some of the bottles are odd sizes, so I compute the cost per liter as well. (By the way, many people would consider the data collection the hard part of the problem. rvest made it pretty easy, though.) Here's a taste:

library(dplyr)   # provides %>% and arrange
library(knitr)   # provides kable
champ <- read.csv('../data/champagne.csv')
champ %>% arrange(price_per_liter) %>% head(10) %>% kable(format='markdown')
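|name                         | price| sale_price| WS| WE| WandS| WW| TP| JS| ST| liters| price_per_liter|
|:----------------------------|-----:|----------:|--:|--:|-----:|--:|--:|--:|--:|------:|---------------:|
|Pol Clement Rose Sec         |  8.99|         NA| NA| NA|    NA| NA| NA| NA| NA|   0.75|            12.0|
|Freixenet Carta Nevada Brut  |  8.99|         NA| NA| NA|    NA| NA| NA| NA| NA|   0.75|            12.0|
|Wolf Blass Yellow Label Brut |  8.99|         NA| NA| NA|    NA| NA| NA| …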

A supposedly simple exercise I hope to never repeat

Wed 16 December 2015 by Steven E. Pav

Years ago, when I was writing the code to support my thesis, our research group was using the functional programming language SML/NJ. As the saying goes, it's pretty indie, you might not have heard of it. You can view SML/NJ as an early ancestor of Haskell, but without the rich ecosystem of monad tutorials and proselytizers. Our CS colleagues were very enthusiastic about the language, and rightly so: compared to, say, Java, functional languages offered (and continue to offer) a tantalizing reward of automagic parallelization. As a non-negligible bonus, SML/NJ was (and probably still is) a completely green field. There were no publicly available libraries as far as I knew, meaning the CS guys could start at year zero and code with purity. They began with monoids and worked their way up to vector spaces, matrices, and so on. These libraries were very elegant. Because of the close binding of math and code, they were 'obviously' correct by inspection.

My meshing code needed the user (just me, really) to enter line segments and facets which should be respected by the mesh. It became apparent that asking the user (again, just me) to enter the equations defining these was too onerous. The code should just compute the equations when given the locations of points known to be in these features. The best way to do this, I reasoned, was via a singular value decomposition: find the dimensions which explain the most variation in the coordinates of the points.
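
For a sense of what that computation looks like, here is a minimal sketch in R (not the SML/NJ code from the thesis) using the built-in `svd`. The points are simulated along a made-up line in 3D with a little noise; the variable names and noise level are assumptions for illustration.

```r
# Hypothetical data: 20 noisy points along a known line in 3D.
set.seed(42)
direction <- c(1, 2, 2) / 3   # unit vector along the 'true' line
anchor <- c(0.5, -1, 2)       # a point on the line
t_par <- runif(20, min = -5, max = 5)
pts <- outer(t_par, direction) +
    matrix(anchor, nrow = 20, ncol = 3, byrow = TRUE) +
    matrix(rnorm(60, sd = 0.01), nrow = 20)

# center the coordinates, then decompose
centroid <- colMeans(pts)
centered <- sweep(pts, 2, centroid)
dec <- svd(centered)

# the first right singular vector is the direction explaining
# the most variation; the line is centroid + s * est_dir
est_dir <- dec$v[, 1]

# |cosine| between estimated and true directions (sign is arbitrary)
alignment <- abs(sum(est_dir * direction))
print(dec$d)
print(alignment)
```

For points on a line, one singular value dominates the others. For points on a planar facet, two would dominate, and the last right singular vector would serve as an estimate of the facet's normal.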

Without any extant packages in SML/NJ, I set out to write the SVD code myself. I spent two days holed up in my office with a copy of Golub & Van Loan's book. This is 'the' book for guiding you through this process, or so I reasoned, and I was a mildly competent …
