Gilgamath

Best Picture?

Sun 22 January 2017 by Steven E. Pav

For a brief time I found myself working in the field of film analytics. One of our mad-scientist-type projects at the time was trying to predict which films might win an award. As a training exercise, we decided to analyze the Oscars.

With such a great beginning, you might be surprised to find the story does not end well. Collecting the data for such an analysis was a minor endeavor. At the time we had scraped and cobbled together a number of different databases about films, but connecting them to each other was a huge frustration. Around the time we would have been predicting the Oscars, the floor fell out from under our funding, and we were unemployed three weeks after the 2015 Oscar winners were announced.

Our loss is your gain, as I am now releasing the first cut of the data frame I was using. The data are available in a CSV file here. The columns are as follows:

year is the year of the Oscars.
category should always be Best Picture here.
film is the title.
etc is extra information to identify the film.
winner is a Boolean for whether the film won in that category.
id and movie_id are internal IDs, and have no use for you.
ttid is the best guess for the IMDb 'tt ID'.
title and production_year are from the IMDb data.
votes is the total number of votes for the film in IMDb. (This is an old cut of the data.)
vote_mean, vote_sd are the mean and standard deviation of user votes for the film in IMDb.
vote1 and vote10 are the proportions of 1- and 10-star votes for the film in IMDb.
I do not remember what series is.
total_gross is one estimate of gross receipts, and bom is …