The nerdosphere is in a minor tizzy over a putative bias in IMDb ratings for the new (2016) Ghostbusters film. This seems a bit odd to me, since IMDb ratings have always been horribly 'biased'. If the question you are trying to answer is "If I am forced to watch this randomly selected movie, will I like it?", then IMDb ratings, like most aggregated movie ratings, are difficult to interpret and very likely 'biased'. The typical path by which a rating ends up on IMDb is that a person somehow becomes aware of the film (the major problem for studios since the end of the studio-theatre model seventy years ago), enough to go and see it. They are then more likely to rate the movie if they liked it, liked it more than expected, or really hated it. Those with low to middling opinions of the film are less likely to rate it, so you have a missing data problem without the simplifying assumption that the data are "missing at random."
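To see why this kind of self-selection matters, here is a toy simulation in R. Every number in it is an assumption chosen only to mimic the mechanism described above, not an estimate from IMDb: opinions are uniform on 1 to 10, and the chance of posting a rating is 30% for opinions of 8 or more, 15% for opinions of 2 or less, and 3% otherwise.

```r
# Toy simulation of self-selected ratings; every number here is an assumption,
# not an estimate from IMDb data.
set.seed(1)
n_viewers <- 1e5
# Each viewer's private opinion of the film, uniform on 1..10 for simplicity.
opinion <- sample(1:10, n_viewers, replace = TRUE)
# Assumed chance of posting a rating: high for fans, moderate for haters,
# low for the indifferent middle.
p_rate <- ifelse(opinion >= 8, 0.30, ifelse(opinion <= 2, 0.15, 0.03))
rated <- runif(n_viewers) < p_rate
observed <- opinion[rated]
# Compare what everyone thinks with what the raters report.
print(c(all_viewers = mean(opinion), raters_only = mean(observed)))
# Share of middling opinions (3 to 7) among all viewers and among raters.
print(c(all_viewers = mean(opinion %in% 3:7),
        raters_only = mean(observed %in% 3:7)))
```

Under these made-up probabilities the raters' mean is, in expectation, 9.3 / 1.35 (about 6.9) against a population mean of 5.5, and middling opinions shrink from half of all viewers to about 11% of posted ratings. Nothing in the observed ratings alone reveals the gap.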

The Ghostbusters argy bargy (or one of them) is that ratings are suspected to be coming from people who have not seen the movie. This is possibly a problem for all ratings on IMDb, though less so for ratings on streaming services, which know whether you have watched a film. (The other argy bargy is that sexist and racist jerks have been harassing stars of the new film.) The analysis on FiveThirtyEight is informative, but it uses information (e.g. the age and sex of the raters) that is not widely available and is volunteered by the raters themselves. With the IMDb mirror at my disposal, I can instead look for systematic biases in film ratings based on the sex of the cast, and will do so here.

I recently looked at IMDb ratings by actor age to see whether the 'De Niro effect' was idiosyncratic, or whether raters systematically disliked films with older actors. The analysis there was a bit wonky: I attempted to fit a fixed effect for every actor, but ratings are given for films, not for a particular actor's part in a film, and since multiple actors appear in a given film, I randomly sampled a single actor per film. Here I take a different approach, computing the (weighted) average age and sex of the actors in a film. The weighting is based on the nr_order of actors within a film, the ordering given in IMDb which tells you roughly which actor or actress has top billing. I exponentially down-weight by this order, so that the first-listed actor or actress is twice as important as the one in the fourth slot, who in turn has twice the weight of number 7, and so on.
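A minimal base-R sketch of this weighting, assuming weights of 2^(-nr_order/3) as in the pipeline that follows. The variable order_downweighting is a hypothetical stand-in for ORDER_DOWNWEIGHTING, and the five-person cast is made up for illustration, not taken from IMDb. weighted.mean() computes the same ratio as the sum_age / sum_wgt and sum_ism / sum_wgt columns in the dplyr code.

```r
# Hypothetical stand-in for ORDER_DOWNWEIGHTING in the pipeline below.
order_downweighting <- 3
nr_order <- 1:10
weight <- 2^(-nr_order / order_downweighting)
# Each step of three billing slots halves the weight: both ratios are 2.
print(c(slot1_vs_4 = weight[1] / weight[4], slot4_vs_7 = weight[4] / weight[7]))

# A made-up five-person cast for a single film (not IMDb data).
cast <- data.frame(
  nr_order  = 1:5,
  actor_age = c(45, 38, 60, 25, 30),
  ismale    = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)
cast$weight <- 2^(-cast$nr_order / order_downweighting)
# Weighted means; equivalent to sum(weight * x) / sum(weight).
mean_age <- weighted.mean(cast$actor_age, cast$weight)
mean_ism <- weighted.mean(as.numeric(cast$ismale), cast$weight)
print(c(mean_age = mean_age, mean_ism = mean_ism,
        unweighted_ism = mean(cast$ismale)))
```

Because the top-billed actor in this toy cast is male, the weighted maleness (about 0.49) exceeds the unweighted share of 0.4.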

As before, I remove films tagged with the Documentary genre, keep those with a production year between 1965 and 2015, and require English to be listed as a language. (You will find that Indian and Turkish movies have many fans on IMDb, often with incomparable ratings.) You should be able to follow along with this analysis at home if you have the mirror. If you want to skip the data gathering part, you can get my cut of the data.

library(RMySQL)
library(dplyr)
library(knitr)
# get the connection and set to UTF-8 (probably not necessary here)
dbcon <- src_mysql(host='0.0.0.0',user='moe',password='movies4me',dbname='IMDB',port=23306)
capt <- dbGetQuery(dbcon$con,'SET NAMES utf8')
# genre information
movie_genres <- tbl(dbcon,'movie_info') %>%
    inner_join(tbl(dbcon,'info_type') %>% 
        filter(info %regexp% 'genres') %>%
        select(info_type_id),
        by='info_type_id') 
# get documentary movies;
doccos <- movie_genres %>% 
        filter(info %regexp% 'Documentary') %>%
        select(movie_id)
# language information
movie_languages <- tbl(dbcon,'movie_info') %>%
    inner_join(tbl(dbcon,'info_type') %>% 
        filter(info %regexp% 'languages') %>%
        select(info_type_id),
        by='info_type_id') 
# get movies with English
unnerstandit <- movie_languages %>% 
        filter(info %regexp% 'English') %>%
        select(movie_id)
# movies which are not documentaries, have some English, filtered by production year
movies <- tbl(dbcon,'title') %>%
    select(-imdb_index,-ttid,-md5sum) %>%
    anti_join(doccos %>% distinct(movie_id),by='movie_id') %>%
    inner_join(unnerstandit %>% distinct(movie_id),by='movie_id') %>%
    filter(production_year >= 1965,production_year <= 2015)
# votes for all movies, filtered by having enough votes
vote_info <- tbl(dbcon,'movie_votes') %>% 
    select(movie_id,votes,vote_mean,vote_sd,vote_se) %>%
    filter(votes >= 25)
# join the two together
# nb. dplyr is having problems with collect, so collect early...
mvotes <- inner_join(movies,vote_info,by='movie_id') %>%
    collect(n=Inf)

# change this to change downweighting.
# 3 = person #1 is twice as important as person #4
# 10 = person #1 is twice as important as person #11
ORDER_DOWNWEIGHTING <- 3
# acts in relation
# inner join with subselected movies
# nb. dplyr is having problems with collect, so collect early...
acts_in <- tbl(dbcon,'cast_info') %>%
    inner_join(tbl(dbcon,'role_type') %>% 
        filter(role %regexp% 'actor|actress'),
        by='role_id') %>%
    select(person_id,movie_id,nr_order) %>%
    filter(!is.na(nr_order)) %>%
    mutate(weight=2^(-nr_order/ORDER_DOWNWEIGHTING)) %>%
    inner_join(movies %>% select(movie_id),by='movie_id') %>%
    collect(n=Inf)
# get actors and actresses with a known birth date and a recorded sex
good_actors <- tbl(dbcon,'name') %>%
    select(person_id,name,gender,dob) %>%
    filter(!is.na(dob)) %>%
    # keep only people recorded as male or female (was a no-op mutate)
    filter(gender %in% c('m','f')) %>%
    mutate(yob=year(dob)) %>%
    mutate(ismale=(gender=='m')) %>%
    filter(yob >= 1875) %>%
    collect(n=Inf)
# join the good actors with acts-in 
# with mvotes.
bigdata <- good_actors %>%
    inner_join(acts_in %>% inner_join(mvotes,by='movie_id'),by='person_id') %>%
    mutate(actor_age=production_year - yob) %>%
    filter(actor_age >= 5,actor_age <= 100)
# get the mean age and sex
mean_stuff <- bigdata %>%
    group_by(movie_id) %>%
    summarize(sum_wgt = sum(weight),
        sum_age = sum(weight*actor_age),
        sum_ism = sum(weight*ismale)) %>%
    ungroup() %>%
    mutate(mean_age = sum_age / sum_wgt,
        mean_ism = sum_ism / sum_wgt)
# join together with votes
joined <- mean_stuff %>% 
    inner_join(mvotes,by='movie_id')
# write it so you all can have it.
#library(readr)
#readr::write_csv(joined,path='../data/movie_rate_by_sex.csv')

We now have a weighted average age of the actors in each film, and a weighted average sex, where 0 means an "all female cast" and 1 an "all male cast". Here are the ten top-rated films with more than 50,000 votes, along with the mean age and sex of the cast. (In mean_ism, "ism" stands for "is male".)

joined %>%
    filter(votes > 5e4) %>%
    arrange(desc(vote_mean)) %>%
    select(movie_id,title,mean_ism,mean_age,production_year,votes,vote_mean,vote_sd) %>%
    head(10) %>%
    kable()
| movie_id | title | mean_ism | mean_age | production_year | votes | vote_mean | vote_sd |
|---:|:---|---:|---:|---:|---:|---:|---:|
| 756550 | The Godfather | 0.952 | 42.4 | 1972 | 1131245 | 7.96 | 2.70 |
| 799705 | The Shawshank Redemption | 1.000 | 44.4 | 1994 | 1652593 | 7.96 | 2.70 |
| 5868 | 3 Idiots | 0.850 | 38.7 | 2009 | 199139 | 7.79 | 2.74 |
| 706413 | Swades: We, the People | 0.624 | 36.9 | 2004 | 56090 | 7.79 | 2.74 |
| 741988 | The Dark Knight | 0.869 | 42.1 | 2008 | 1638089 | 7.79 | 2.74 |
| 756561 | The Godfather: Part II | 0.791 | 38.3 | 1974 | 772323 | 7.79 | 2.74 |
| 773650 | The Lord of the Rings: The Return of the King | 0.665 | 30.2 | 2003 | 1189275 | 7.79 | 2.74 |
| 266250 | Forrest Gump | 0.746 | 42.1 | 1994 | 1218328 | 7.73 | 2.58 |
| 420754 | La vita bella | 0.675 | 42.5 | 1997 | 408755 | 7.73 | 2.58 |
| 777754 | The Matrix | 0.739 | 38.0 | 1999 | 1190603 | 7.73 | 2.58 |

First, I plot the IMDb rating of a film against average cast maleness, colored by cast age class, and then against average cast age, colored by cast sex class. Then comes a boxplot of ratings grouped by age and sex classes.
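The class boundaries matter at the edges. A short sketch, reusing the breaks, labels and right settings from the plot_dat code with made-up input values, shows where a boundary value lands: the sex classes are right-closed, so a maleness of exactly -33 counts as mostly female, while the age classes are left-closed, so an average age of exactly 40 counts as middle age. One side effect of right = FALSE is that a weighted average age of exactly 100 falls outside every age class and becomes NA.

```r
# Same breaks, labels and 'right' settings as the plot_dat code; inputs are made up.
# cast_maleness = 200 * (mean_ism - 0.5) maps mean_ism 0, 0.5, 1 to -100, 0, 100.
cast_maleness <- c(-100, -33, 0, 33, 100)
maleish <- cut(cast_maleness, breaks = c(-101, -33, 33, 100),
               labels = c("mostly female", "balanced", "mostly male"),
               right = TRUE)

mean_age <- c(19.9, 20, 39.9, 40, 65, 100)
ageish <- cut(mean_age, breaks = c(0, 20, 30, 40, 65, 100),
              labels = c("teenage", "twentysomething", "thirtysomething",
                         "middle age", "senior citizen"),
              right = FALSE)

print(data.frame(cast_maleness, maleish))  # right = TRUE: intervals (a, b]
print(data.frame(mean_age, ageish))        # right = FALSE: intervals [a, b)
```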

require(ggplot2)
plot_dat <- joined %>% 
    filter(votes >= 100) %>%
    mutate(cast_maleness = 200 * (mean_ism - 0.5)) %>%
    mutate(ageish = cut(mean_age, breaks=c(0,20,30,40,65,100), 
        labels=c("teenage","twentysomething","thirtysomething","middle age","senior citizen"),right=FALSE)) %>%
    mutate(maleish = cut(cast_maleness, breaks=c(-101,-33,33,100),
        labels=c("mostly female","balanced","mostly male"),right=TRUE))

ph <- ggplot(plot_dat,aes(x=cast_maleness,y=vote_mean,color=ageish)) + 
    geom_jitter() + 
    geom_smooth() +
    labs(x="cast maleness: -100=all female; 100=all male;",
        y="IMDb rating")
ph

[Figure: IMDb rating versus cast maleness (-100 = all female, 100 = all male), colored by cast age class]

ph <- ggplot(plot_dat,aes(x=mean_age,y=vote_mean,colour=maleish)) + 
    geom_jitter() + 
    geom_smooth() +
    labs(x="cast average age",
        y="IMDb rating")
ph

[Figure: IMDb rating versus cast average age, colored by cast sex class]

ph <- ggplot(plot_dat,aes(x=ageish,y=vote_mean)) +
    geom_boxplot(aes(fill=maleish),varwidth=FALSE) +
    geom_jitter(alpha=0.05,aes(color=maleish)) + 
    labs(x="cast average age",
        y="IMDb rating")
ph

[Figure: boxplots of IMDb rating by cast average age class, filled by cast sex class]

I do not see a huge effect here. The slight apparent increase in ratings for films with very young or very old casts is likely due to the small number of films in those classes. A regression might tell us something about the effect sizes:

mod0 <- lm(vote_mean ~ maleish * ageish,plot_dat)
print(summary(mod0))
## 
## Call:
## lm(formula = vote_mean ~ maleish * ageish, data = plot_dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.119 -0.439  0.152  0.650  3.281 
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                6.2248     0.1186   52.49  < 2e-16
## maleishbalanced                            0.0814     0.2219    0.37    0.714
## maleishmostly male                         0.0706     0.1504    0.47    0.639
## ageishtwentysomething                     -0.5861     0.1240   -4.73  2.3e-06
## ageishthirtysomething                     -0.5247     0.1224   -4.29  1.8e-05
## ageishmiddle age                          -0.1957     0.1253   -1.56    0.118
## ageishsenior citizen                       0.0013     0.2161    0.01    0.995
## maleishbalanced:ageishtwentysomething      0.2088     0.2265    0.92    0.357
## maleishmostly male:ageishtwentysomething   0.2893     0.1567    1.85    0.065
## maleishbalanced:ageishthirtysomething      0.1075     0.2244    0.48    0.632
## maleishmostly male:ageishthirtysomething   0.1272     0.1539    0.83    0.409
## maleishbalanced:ageishmiddle age          -0.1829     0.2260   -0.81    0.419
## maleishmostly male:ageishmiddle age       -0.2402     0.1561   -1.54    0.124
## maleishbalanced:ageishsenior citizen       0.1511     0.3402    0.44    0.657
## maleishmostly male:ageishsenior citizen   -0.1638     0.2487   -0.66    0.510
##                                             
## (Intercept)                              ***
## maleishbalanced                             
## maleishmostly male                          
## ageishtwentysomething                    ***
## ageishthirtysomething                    ***
## ageishmiddle age                            
## ageishsenior citizen                        
## maleishbalanced:ageishtwentysomething       
## maleishmostly male:ageishtwentysomething .  
## maleishbalanced:ageishthirtysomething       
## maleishmostly male:ageishthirtysomething    
## maleishbalanced:ageishmiddle age            
## maleishmostly male:ageishmiddle age         
## maleishbalanced:ageishsenior citizen        
## maleishmostly male:ageishsenior citizen     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.956 on 27802 degrees of freedom
## Multiple R-squared:  0.00674,    Adjusted R-squared:  0.00624 
## F-statistic: 13.5 on 14 and 27802 DF,  p-value: <2e-16

Note that the Intercept term here refers to the baseline class levels: a mostly female, teenage cast. (And yes, I have removed porn films from the mirror.) Relative to that baseline, we see a 'significant' decrease in rating for twentysomething and thirtysomething casts, but no significant effects otherwise. The effect sizes for age are on the order of half a rating point, somewhat larger than the average effects seen previously, but not terribly larger. The main effects for sex are small, less than a tenth of a rating point, and none of the sex-by-age interactions is significant at the 0.05 level. Altogether the model explains less than 1% of the variance in ratings (R-squared of about 0.007).
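To make the coding concrete: under R's default treatment contrasts, the fitted mean for a cell is the Intercept plus the main effects for its levels plus their interaction. A short worked example using the estimates printed above (rounded to four decimals), comparing mostly female and mostly male twentysomething casts:

```r
# Coefficient estimates copied from the summary(mod0) output above.
b_intercept       <- 6.2248   # mostly female, teenage cast (baseline cell)
b_mostly_male     <- 0.0706   # maleishmostly male
b_twentysomething <- -0.5861  # ageishtwentysomething
b_male_x_twenties <- 0.2893   # maleishmostly male:ageishtwentysomething

# Fitted cell means under treatment contrasts.
fem_20s  <- b_intercept + b_twentysomething
male_20s <- b_intercept + b_mostly_male + b_twentysomething + b_male_x_twenties
print(c(female_20s = fem_20s, male_20s = male_20s, gap = male_20s - fem_20s))
```

This gives about 5.64 for mostly female and 6.00 for mostly male twentysomething casts, a gap of roughly 0.36 points, most of which comes from the interaction term with p = 0.065. With the fitted model in hand, predict(mod0, newdata = data.frame(maleish = "mostly male", ageish = "twentysomething")) should return the same cell mean up to rounding.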

In all, we do not see here a significant 'sexist bias' in film ratings, where films with mostly female casts are consistently rated lower. This does not rule out that some films are subject to sexist campaigns, nor does it mean that rater sex is independent of rating. It merely suggests that among the many biases in IMDb ratings, rampant sexism is not a leading cause of error.