It is time to get kiddo a new Lego set, as he's been on a bender this week, building everything he can get his hands on. I wanted to optimize play time per dollar spent, so I set out to look for Lego pricing data.
Not surprisingly, there are a number of good sources for this
data. The best I found was at brickset.
Sign up for an account, then go to their
query builder. I built a query
requesting all sets from 2011 onwards, then selected the CSV
option,
copied the data into my clipboard, then dumped it via
xclip -o > brickset_db.csv
.
The brickset data is updated over time, so there's no reason to
prefer my file to one you download yourself.
First I load the data in R, filter based on availability of Piece and Price data, then remove certain themes (Books, Duplo, and so on). I then subselect themes based on having a large range of prices and of number of pieces:
library(readr)
library(dplyr)
indat <- readr::read_csv('../data/brickset_db.csv') %>%
select(Number,Theme,Subtheme,Year,Name,Pieces,USPrice)
## Rows: 4843 Columns: 18
## -- Column specification ----------------------------------------------------------------------------------------------------------------------------------
## Delimiter: ","
## chr (7): Number, Theme, Subtheme, Name, ImageURL, Owned, Wanted
## dbl (10): SetID, Variant, Year, Minifigs, Pieces, UKPrice, USPrice, CAPrice,...
## lgl (1): Rating
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
subdat <- indat %>%
filter(!is.na(Pieces),Pieces >= 10,
!is.na(USPrice),USPrice > 1,
!grepl('^(Books|Mindstorms|Duplo|.+Minifigures|Power Func|Games|Education|Serious)',Theme))
subok <- subdat %>%
group_by(Theme) %>%
summarize(many_sets=(sum(!is.nan(USPrice)) >= 10),
piece_spread=((max(Pieces) / min(Pieces)) >= 5),
price_spread=((max(USPrice) / min(USPrice)) >= 4)) %>%
ungroup() %>%
filter(many_sets & piece_spread & price_spread) %>%
select(-many_sets,-piece_spread,-price_spread)
subdat <- subdat %>%
inner_join(subok,by='Theme')
set.seed(1234)
subdat %>% sample_n(10) %>%
kable(padding=10,digits=4,
caption='Random sampling of some Lego sets',
format='html')
Number | Theme | Subtheme | Year | Name | Pieces | USPrice |
---|---|---|---|---|---|---|
10722 | Juniors | Ninjago | 2016 | Snake Showdown | 92 | 14.99 |
41086 | Friends | Vet | 2015 | Vet Ambulance | 89 | 9.99 |
70819 | The LEGO Movie | NA | 2015 | Bad Cop Car Chase | 292 | 29.99 |
75073 | Star Wars | MicroFighters | 2015 | Vulture Droid | 77 | 9.99 |
70704 | Space | Galaxy Squad | 2013 | Vermin Vaporizer | 506 | 59.99 |
75052 | Star Wars | Episode IV | 2014 | Mos Eisley Cantina | 615 | 69.99 |
75020 | Star Wars | Episode VI | 2013 | Jabba's Sail Barge | 850 | 119.99 |
75058 | Star Wars | Episode I | 2014 | MTT | 954 | 89.99 |
9467 | Monster Fighters | NA | 2012 | The Ghost Train | 741 | 79.99 |
60046 | City | Police | 2014 | Helicopter Surveillance | 528 | 74.99 |
Now I plot the results for a few different themes, in log scale, price versus pieces. The power law fits should have slope around 1, while the intercept is like the natural log of price per piece. I expected this to be about a dime, corresponding to a value of -2.303. Instead, we see larger values, with a higher premium for Star Wars themes (I knew it), than the old Creator line.
library(ggplot2)
library(ggpmisc)
fmla <- y~x
ph <- subdat %>%
filter(Pieces > 50) %>%
mutate(Year=factor(Year)) %>%
mutate(Theme=forcats::fct_reorder(Theme,USPrice)) %>%
filter(grepl('Star Wars|Minecraft|Disney|Creator',as.character(Theme))) %>%
ggplot(aes(Pieces,USPrice,color=Theme)) + geom_point() +
scale_x_log10() + scale_y_log10() +
facet_wrap(~Theme) + geom_smooth(method='lm') +
ggpmisc::stat_poly_eq(formula=fmla, parse=TRUE, aes(label=..eq.label..)) +
labs(x='Number of pieces',y='Price ($ US)',title='LEGO sets 2015-2017\ndata courtesy of brickset.com')
ph
However, usually I think of a theme premium as an increase in the intercept in a linear model. Let's use linear regression to look at a model for this. I fit the cost per piece slope as a function of release year, then fit an intercept term for each Theme in my set, then order by the estimated intercept terms. We see that the price per piece is indeed around a dime, and has held steady over time. There is a fixed premium for the branded themes (Star Wars, the Hobbit, and so on), but, somewhat to my surprise, the City themed sets also seem to be more expensive. Note the apparent discount for Lego Movie tie-in sets. Luckily for me, the kiddo likes the Creator 3-in-1 sets.
mod_linear <- lm(USPrice ~ Pieces:Year + Theme - 1,data=subdat %>% mutate(Year=factor(Year)))
mod_linear %>%
summary() %>%
coefficients() %>%
as.data.frame() %>%
add_rownames() %>%
mutate(is_slope = !grepl('Pieces',rowname)) %>%
arrange(is_slope,Estimate) %>%
select(-is_slope) %>%
kable(row.names=FALSE,
padding=10,digits=3,
caption='Linear Regression on Lego Set prices',
format='html')
rowname | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|
Pieces:Year2017 | 0.090 | 0.002 | 50.925 | 0.000 |
Pieces:Year2016 | 0.094 | 0.001 | 82.774 | 0.000 |
Pieces:Year2012 | 0.098 | 0.002 | 52.276 | 0.000 |
Pieces:Year2014 | 0.098 | 0.002 | 63.621 | 0.000 |
Pieces:Year2015 | 0.099 | 0.001 | 69.229 | 0.000 |
Pieces:Year2013 | 0.103 | 0.002 | 63.469 | 0.000 |
Pieces:Year2011 | 0.106 | 0.002 | 55.747 | 0.000 |
ThemeAdvanced Models | -33.749 | 2.814 | -11.991 | 0.000 |
ThemeClassic | -18.404 | 3.531 | -5.212 | 0.000 |
ThemeBricks and More | -8.464 | 2.985 | -2.835 | 0.005 |
ThemeThe LEGO Movie | -3.892 | 2.858 | -1.362 | 0.173 |
ThemeCreator | -1.936 | 1.290 | -1.501 | 0.134 |
ThemeSpeed Champions | -1.867 | 3.385 | -0.551 | 0.581 |
ThemeElves | 0.024 | 2.774 | 0.009 | 0.993 |
ThemeMonster Fighters | 0.200 | 3.909 | 0.051 | 0.959 |
ThemeMiscellaneous | 0.608 | 2.693 | 0.226 | 0.821 |
ThemeArchitecture | 0.842 | 2.349 | 0.359 | 0.720 |
ThemeThe Lord of the Rings | 1.037 | 3.323 | 0.312 | 0.755 |
ThemeCastle | 1.112 | 3.666 | 0.303 | 0.762 |
ThemeUltra Agents | 1.911 | 3.406 | 0.561 | 0.575 |
ThemeFriends | 1.952 | 1.015 | 1.922 | 0.055 |
ThemeTeenage Mutant Ninja Turtles | 2.189 | 3.181 | 0.688 | 0.491 |
ThemeCars | 2.495 | 2.715 | 0.919 | 0.358 |
ThemeNexo Knights | 2.759 | 1.774 | 1.555 | 0.120 |
ThemeSpace | 3.198 | 2.851 | 1.122 | 0.262 |
ThemePirates of the Caribbean | 3.495 | 3.892 | 0.898 | 0.369 |
ThemeLegends of Chima | 3.686 | 1.298 | 2.840 | 0.005 |
ThemeNinjago | 3.706 | 1.084 | 3.420 | 0.001 |
ThemeDC Comics Super Heroes | 4.625 | 1.841 | 2.512 | 0.012 |
ThemeTechnic | 4.950 | 1.570 | 3.152 | 0.002 |
ThemeMarvel Super Heroes | 5.313 | 1.608 | 3.304 | 0.001 |
ThemeDisney | 5.802 | 2.403 | 2.415 | 0.016 |
ThemeMinecraft | 6.072 | 2.422 | 2.507 | 0.012 |
ThemeThe LEGO Batman Movie | 6.265 | 2.896 | 2.164 | 0.031 |
ThemeHERO Factory | 6.328 | 1.508 | 4.197 | 0.000 |
ThemeThe Hobbit | 7.125 | 2.997 | 2.377 | 0.018 |
ThemeStar Wars | 7.972 | 0.890 | 8.959 | 0.000 |
ThemeDC Super Hero Girls | 8.028 | 4.037 | 1.989 | 0.047 |
ThemeJuniors | 8.251 | 2.104 | 3.922 | 0.000 |
ThemeCity | 10.600 | 0.884 | 11.988 | 0.000 |
ThemeDimensions | 15.777 | 1.629 | 9.686 | 0.000 |