Lego Pricing.
It is time to get kiddo a new Lego set, as he's been on a bender this week, building everything he can get his hands on. I wanted to optimize play time per dollar spent, so I set out to look for Lego pricing data.
Not surprisingly, there are a number of good sources for this
data. The best I found was at brickset.
Sign up for an account, then go to their
query builder. I built a query
requesting all sets from 2011 onwards, then selected the CSV
option,
copied the data into my clipboard, then dumped it via
xclip -o > brickset_db.csv
.
The brickset data is updated over time, so there's no reason to
prefer my file to one you download yourself.
First I load the data in R, filter based on availability of Piece and Price data, then remove certain themes (Books, Duplo, and so on). I then subselect themes based on having a large range of prices and of number of pieces:
library(readr)
library(dplyr)
indat <- readr::read_csv('../data/brickset_db.csv') %>%
select(Number,Theme,Subtheme,Year,Name,Pieces,USPrice)
## Rows: 4843 Columns: 18
## -- Column specification ----------------------------------------------------------------------------------------------------------------------------------
## Delimiter: ","
## chr (7): Number, Theme, Subtheme, Name, ImageURL, Owned, Wanted
## dbl (10): SetID, Variant, Year, Minifigs, Pieces, UKPrice, USPrice, CAPrice,...
## lgl (1): Rating
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
subdat <- indat %>%
filter(!is.na(Pieces),Pieces >= 10,
!is.na(USPrice),USPrice > 1,
!grepl('^(Books|Mindstorms|Duplo|.+Minifigures|Power Func|Games|Education|Serious)',Theme))
subok <- subdat %>%
group_by(Theme) %>%
summarize(many_sets=(sum(!is.nan(USPrice)) >= 10),
piece_spread=((max(Pieces) / min(Pieces)) >= 5),
price_spread=((max(USPrice) / min(USPrice)) >= 4)) %>%
ungroup() %>%
filter(many_sets & piece_spread & price_spread) %>%
select(-many_sets,-piece_spread,-price_spread)
subdat <- subdat %>%
inner_join(subok,by='Theme …