Is it Blockbuster Season?
Tue 28 June 2016
by Steven E. Pav
I recently released a
docker-compose-based 'solution' for creating
an IMDb mirror. This was one
by-product of my ill-fated foray into Hollywood. The ETL process:
removes TV shows, straight-to-video, porn, and most hobby projects from
the larger IMDb FTP dump; uses
imdb2sql.py to stuff the data into a database;
then converts some of the text-based data into numeric data.
For sanity checking, and to illustrate basic usage, I look here
at seasonality of gross box office receipts.
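A seasonality check of this kind amounts to grouping receipts by release month and comparing the groups. The following is a minimal sketch of that idea only, not the analysis from the post: the `films` rows, their values, and the `mean_gross_by_month` helper are all hypothetical, and a real check would read release dates and grosses from the mirrored database instead.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (release_date, gross) rows; the numbers are invented
# purely for illustration and do not come from IMDb.
films = [
    (date(2001, 6, 15), 120.0),
    (date(2002, 6, 20), 80.0),
    (date(2001, 1, 10), 30.0),
    (date(2003, 12, 19), 150.0),
]


def mean_gross_by_month(rows):
    """Return {month number: mean gross} over rows released in that month."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for released, gross in rows:
        totals[released.month] += gross
        counts[released.month] += 1
    # Months with no releases are simply absent from the result.
    return {month: totals[month] / counts[month] for month in sorted(totals)}


by_month = mean_gross_by_month(films)
for month, mean_gross in by_month.items():
    print(f"month {month:2d}: mean gross {mean_gross:.1f}")
```

Grouping on the calendar month alone pools all years together, which suits a rough sanity check but ignores inflation and long-run growth in receipts.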
Overfit Like a Pro
Tue 24 May 2016
by Steven E. Pav
Earlier this year, I participated in the
Winton Stock Market Challenge
on Kaggle. I wanted to explore the freely available
tools in R for performing what I had routinely done in Matlab
in my previous career; I was curious how a large
investment management firm (and Kagglers)
approached this problem; and I wanted to be an eyewitness to a potential
overfitting disaster, should one occur.
R in Finance 2016
Fri 20 May 2016
A review of the R in Finance 2016 conference.
Getting Hired as a Data Scientist
Thu 19 May 2016
A few months back I wrote about my experiences trying to
hire a data scientist. It took some amount of work on our part. When we finally
found the right candidate, our parent company told us that there wasn't actually any
money to pay a candidate. This came as rather a surprise to all of us at our three-person
startup. This was the first indication that the wheels were coming off the bus, and two
months later, we were all laid off and the company dissolved. Within just three months
I went from hiring to scrambling for a job. Would I follow my own advice for job
candidates? What's the startup climate like? Is it easy to find a job in the field?