Is it Blockbuster Season?

Tue 28 June 2016 by Steven E. Pav

I recently released a docker-compose-based 'solution' for creating an IMDb mirror. This was one by-product of my ill-fated foray into Hollywood. The ETL process removes TV shows, straight-to-video releases, porn, and most hobby projects from the larger IMDb FTP dump; stuffs the data into a database; then converts some of the text-based data into numeric data. For sanity checking, and to illustrate basic usage, I look here at seasonality of gross box office receipts.

Overfit Like a Pro

Tue 24 May 2016 by Steven E. Pav

Earlier this year, I participated in the Winton Stock Market Challenge on Kaggle. I wanted to explore the freely available tools in R for performing what I had routinely done in Matlab in my previous career; I was curious how a large investment management firm (and Kagglers) approached this problem; and I wanted to be an eyewitness to a potential overfitting disaster, should one occur.

R in Finance 2016

Fri 20 May 2016 by Steven

Review of R in Finance 2016 conference


Getting Hired as a Data Scientist

Thu 19 May 2016 by Steven

A few months back I wrote about my experiences trying to hire a data scientist. It took some amount of work on our part. When we finally found the right candidate, our parent company told us that there wasn't actually any money to pay a candidate. This came as rather a surprise to all of us at our three-person startup. This was the first indication that the wheels were coming off the bus, and two months later, we were all laid off and the company dissolved. Within just three months I went from hiring to scrambling for a job. Would I follow my own advice for job candidates? What's the startup climate like? Is it easy to find a job in the field?