Gilgamath

R in Finance 2017

Fri 19 May 2017 by Steven

Review of R in Finance 2017 conference

Another Thalesians Talk

Tue 14 March 2017 by Steven

Matt Dixon 40th Birthday Talk

Earlier this year, I participated in the Winton Stock Market Challenge on Kaggle. I wanted to explore the freely available tools in R for performing what I had routinely done in Matlab in my previous career, I was curious how a large investment management firm (and Kagglers) approached this problem, and I wanted to be eyewitness to a potential overfitting disaster, should one occur.

The setup should be familiar: for selected date, stock pairs you are given 25 state variables, the two previous days of returns, and the first 120 minutes of returns. You are to predict the remaining 60 minutes of returns of that day and the following two days of returns for the stock. The metric used to score your predictions is a weighted mean absolute error, where presumably higher volatility names are downweighted in the final error metric. The training data consist of 40K observations, while the test data consist of 120K rows, for which one had to produce 744K predictions. First prize was a cool $20K. In addition to the prizes, Winton was explicitly looking for resumes.

I suspected that this competition would provide valuable data in my study of human overfitting of trading strategies. Towards that end, let us gather the public and private leaderboards. Recall that the public leaderboard is what participants see of their submissions during the competition period, based on around one quarter of the test set data, while the private leaderboard is the score of predictions on the remaining part of the test data, and is published in a big reveal at the close of the competition. Let's gather the leaderboard data.
(Those of you who want to play along at home can download my cut of the data.)

library(dplyr)
library(rvest)

# a function to load and process a leaderboard …

Gilgamath

R in Finance 2017

Another Thalesians Talk

Overfit Like a Pro

R in Finance 2016