Gilgamath



Another Confidence Limit for the Markowitz Signal Noise ratio

Wed 28 March 2018 by Steven

Another confidence limit on the Signal Noise ratio of the Markowitz portfolio.

read more

Markowitz Portfolio Covariance, Elliptical Returns

Mon 12 March 2018 by Steven E. Pav

In a previous blog post, I looked at asymptotic confidence intervals for the Signal to Noise ratio of the (sample) Markowitz portfolio, finding them to be deficient. (Perhaps they are useful if one has hundreds of thousands of days of data, but are otherwise awful.) Those confidence intervals came from revision four of my paper on the Asymptotic distribution of the Markowitz Portfolio. In that same update, I also describe, albeit in an obfuscated form, the asymptotic distribution of the sample Markowitz portfolio for elliptical returns. Here I check that finding empirically.

Suppose you observe a \(p\) vector of returns drawn from an elliptical distribution with mean \(\mu\), covariance \(\Sigma\), and 'kurtosis factor' \(\kappa\). Under this assumed model, the kurtosis of each marginal is three times the kurtosis factor, and the factor takes the value \(1\) for a multivariate normal. This model of returns is slightly more realistic than the multivariate normal, but it does not allow for skewness of asset returns, which seems unrealistic.
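
As a quick empirical check of that claim, here is a minimal R sketch. It takes the multivariate \(t\) as a convenient elliptical example and uses the mvtnorm package; none of the names below come from the post itself.

# check that marginal kurtosis is 3 * kappa under an elliptical law;
# the multivariate t is used only as a convenient elliptical example.
library(mvtnorm)

set.seed(1234)
p   <- 4
dof <- 12                     # degrees of freedom of the t
kap <- (dof - 2) / (dof - 4)  # kurtosis factor of a multivariate t
mu  <- rep(0.001, p)
Sig <- diag(p)                # target population covariance

# rmvt takes the scale matrix; for a t, covariance = scale * dof / (dof - 2)
X <- rmvt(1e5, sigma = Sig * (dof - 2) / dof, df = dof, delta = mu, type = "shifted")

# empirical kurtosis of each marginal, compared against 3 * kap
emp_kurt <- apply(X, 2, function(x) mean((x - mean(x))^4) / mean((x - mean(x))^2)^2)
round(emp_kurt, 2)
3 * kap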

Nonetheless, let \(\hat{\nu}\) be the Markowitz portfolio built on a sample of \(n\) days of independent returns:

$$ \hat{\nu} = \hat{\Sigma}^{-1} \hat{\mu}, $$

where \(\hat{\mu}, \hat{\Sigma}\) are the regular 'vanilla' estimates of mean and covariance. The vector \(\hat{\nu}\) is, in a sense, over-corrected, and we need to cancel out a square root of \(\Sigma\) (the population value). So we will consider the distribution of \(Q \Sigma^{\top/2} \hat{\nu}\), where \(\Sigma^{\top/2}\) is the upper triangular Cholesky factor of \(\Sigma\), and where \(Q\) is an orthogonal matrix (\(Q Q^{\top} = I\)), and where \(Q\) rotates \(\Sigma^{-1/2}\mu\) onto \(e_1\), the first basis vector:

$$ Q \Sigma^{-1/2}\mu = \zeta e_1, $$

where \(\zeta\) is the Signal to Noise ratio of the population Markowitz portfolio: \(\zeta = \sqrt{\mu^{\top}\Sigma^{-1}\mu} = \left\Vert …
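
Under a simulation where the population \(\mu\) and \(\Sigma\) are known, these quantities are easy to compute directly. The following is a minimal R sketch, not code from the post: it simulates Gaussian returns for simplicity, and realizes \(Q\) as a Householder reflection sending \(\Sigma^{-1/2}\mu\) to \(\zeta e_1\); all names are illustrative.

# sketch: the sample Markowitz portfolio and the rotated vector
# Q %*% Sigma^{T/2} %*% nu_hat, with population mu and Sigma known.
set.seed(2018)
p   <- 5
n   <- 1000
mu  <- seq(0.0005, 0.0025, length.out = p)
Sig <- 0.0001 * (0.7 * diag(p) + 0.3)    # an arbitrary population covariance

# n days of simulated returns and the sample Markowitz portfolio
X      <- MASS::mvrnorm(n, mu = mu, Sigma = Sig)
nu_hat <- solve(cov(X), colMeans(X))     # hat{Sigma}^{-1} hat{mu}

# upper triangular Cholesky factor: Sig = t(U) %*% U, so U plays Sigma^{T/2}
U    <- chol(Sig)
z    <- solve(t(U), mu)                  # Sigma^{-1/2} mu
zeta <- sqrt(sum(z^2))                   # SNR of the population Markowitz portfolio

# a Householder reflection Q: orthogonal, and Q %*% z = zeta * e_1
e1 <- c(1, rep(0, p - 1))
v  <- z - zeta * e1
Q  <- diag(p) - 2 * outer(v, v) / sum(v^2)

rotated <- drop(Q %*% (U %*% nu_hat))    # the vector whose distribution is studied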

read more

A Lack of Confidence Interval

Thu 15 February 2018 by Steven E. Pav

For some years now I have been playing around with a certain problem in portfolio statistics: suppose you observe \(n\) independent observations of a \(p\) vector of returns, then form the Markowitz portfolio based on those returns. What then is the distribution of what I call the 'signal to noise ratio' of that Markowitz portfolio, defined as its true expected return divided by its true volatility? That is, if \(\nu\) is the Markowitz portfolio, built on a sample, its 'SNR' is \(\nu^{\top}\mu / \sqrt{\nu^{\top}\Sigma \nu}\), where \(\mu\) is the population mean vector, and \(\Sigma\) is the population covariance matrix.
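
To make that definition concrete, here is a minimal R sketch of the SNR of one sample's Markowitz portfolio, in a setting where \(\mu\) and \(\Sigma\) are known because the returns are simulated; all names are illustrative.

# sketch: the 'SNR' of a sample Markowitz portfolio; it is computable
# here only because the population mu and Sigma are known (a simulation).
set.seed(101)
p   <- 6
n   <- 252
mu  <- rep(0.001, p)
Sig <- diag(0.0001, p)

# one sample of n days of returns, and its Markowitz portfolio
X  <- matrix(rnorm(n * p, mean = 0.001, sd = 0.01), nrow = n, ncol = p)
nu <- solve(cov(X), colMeans(X))

# the signal to noise ratio of that (random) portfolio
snr <- sum(nu * mu) / sqrt(drop(t(nu) %*% Sig %*% nu))
snr    # varies from sample to sample, even though mu and Sigma are fixed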

This is an odd problem, somewhat unlike classical statistical inference, because the unknown quantity, the SNR, depends not only on the population parameters but also on the sample: it is both random and unknown. What you learn in your basic statistics class is inference on fixed unknowns. (Actually, I never really took a basic statistics class, but I think that's right.)

Paulsen and Sohl made some progress on this problem in their 2016 paper on what they call the Sharpe Ratio Information Criterion. They find a sample statistic which is unbiased for the portfolio SNR when returns are (multivariate) Gaussian. In my mad scribblings on the backs of envelopes and scrap paper, I have been trying to find the distribution of the SNR. I have been looking for this love, as they say, in all the wrong places, usually hoping for some clever transformation that will lead to a slick proof. (I was taught from a young age to look for slick proofs.)

Having failed that mission, I pivoted to looking for confidence intervals for the SNR (and maybe even prediction intervals on the out-of-sample Sharpe ratio of the in-sample Markowitz portfolio). I realized that some of the work I had done …

read more

Overfit Like a Pro

Tue 24 May 2016 by Steven E. Pav

Earlier this year, I participated in the Winton Stock Market Challenge on Kaggle. I wanted to explore the freely available tools in R for performing what I had routinely done in Matlab in my previous career; I was curious how a large investment management firm (and Kagglers) approached this problem; and I wanted to be an eyewitness to a potential overfitting disaster, should one occur.

The setup should be familiar: for selected (date, stock) pairs you are given 25 state variables, the two previous days of returns, and the first 120 minutes of returns. You are to predict the remaining 60 minutes of returns of that day and the returns of the following two days for that stock. The metric used to score your predictions is a weighted mean absolute error, where presumably higher-volatility names are downweighted in the final error metric. The training data consist of 40K observations, while the test data consist of 120K rows, for which one had to produce 744K predictions. First prize was a cool $20K. In addition to the prizes, Winton was explicitly looking for resumes.

I suspected that this competition would provide valuable data in my study of human overfitting of trading strategies. Towards that end, let us gather the public and private leaderboards. Recall that the public leaderboard is what participants see of their submissions during the competition period, based on around one quarter of the test set data, while the private leaderboard is the score of predictions on the remaining part of the test data, published in a big reveal at the close of the competition.
(Those of you who want to play along at home can download my cut of the data.)

library(dplyr)
library(rvest)

# a function to load and process a leaderboard …
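
A loader along those lines might look like the following minimal sketch. It assumes each leaderboard page was saved locally as an HTML file containing a plain table; the function, its arguments, the column added, and the file names are hypothetical illustrations, not the code from the post.

# hypothetical sketch: read a locally saved leaderboard page, pull out
# the first HTML table, and tag it as public or private.
load_leaderboard <- function(path, which_board = c("public", "private")) {
  which_board <- match.arg(which_board)
  read_html(path) %>%
    html_node("table") %>%
    html_table() %>%
    mutate(board = which_board)
}

# usage, with hypothetical file names:
# publb  <- load_leaderboard("winton_public_leaderboard.html", "public")
# privlb <- load_leaderboard("winton_private_leaderboard.html", "private")
# both   <- bind_rows(publb, privlb)
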
read more