Conditional Portfolios

When I first started working at a quant fund I tried to read about portfolio theory. (Beyond, you know, "Hedge Funds for Dummies.") I learned about various objectives and portfolio constraints, including the Markowitz portfolio, which felt very natural. Markowitz solves the mean-variance optimization problem, as well as the Sharpe maximization problem, namely

$$ \operatorname{argmax}_w \frac{w^{\top}\mu}{\sqrt{w^{\top} \Sigma w}}. $$

This is solved, up to scaling, by the Markowitz portfolio \(\Sigma^{-1}\mu\).

When I first read about the theory behind Markowitz, I did not read anything about where \(\mu\) and \(\Sigma\) come from. I assumed the authors I was reading were talking about the vanilla sample estimates of the mean and covariance, though the theory does not require this.
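As a concrete sketch, here is the vanilla sample-estimate version in NumPy on toy data. The final scaling to unit gross leverage is an arbitrary choice for illustration; Markowitz only pins down the portfolio up to scale.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: 250 periods of returns on 5 assets (purely illustrative)
X = rng.standard_normal((250, 5)) * 0.01 + 0.0005

mu = X.mean(axis=0)                 # vanilla sample mean
Sigma = np.cov(X, rowvar=False)     # vanilla sample covariance
w = np.linalg.solve(Sigma, mu)      # Markowitz portfolio, up to scaling
w = w / np.abs(w).sum()             # scale to unit gross leverage (a choice)
```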

There are some problems with the Markowitz portfolio. For us, as a small quant fund, the most pressing issue was that holding the Markowitz portfolio based on the historical mean and covariance was not a good look. You don't get paid "two and twenty" for computing some long-term averages.

Rather than holding an unconditional portfolio, we sought to construct a conditional one, conditional on some "features". (I now believe this topic falls under the rubric of "Tactical Asset Allocation".) We stumbled on two simple methods for adapting Markowitz theory to accept conditioning information: Conditional Markowitz, and "Flattening".

Conditional Markowitz

Suppose you observe an \(l\)-vector of features, \(f_i\), prior to the time you must allocate among \(p\) assets to enjoy returns \(x_i\). Assume that the expected returns are linear in the features, but the covariance is a long-term average. That is,

$$ E\left[x_i \left|f_i\right.\right] = B f_i,\quad\mbox{Var}\left(x_i \left|f_i\right.\right) = \Sigma. $$

Note that Markowitz theory never really said how to estimate mean returns, and thus the conditional expectation here can be used directly in the Markowitz portfolio definition. Thus the conditional Markowitz portfolio, conditional on observing \(f_i\), is simply \(\Sigma^{-1} B f_i\). Another way of viewing this is to estimate the "Markowitz coefficient", \(W=\Sigma^{-1} B\), and just multiply this by \(f_i\) when it is observed.
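A minimal sketch of the fit on toy data: ordinary least squares gives an estimate of \(B\), the regression residuals give the long-run \(\Sigma\), and the Markowitz coefficient follows by a linear solve. All data here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, l = 500, 4, 3                      # toy sizes: n periods, p assets, l features
F = rng.standard_normal((n, l))          # observed features f_i, one per row
X = rng.standard_normal((n, p)) * 0.01   # observed returns x_i, one per row (toy)

# OLS estimate of B in E[x|f] = B f: lstsq solves X ~ F @ Bt, so Bt is B transposed
Bt, *_ = np.linalg.lstsq(F, X, rcond=None)
Sigma = np.cov(X - F @ Bt, rowvar=False)   # long-run covariance from residuals
W = np.linalg.solve(Sigma, Bt.T)           # Markowitz coefficient Sigma^{-1} B, p x l

f_new = rng.standard_normal(l)             # a newly observed feature vector
w_cond = W @ f_new                         # conditional Markowitz portfolio
```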

I have written about inference on the conditional Markowitz portfolio: via the MGLH tests one can test essentially whether \(W\) is all zeros, or test the total effect size. However, the conditional Markowitz procedure is, like the unconditional procedure, subject to the Cramér-Rao portfolio bounds in the 'obvious' way: increasing the number of fit coefficients faster than the signal-noise ratio can cause degraded out-of-sample performance.

The Flattening Trick

The other approach for adding conditional information is slicker. When I first reinvented it, I called it the "flattening trick". I assumed it was well established in the folklore of the quant community, but I have only found one reference to it, a paper by Brandt and Santa-Clara, where they refer to it as "augmenting the asset space".

The idea is as follows: in the conditional Markowitz procedure we ended with a matrix \(W\) such that, conditional on \(f_i\), we would hold portfolio \(W f_i\). Why not just start with the assumption that you seek a portfolio that is linear in \(f_i\) and optimize the \(W\)? Note that the return you experience by holding \(W f_i\) is exactly

$$ x_i^{\top} W f_i = \operatorname{trace}\left(x_i^{\top} W f_i\right) = \operatorname{trace}\left(f_i x_i^{\top} W\right) = \operatorname{vec}^{\top}\left(x_i f_i^{\top}\right) \operatorname{vec}\left(W\right), $$

where \(\operatorname{vec}\) is the vectorization operator that takes a matrix to a vector columnwise. I called this "flattening," but maybe it's more like "unravelling".
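A quick numerical sanity check of the identity \(x^{\top} W f = \operatorname{vec}^{\top}\left(x f^{\top}\right)\operatorname{vec}\left(W\right)\). NumPy's `ravel` flattens row-wise rather than columnwise, but the inner product is unchanged for any ordering applied consistently to both matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
p, l = 4, 3
x = rng.standard_normal(p)           # one period's returns
f = rng.standard_normal(l)           # that period's features
W = rng.standard_normal((p, l))      # any portfolio coefficient matrix

lhs = x @ W @ f                              # return from holding W f
rhs = np.outer(x, f).ravel() @ W.ravel()     # vec(x f^T)^T vec(W), same ordering both sides
assert np.isclose(lhs, rhs)
```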

Now note that the optimization problem you are trying to solve is to find the vector \(\operatorname{vec}\left(W\right)\), with pseudo-returns of \(y_i = \operatorname{vec}\left(x_i f_i^{\top}\right)\).
You can simply construct these pseudo-returns \(y_i\) from your historical data, and feed them into an unconditional portfolio process. You can use unconditional Markowitz for this, or any other unconditional procedure. Then take the results of the unconditional process and unflatten them back to \(W\).
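A sketch of the whole pipeline on toy data: build the pseudo-returns, run plain unconditional Markowitz on them, and unflatten. The flattening here uses NumPy's row-wise ordering, applied consistently in both the ravel and the final reshape, so the recovered \(W\) is the same portfolio.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, l = 2000, 4, 3
F = rng.standard_normal((n, l))                        # features, one row per period
X = 0.1 * F @ rng.standard_normal((l, p)) \
    + rng.standard_normal((n, p))                      # toy returns with a linear signal

# pseudo-returns: y_i = vec(x_i f_i^T), flattened row-wise for each period
Y = np.einsum('ip,il->ipl', X, F).reshape(n, p * l)

# unconditional Markowitz on the pl pseudo-assets
mu_y = Y.mean(axis=0)
Sigma_y = np.cov(Y, rowvar=False)
w = np.linalg.solve(Sigma_y, mu_y)

W_hat = w.reshape(p, l)    # unflatten back to the coefficient matrix
```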

Note that even when you use unconditional Markowitz on the flattened problem, you will not regain the \(W\) from conditional Markowitz. The reason is that we are essentially allowing the covariance of returns to vary with our features as well, which was not possible in conditional Markowitz. In practice we often found that the flattening trick had slightly worse out-of-sample performance than conditional Markowitz when used on the same data, which we broadly attributed to overfitting. In conditional Markowitz we would estimate the \(p \times l\) matrix \(B\) and the \(p \times p\) matrix \(\Sigma\), to arrive at the \(p \times l\) matrix \(W\). In flattening plus unconditional Markowitz you estimate a \(pl\)-vector of means, and the \(pl \times pl\) covariance matrix, to arrive at the \(p \times l\) matrix \(W\).

To mitigate the overfitting, it is fairly easy to add sparsity to the flattening trick. If you wish to force an element of \(W\) to be zero, because you think a certain feature should have no bearing on your holdings of a certain asset, you can just elide it from the flattening pseudo-returns. Moreover, if you feel that a certain feature should only have, say, a positive influence on your holdings of a particular asset, you can directly impose that positivity constraint in the pseudo portfolio optimization problem. Because you are solving directly for elements of \(W\), this is much easier than in conditional Markowitz, where \(W\) is the product of two matrices.
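The elision can be sketched as a small helper; `sparse_flat_markowitz` is a hypothetical name, and `Y` is assumed to be the \(n \times pl\) matrix of flattened pseudo-returns (row-wise \(\operatorname{vec}\) of each \(x_i f_i^{\top}\)) as above.

```python
import numpy as np

def sparse_flat_markowitz(Y, mask):
    """Unconditional Markowitz on pseudo-returns Y (n x p*l), with entries of W
    forced to zero wherever mask (p x l boolean) is False."""
    keep = mask.ravel()                  # same flattening order as the pseudo-returns
    Yk = Y[:, keep]                      # elide the pseudo-assets forced to zero
    w = np.linalg.solve(np.cov(Yk, rowvar=False), Yk.mean(axis=0))
    W = np.zeros(mask.size)
    W[keep] = w                          # scatter back into the full coefficient vector
    return W.reshape(mask.shape)

# toy demo: feature 2 may not touch asset 1, feature 1 may not touch asset 3
rng = np.random.default_rng(0)
n, p, l = 1000, 3, 2
Y = rng.standard_normal((n, p * l)) + 0.05
mask = np.array([[True, False],
                 [True, True],
                 [False, True]])
W = sparse_flat_markowitz(Y, mask)   # zero exactly where mask is False
```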

Flattening is a neat trick. You should consider it the next time you're allocating assets tactically.