Gilgamathhttps://www.gilgamath.com/2020-01-04T21:07:06-08:00Nonparametric Market Timing2020-01-04T21:07:06-08:002020-01-04T21:07:06-08:00Steventag:www.gilgamath.com,2020-01-04:/nonparametric_market_timing.html<p>Market timing a single instrument with a single feature</p><p>In a <a href="market_timing">previous blog post</a>, I looked at "market timing" for discrete states.
There are a number of ways that result can be generalized.
Here we consider a non-parametric view.
Suppose you observe some scalar feature <span class="math">\(f_t\)</span> prior to the time required
to invest and capture scalar returns <span class="math">\(x_{t+1}\)</span>.
Let <span class="math">\(\mu\left(f\right)\)</span> and <span class="math">\(\alpha_2\left(f\right)\)</span> be the first and second moments
of returns conditional on the feature:
</p>
<div class="math">$$
E\left[\left.x_{t+1}\right|f_t\right] = \mu\left(f_t\right),\quad
E\left[\left.x^2_{t+1}\right|f_t\right] = \alpha_2\left(f_t\right).
$$</div>
<p>
Suppose that <span class="math">\(f_t\)</span> is random with density <span class="math">\(g\left(f\right)\)</span>.</p>
<p>Suppose that in response to observing <span class="math">\(f_t\)</span> you allocate <span class="math">\(w\left(f_t\right)\)</span>
proportion of your wealth long in the asset.
The first and second moments of the returns of this strategy are
</p>
<div class="math">$$
\int \mu\left(x\right) w\left(x\right) g\left(x\right)dx,\quad\mbox{and }
\int \alpha_2\left(x\right) w^2\left(x\right) g\left(x\right)dx.
$$</div>
<!-- PELICAN_END_SUMMARY -->
<p>Now we seek the strategy <span class="math">\(w\left(x\right)\)</span> that maximizes the signal-noise
ratio, which is the ratio of the expected return to the standard deviation
of returns.
We can equivalently maximize the ratio of the expected return to the
square root of the second moment, since the monotonic 'tas' function (the tangent
of the arcsine) maps that ratio back to the signal-noise ratio.
Now note that to maximize this ratio we can, without loss of generality,
fix the denominator to equal some value.
This works because the ratio is homogeneous of degree zero: we can rescale
<span class="math">\(w\left(x\right)\)</span> by an arbitrary positive constant and get the same
objective. This yields the problem
</p>
<div class="math">$$
\max_{w\left(x\right)} \int \mu\left(x\right) w\left(x\right) g\left(x\right)dx,\quad\mbox{subject to }\,
\int \alpha_2\left(x\right) w^2\left(x\right) g\left(x\right)dx = 1.
$$</div>
<p>This, it turns out, is a trivial problem in the calculus of variations.
Trivial in the sense that the integrals do not involve the derivative
<span class="math">\(w'\left(x\right)\)</span> and so the solution has a simple form which looks
just like the finite dimensional Lagrange multiplier solution.
After some simplification, the optimal solution is found to be
</p>
<div class="math">$$
w\left(x\right) = \frac{c \mu\left(x\right)}{\alpha_2\left(x\right)}.
$$</div>
<p>
Note this is fully consistent with what we saw for the case where
<span class="math">\(f_t\)</span> took one of a finite set of discrete states
in our <a href="market_timing">earlier blog post</a>.
However, this doesn't quite look like Markowitz, because the
denominator has the second moment function, and not the
variance function.
We will see this actually matters.</p>
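<p>To make this concrete, here is a small sanity check (a hypothetical two-state example with made-up moments, not from the post): when the feature takes finitely many values the integrals become sums, and the <span class="math">\(\mu/\alpha_2\)</span> weighting achieves a higher signal-noise ratio than the Markowitz-style <span class="math">\(\mu/\sigma^2\)</span> weighting.</p>

```r
# hypothetical two-state feature, equally likely states
g  <- c(0.5, 0.5)        # state probabilities
mu <- c(0.5, 0.5)        # conditional means (deliberately large, to exaggerate the effect)
s2 <- c(0.25, 2.25)      # conditional variances
a2 <- s2 + mu^2          # conditional second moments

# signal-noise ratio of a state-dependent allocation w
snr <- function(w) {
  m1 <- sum(mu * w * g)      # strategy first moment
  m2 <- sum(a2 * w^2 * g)    # strategy second moment
  m1 / sqrt(m2 - m1^2)
}
snr(mu / a2)   # optimal weighting
snr(mu / s2)   # Markowitz-style weighting: strictly smaller here
```

<p>Rescaling the weights by a positive constant leaves <code>snr</code> unchanged, consistent with the homogeneity argument above.</p>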
<h2>Exponential Heteroskedasticity</h2>
<p>As an example, consider the case where <span class="math">\(f_t\)</span> takes an exponential distribution
with parameter <span class="math">\(\lambda=1\)</span>.
Moreover, assume the mean is constant, but the variance is proportional to the feature:
</p>
<div class="math">$$
E\left[\left.x_{t+1}\right|f_t\right] = \mu,\quad
Var\left(\left.x_{t+1}\right|f_t\right) = f_t \sigma^2.
$$</div>
<p>
The optimal allocation is
</p>
<div class="math">$$
w\left(f_t\right) = \frac{c \mu}{\sigma^2\left(f_t + \zeta^2\right)} = \frac{c'}{f_t + \zeta^2},
$$</div>
<p>
where <span class="math">\(\zeta=\mu/\sigma\)</span>.
We note that because <span class="math">\(E\left[f_t\right]=1\)</span> and the expected return is constant
with respect to <span class="math">\(f_t\)</span>, the signal-noise ratio of the buy-and-hold strategy
is simply <span class="math">\(\zeta\)</span>.
The SNR of the optimal timing strategy can be quite a bit higher.</p>
<p>To compute that SNR, first let
</p>
<div class="math">$$
q=\zeta^2 \exp{\left(\zeta^2\right)}\int_{\zeta^2}^{\infty} \frac{\exp{\left(-x\right)}}{x}dx.
$$</div>
<p>
(This integral is called the "exponential integral".)
Then the SNR of the timing strategy is
</p>
<div class="math">$$
\operatorname{sign}\left(c\right)\sqrt{\frac{q}{1-q}}.
$$</div>
<p>
Here we confirm this empirically. We spawn a bunch of <span class="math">\(f_t\)</span> and <span class="math">\(x_{t+1}\)</span> under the model,
then compute the returns of the buy-and-hold strategy, the optimal strategy,
and the Markowitz equivalent which holds proportional to mean divided by variance:</p>
<div class="highlight"><pre><span></span>mu <span class="o"><-</span> <span class="m">0.1</span>
sg <span class="o"><-</span> <span class="m">1</span>
zetasq <span class="o"><-</span> <span class="p">(</span>mu<span class="o">/</span>sg<span class="p">)</span><span class="o">^</span><span class="m">2</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">1234</span><span class="p">)</span>
feat <span class="o"><-</span> rexp<span class="p">(</span><span class="m">1e6</span><span class="p">,</span>rate<span class="o">=</span><span class="m">1</span><span class="p">)</span>
rets <span class="o"><-</span> rnorm<span class="p">(</span><span class="kp">length</span><span class="p">(</span>feat<span class="p">),</span>mean<span class="o">=</span><span class="kp">rep</span><span class="p">(</span>mu<span class="p">,</span><span class="kp">length</span><span class="p">(</span>feat<span class="p">)),</span>sd<span class="o">=</span>sg<span class="o">*</span><span class="kp">sqrt</span><span class="p">(</span>feat<span class="p">))</span>
<span class="c1"># optimal allocation; </span>
ww <span class="o"><-</span> <span class="m">1</span> <span class="o">/</span> <span class="p">(</span>feat<span class="o">+</span>zetasq<span class="p">)</span>
<span class="c1"># markowitz allocation;</span>
mw <span class="o"><-</span> <span class="m">1</span> <span class="o">/</span> feat
<span class="kn">library</span><span class="p">(</span>SharpeR<span class="p">)</span>
buyhold <span class="o"><-</span> <span class="p">(</span>as.sr<span class="p">(</span>rets<span class="p">)</span><span class="o">$</span>sr<span class="p">)</span>
optimal <span class="o"><-</span> <span class="p">(</span>as.sr<span class="p">(</span>rets<span class="o">*</span>ww<span class="p">)</span><span class="o">$</span>sr<span class="p">)</span>
markwtz <span class="o"><-</span> <span class="p">(</span>as.sr<span class="p">(</span>rets<span class="o">*</span>mw<span class="p">)</span><span class="o">$</span>sr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>expint<span class="p">)</span>
qfunc <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>zetsq<span class="p">)</span> <span class="p">{</span>
<span class="kn">require</span><span class="p">(</span>expint<span class="p">,</span>quietly<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span>
zetsq <span class="o">*</span> <span class="kp">exp</span><span class="p">(</span>zetsq<span class="p">)</span> <span class="o">*</span> expint<span class="p">(</span>zetsq<span class="p">)</span>
<span class="p">}</span>
psnrfunc <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>zetsq<span class="p">)</span> <span class="p">{</span>
qqq <span class="o"><-</span> qfunc<span class="p">(</span>zetsq<span class="p">)</span>
<span class="kp">sqrt</span><span class="p">(</span>qqq <span class="o">/</span> <span class="p">(</span><span class="m">1</span><span class="o">-</span>qqq<span class="p">))</span>
<span class="p">}</span>
theoretical <span class="o"><-</span> psnrfunc<span class="p">(</span>zetasq<span class="p">)</span>
</pre></div>
<p>The empirical SNR of the buy-and-hold strategy is 0.1, which is very
close to the theoretical value of 0.1.
We compute the SNR of the optimal strategy to be 0.207, which is very
close to the theoretical value we compute as 0.206 using
the exponential integral above.
The signal-noise ratio of the Markowitz strategy, however, is a measly
0.0152. </p>
<p>We note that for this setup, it is simple to find the optimal degree-<span class="math">\(k\)</span> polynomial <span class="math">\(w\left(f_t\right)\)</span>,
and confirm it has a lower SNR than what we observe here. We leave that as an exercise in our book.</p>
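<p>For the degree-one case, a quick numerical sketch (my own, not the book's solution): using the moments <span class="math">\(E f = 1\)</span>, <span class="math">\(E f^2 = 2\)</span>, <span class="math">\(E f^3 = 6\)</span> of the Exp(1) feature, the SNR of an affine rule <span class="math">\(w(f) = a + bf\)</span> has a closed form, which we can maximize numerically.</p>

```r
mu <- 0.1
sg <- 1
# SNR of the affine timing rule w(f) = a + b*f under the toy model,
# using E f = 1, E f^2 = 2, E f^3 = 6 for f ~ Exp(1)
snr_affine <- function(ab) {
  a <- ab[1]
  b <- ab[2]
  m1 <- mu * (a + b)
  m2 <- a^2 * (sg^2 + mu^2) +
    2 * a * b * (2 * sg^2 + mu^2) +
    b^2 * (6 * sg^2 + 2 * mu^2)
  if (m2 <= m1^2) return(-Inf)
  m1 / sqrt(m2 - m1^2)
}
best <- optim(c(1, 0), snr_affine, control = list(fnscale = -1, maxit = 2000))
best$value  # noticeably below the nonparametric optimum of about 0.207
```

<p>The buy-and-hold rule is the special case <span class="math">\(b=0\)</span>, with SNR <span class="math">\(0.1\)</span>, so the affine rule buys back only part of the gap to the nonparametric optimum.</p>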
<h2>Timing the Market</h2>
<p>Here we use this technique on returns of the Market, as defined in the Fama-French factors.
We take the 12 month rolling volatility of the Market returns, delayed by a month, as our feature.
First we plot the market returns and squared market returns as a function of our feature.
We see essentially a flat <span class="math">\(\mu\)</span> but an increasing <span class="math">\(\alpha_2\)</span>.</p>
<div class="highlight"><pre><span></span><span class="kp">suppressMessages</span><span class="p">({</span>
<span class="kn">library</span><span class="p">(</span>fromo<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tidyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>magrittr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
<span class="p">})</span>
<span class="kr">if</span> <span class="p">(</span><span class="o">!</span><span class="kn">require</span><span class="p">(</span>aqfb.data<span class="p">)</span> <span class="o">&&</span> <span class="kn">require</span><span class="p">(</span>devtools<span class="p">))</span> <span class="p">{</span>
devtools<span class="o">::</span>install_github<span class="p">(</span><span class="s">'shabbychef/aqfb_data'</span><span class="p">)</span>
<span class="p">}</span>
<span class="kn">library</span><span class="p">(</span>aqfb.data<span class="p">)</span>
data<span class="p">(</span>mff4<span class="p">)</span>
df <span class="o"><-</span> <span class="kt">data.frame</span><span class="p">(</span>Mkt<span class="o">=</span>mff4<span class="o">$</span>Mkt<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>vol12<span class="o">=</span><span class="kp">as.numeric</span><span class="p">(</span>fromo<span class="o">::</span>running_sd<span class="p">(</span>Mkt<span class="p">,</span><span class="m">12</span><span class="p">,</span>min_df<span class="o">=</span><span class="m">12L</span><span class="p">)))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>feature<span class="o">=</span>dplyr<span class="o">::</span>lag<span class="p">(</span>vol12<span class="p">,</span><span class="m">2</span><span class="p">))</span> <span class="o">%>%</span>
dplyr<span class="o">::</span>filter<span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>feature<span class="p">))</span>
<span class="c1"># check on the first moments</span>
ph <span class="o"><-</span> df <span class="o">%>%</span>
rename<span class="p">(</span>mu<span class="o">=</span>Mkt<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>alpha_2<span class="o">=</span>mu<span class="o">^</span><span class="m">2</span><span class="p">)</span> <span class="o">%>%</span>
tidyr<span class="o">::</span>gather<span class="p">(</span>key<span class="o">=</span>series<span class="p">,</span>value<span class="o">=</span>value<span class="p">,</span>mu<span class="p">,</span>alpha_2<span class="p">)</span> <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>feature<span class="p">,</span>value<span class="p">))</span> <span class="o">+</span>
geom_point<span class="p">()</span> <span class="o">+</span>
stat_smooth<span class="p">()</span> <span class="o">+</span>
facet_grid<span class="p">(</span>series<span class="o">~</span><span class="m">.</span><span class="p">,</span>scales<span class="o">=</span><span class="s">'free'</span><span class="p">)</span> <span class="o">+</span>
labs<span class="p">(</span>x<span class="o">=</span><span class="s">'12 month vol, lagged one month'</span><span class="p">,</span>
y<span class="o">=</span><span class="s">'mean or second moment'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'Returns of the Market'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/nonparametric_market_timing_check_market-1.png" title="plot of chunk check_market" alt="plot of chunk check_market" width="1000px" height="500px" /></p>
<p>We now perform a GAM fit on the first and second moments of the Market returns.
To force the second moment estimate to be positive, I use a trick: fit the log of the squared returns, then exponentiate the prediction.
I plot the optimal allocation versus the feature below.
Note that it vaguely resembles the optimal allocation from
the exponential heteroskedasticity toy example above.
One could also estimate the SNR one would achieve in this case,
but that ignores the effects of any estimation error.
Moreover, the multi-period SNR that we compute here should be
considered a very long-term average, one that might not be
apparent on shorter time scales.</p>
<div class="highlight"><pre><span></span><span class="c1"># do two fits</span>
<span class="kp">suppressMessages</span><span class="p">({</span>
<span class="kn">library</span><span class="p">(</span>mgcv<span class="p">)</span>
<span class="p">})</span>
spn <span class="o"><-</span> <span class="m">0.9</span>
mufunc <span class="o"><-</span> mgcv<span class="o">::</span>gam<span class="p">(</span>Mkt <span class="o">~</span> feature<span class="p">,</span>data<span class="o">=</span>df<span class="p">,</span>family<span class="o">=</span>gaussian<span class="p">())</span>
a2func <span class="o"><-</span> mgcv<span class="o">::</span>gam<span class="p">(</span><span class="kp">I</span><span class="p">(</span><span class="kp">log</span><span class="p">(</span><span class="kp">pmax</span><span class="p">(</span><span class="m">1e-6</span><span class="p">,</span>Mkt<span class="o">^</span><span class="m">2</span><span class="p">)))</span> <span class="o">~</span> feature<span class="p">,</span>data<span class="o">=</span>df<span class="p">,</span>family<span class="o">=</span>gaussian<span class="p">())</span>
alloc <span class="o"><-</span> tibble<span class="p">(</span>feature<span class="o">=</span><span class="kp">seq</span><span class="p">(</span><span class="kp">min</span><span class="p">(</span>df<span class="o">$</span>feature<span class="p">)</span><span class="o">*</span><span class="m">1.05</span><span class="p">,</span><span class="kp">max</span><span class="p">(</span>df<span class="o">$</span>feature<span class="p">)</span><span class="o">*</span><span class="m">0.95</span><span class="p">,</span>length.out<span class="o">=</span><span class="m">501</span><span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>wts<span class="o">=</span>predict<span class="p">(</span>mufunc<span class="p">,</span><span class="m">.</span><span class="p">)</span> <span class="o">/</span> <span class="kp">exp</span><span class="p">(</span>predict<span class="p">(</span>a2func<span class="p">,</span><span class="m">.</span><span class="p">)))</span>
<span class="c1"># if you wanted to estimate the SNR of this allocation:</span>
df2 <span class="o"><-</span> df <span class="o">%>%</span>
mutate<span class="p">(</span>wts<span class="o">=</span>predict<span class="p">(</span>mufunc<span class="p">,</span><span class="m">.</span><span class="p">)</span> <span class="o">/</span> <span class="kp">exp</span><span class="p">(</span>predict<span class="p">(</span>a2func<span class="p">,</span><span class="m">.</span><span class="p">)))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>ret<span class="o">=</span>Mkt <span class="o">*</span> wts<span class="p">)</span>
zetfoo <span class="o"><-</span> as.sr<span class="p">(</span>df2<span class="o">$</span>ret<span class="p">,</span>ope<span class="o">=</span><span class="m">12</span><span class="p">)</span>
ph <span class="o"><-</span> alloc <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>feature<span class="p">,</span>wts<span class="p">))</span> <span class="o">+</span>
geom_line<span class="p">()</span> <span class="o">+</span>
labs<span class="p">(</span>x<span class="o">=</span><span class="s">'12 month vol, lagged one month'</span><span class="p">,</span>
y<span class="o">=</span><span class="s">'optimal allocation, up to scaling'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'Timing the Market'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/nonparametric_market_timing_optimal_allocation-1.png" title="plot of chunk optimal_allocation" alt="plot of chunk optimal_allocation" width="1000px" height="500px" /></p>
<h2>Checking on leverage</h2>
<p>One odd way to use this nonparametric market timing trick in quantitative trading
(though do not take this as investing advice!)
is as a kind of check on the leverage of a strategy that levers itself.
That is, suppose you have some kind of quantitative strategy that does not always use all the capital allocated to it.
Let <span class="math">\(f_t\)</span> be the proportion of wealth that the strategy 'decides' to allocate.
Of course this is observable prior to the investment decision.
Then estimate, nonparametrically, the first and second moment of the returns of the strategy
<em>on full leverage</em> from historical returns.
Then compute the optimal leverage as a function of the allocated leverage,
and plot one against the other: they should fall on a straight line!
If they do not fall on a straight line, the strategy is not making
optimal decisions regarding leverage (modulo estimation error).</p>
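<p>A simulated sketch of this check (with an entirely made-up strategy, so the names and numbers are hypothetical): the strategy's full-leverage mean scales with its chosen allocation <span class="math">\(f_t\)</span> while its volatility is roughly constant, so the estimated optimal leverage <span class="math">\(\mu(f)/\alpha_2(f)\)</span> should come out close to a straight line through the origin.</p>

```r
set.seed(101)
n <- 2e4
f <- runif(n, 0.2, 1)                      # strategy's chosen allocation, observed ex ante
r <- rnorm(n, mean = 0.01 * f, sd = 0.05)  # hypothetical full-leverage returns
# nonparametric conditional first and second moments of full-leverage returns
mu_fit <- loess(r ~ f)
a2_fit <- loess(I(r^2) ~ f)
grid <- data.frame(f = seq(0.25, 0.95, length.out = 50))
w_opt <- predict(mu_fit, newdata = grid) / predict(a2_fit, newdata = grid)
# near-zero intercept and positive slope: leverage decisions look optimal
coef(lm(w_opt ~ grid$f))
```

<p>A curved plot of <code>w_opt</code> against <code>grid$f</code>, or a large intercept in the fit, would instead flag suboptimal leverage decisions (modulo estimation error).</p>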
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>ohenery!2019-09-25T21:32:52-07:002019-09-25T21:32:52-07:00Steventag:www.gilgamath.com,2019-09-25:/ohenery.html<p>ohenery package to CRAN</p><p>I just pushed the first version of my
<a href="http://github.com/shabbychef/ohenery"><code>ohenery</code></a>
package to
<a href="https://cran.r-project.org/package=ohenery">CRAN</a>.
The package supports estimation of softmax regression
for ordinal outcomes under the Harville and Henery models.
Unlike the usual multinomial representation for ordinal
outcomes, softmax regression is useful for 'ragged'
cases. Contrast:</p>
<ul>
<li>observed independent variables on participants
in multiple races, with the outcomes recorded,
and different participants in each race,
perhaps different numbers of participants in
each race. </li>
<li>observed independent variables on independent
trials where for each trial there is a single
outcome taking values from some ordered
set.</li>
</ul>
<p>Multinomial ordinal regression is for the latter
case, while softmax is for the former.
Softmax regression generalizes logistic regression.
I had first stumbled on the idea when
<a href="best-picture-data">working in the film industry</a>,
but called it a
<a href="taste-preferences">'Bradley-Terry model'</a>
out of ignorance.</p>
<!-- PELICAN_END_SUMMARY -->
<p>The basic setup is as follows: suppose you observe independent variables
<span class="math">\(x_i\)</span> for a participant in a race.
Let <span class="math">\(\eta_i = x_i^{\top}\beta\)</span> for some coefficients <span class="math">\(\beta\)</span>.
Then let
</p>
<div class="math">$$
\pi_i = \frac{\exp{\eta_i}}{\sum_j \exp{\eta_j}},
$$</div>
<p>
where we sum over all <span class="math">\(j\)</span> in the same race.
Under the softmax regression model, the probability that participant <span class="math">\(i\)</span>
takes first place is <span class="math">\(\pi_i\)</span>.</p>
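<p>The win probabilities are just a per-race softmax. A minimal sketch, with made-up linear predictors:</p>

```r
# hypothetical linear predictors eta_i for a four-participant race
eta <- c(0.5, 0.1, -0.2, 0.4)
softmax <- function(e) exp(e) / sum(exp(e))
pi_win <- softmax(eta)  # probability each participant takes first place
sum(pi_win)             # probabilities sum to one
```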
<p>This formulation is sufficient when you only observe the winner
of a multi-participant race, like say the Best Picture winner
of the Oscars.
However, in some cases you observe the rank of several or all
participants.
For example, in Olympic events, one observes Gold, Silver and Bronze
finishes.</p>
<p>Note that it is generally recommended that you <em>not</em> discard
continuous information by dichotomizing your variables in this way.
However, in some cases one only observes the ordinal outcomes.
In this case softmax regression can be used.</p>
<p>In the case where ranked outcomes are observed beyond
the winner, we wish to 'recycle' softmax probabilities.
Under the Harville model, the probabilities are recycled
proportionally. An example will illustrate:
condition on the outcome that participant 11 took first
place. Then for <span class="math">\(i \ne 11\)</span>, compute
</p>
<div class="math">$$
\pi_i = \frac{\exp{\eta_i}}{\sum_{j\ne 11} \exp{\eta_j}}.
$$</div>
<p>
Under the Harville model, the probability that the <span class="math">\(i\)</span>th
participant took <em>second</em> place is <span class="math">\(\pi_i\)</span>, conditional
on the event that 11 took first.</p>
<p>The Henery model slightly generalizes the Harville model.
Here we imagine some <span class="math">\(\gamma_2, \gamma_3, \gamma_4\)</span>
and so on such that the above computation becomes
</p>
<div class="math">$$
\pi_i = \frac{\exp{\gamma_2 \eta_i}}{\sum_{j\ne 11} \exp{\gamma_2 \eta_j}}.
$$</div>
<p>
Then conditional on 11 taking first, and participant 5 taking second,
compute
</p>
<div class="math">$$
\pi_i = \frac{\exp{\gamma_3 \eta_i}}{\sum_{j\ne 11, j\ne 5} \exp{\gamma_3 \eta_j}}
$$</div>
<p>
as the probability that participant <span class="math">\(i\)</span> takes third place, and so on.
Obviously the Harville model is a Henery model with all <span class="math">\(\gamma_i=1\)</span>.</p>
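<p>The recycling above can be sketched as a short function (my own illustration, not the package's internals); place <span class="math">\(k\)</span> uses the multiplier <span class="math">\(\gamma_k\)</span>, with <span class="math">\(\gamma_1 = 1\)</span>:</p>

```r
# probability of a given finishing order under the Henery model;
# the default gamma of all ones recovers the Harville model
order_prob <- function(eta, order, gamma = rep(1, length(order))) {
  remaining <- seq_along(eta)
  prob <- 1
  for (k in seq_along(order)) {
    # softmax over the participants still in the running, scaled by gamma_k
    pi_k <- exp(gamma[k] * eta[remaining]) / sum(exp(gamma[k] * eta[remaining]))
    prob <- prob * pi_k[match(order[k], remaining)]
    remaining <- setdiff(remaining, order[k])
  }
  prob
}
eta <- c(0.5, 0.1, -0.2, 0.4)
order_prob(eta, order = c(1, 4))                     # Harville: P(1 first, 4 second)
order_prob(eta, order = c(1, 4), gamma = c(1, 0.8))  # a Henery variant
```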
<p>I wasn't sure how to deal with ties in the code.
On the one hand, ties are legitimate possible outcomes in some cases.
On the other, they are convenient to introduce as some unobserved
'runner up' status.
For example, create an 'Aluminum Medal' outcome for Olympians who
take none of Gold, Silver, or Bronze;
in this case many participants tie for the fourth place medal.
However, we should not expect the regression to try to fit some
order on those participants.
The solution was to introduce weights to the estimation.
Set the weights to zero for outcomes which are fake ties,
and set them to one otherwise.</p>
<p>The package uses <code>Rcpp</code> to compute a likelihood (and gradient),
then <code>maxLik</code> does the estimation and inference.
The rest of the work was me tearing my hair out trying to
decipher <code>model.frame</code> and its friends.</p>
<h2>Olympic Diving</h2>
<p>The package is bundled with a dataset of 100 years of Olympic Men's
Platform Diving Records, sourced from Randi Griffin's
excellent <a href="https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results">dataset on kaggle</a>.</p>
<p>Here we convert the medal records into finishing places of 1, 2, 3 and 4 (no medal),
add weights for the fitting,
make a factor variable for age,
factor the NOC (country) of the athlete.
Because Platform Diving is a subjective competition, based on scores from
judges, we investigate whether there is a 'home field advantage'
by creating a Boolean variable indicating whether the athlete is representing
the host nation.</p>
<p>We then fit a Henery model to the data. Note that the gamma terms come
out very close to one, indicating the Harville model would be sufficient.
The home field advantage does not appear real in this analysis.
(<em>Note:</em> in the first draft of this blog post, using the first version
of the package, the home field effect appeared significant due to
coding error.)</p>
<div class="highlight"><pre><span></span><span class="c1"># this should be ohenery 0.1.1</span>
<span class="kn">library</span><span class="p">(</span>ohenery<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>forcats<span class="p">)</span>
data<span class="p">(</span>diving<span class="p">)</span>
fitdat <span class="o"><-</span> diving <span class="o">%>%</span>
mutate<span class="p">(</span>Finish<span class="o">=</span>case_when<span class="p">(</span><span class="kp">grepl</span><span class="p">(</span><span class="s">'Gold'</span><span class="p">,</span>Medal<span class="p">)</span> <span class="o">~</span> <span class="m">1</span><span class="p">,</span> <span class="c1"># make outcomes</span>
<span class="kp">grepl</span><span class="p">(</span><span class="s">'Silver'</span><span class="p">,</span>Medal<span class="p">)</span> <span class="o">~</span> <span class="m">2</span><span class="p">,</span>
<span class="kp">grepl</span><span class="p">(</span><span class="s">'Bronze'</span><span class="p">,</span>Medal<span class="p">)</span> <span class="o">~</span> <span class="m">3</span><span class="p">,</span>
<span class="kc">TRUE</span> <span class="o">~</span> <span class="m">4</span><span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>weight<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>Finish <span class="o"><=</span> <span class="m">3</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>cut_age<span class="o">=</span><span class="kp">cut</span><span class="p">(</span>coalesce<span class="p">(</span>Age<span class="p">,</span><span class="m">22.0</span><span class="p">),</span><span class="kt">c</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="m">19.5</span><span class="p">,</span><span class="m">21.5</span><span class="p">,</span><span class="m">22.5</span><span class="p">,</span><span class="m">25.5</span><span class="p">,</span><span class="m">99</span><span class="p">),</span>include.lowest<span class="o">=</span><span class="kc">TRUE</span><span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>country<span class="o">=</span>forcats<span class="o">::</span>fct_relevel<span class="p">(</span>forcats<span class="o">::</span>fct_lump<span class="p">(</span><span class="kp">factor</span><span class="p">(</span>NOC<span class="p">),</span>n<span class="o">=</span><span class="m">5</span><span class="p">),</span><span class="s">'Other'</span><span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>home_advantage<span class="o">=</span>NOC<span class="o">==</span>HOST_NOC<span class="p">)</span>
hensm<span class="p">(</span>Finish <span class="o">~</span> cut_age <span class="o">+</span> country <span class="o">+</span> home_advantage<span class="p">,</span>data<span class="o">=</span>fitdat<span class="p">,</span>weights<span class="o">=</span>weight<span class="p">,</span>group<span class="o">=</span>EventId<span class="p">,</span>ngamma<span class="o">=</span><span class="m">3</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span>--------------------------------------------
Maximum Likelihood estimation
BFGS maximization, 43 iterations
Return code 0: successful convergence
Log-Likelihood: -214.01
12 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
cut_age(19.5,21.5] 0.0303 0.4185 0.07 0.94227
cut_age(21.5,22.5] -0.7276 0.5249 -1.39 0.16565
cut_age(22.5,25.5] 0.0950 0.3790 0.25 0.80199
cut_age(25.5,99] -0.1838 0.4111 -0.45 0.65474
countryGBR -0.6729 0.8039 -0.84 0.40258
countryGER 1.0776 0.4960 2.17 0.02981 *
countryMEX 0.7159 0.4744 1.51 0.13126
countrySWE 0.6207 0.5530 1.12 0.26172
countryUSA 2.3201 0.4579 5.07 4.1e-07 ***
home_advantageTRUE 0.5791 0.4112 1.41 0.15904
gamma2 1.0054 0.2853 3.52 0.00042 ***
gamma3 0.9674 0.2963 3.26 0.00109 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------
</pre></div>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Discrete State Market Timing2019-06-30T10:22:58-07:002019-06-30T10:22:58-07:00Steventag:www.gilgamath.com,2019-06-30:/market-timing.html<p>Market timing with a discrete feature</p><p>In a <a href="portfolio-flattening">previous blog post</a> I talked about two methods
for dealing with conditioning information in portfolio construction.
Here I apply them both to the problem of <em>market timing with a discrete feature</em>.
Suppose that you have a single asset which you can trade long or short.
You observe some 'feature', <span class="math">\(f_i\)</span>, prior to the time required to make an investment
decision to capture the returns <span class="math">\(x_i\)</span>. In this blog post we consider the case
where <span class="math">\(f_i\)</span> takes one of <span class="math">\(J\)</span> known discrete values.
(Note: this subsumes the case where one observes several distinct
discrete features, since they can be combined into one discrete feature.)</p>
<!-- PELICAN_END_SUMMARY -->
<p>The features are not something we can manipulate, and usually we consider them
random (<em>e.g.</em> are we in a Bear or Bull market? are interest rates high or low? <em>etc.</em>),
or if not quite random, at least uncontrollable (<em>e.g.</em> what month is it? did the FOMC
just announce? <em>etc.</em>) </p>
<p>Denote the states by <span class="math">\(z_j\)</span>, and then assume that, conditional on <span class="math">\(f_i=z_j\)</span> the
expected value and variance of <span class="math">\(x_i\)</span> are, respectively, <span class="math">\(m_j\)</span> and <span class="math">\(s_j^2\)</span>.
Let the probability that <span class="math">\(f_i=z_j\)</span> be <span class="math">\(\pi_j\)</span>.
Suppose that conditional on observing <span class="math">\(f_i=z_j\)</span> you decide to hold <span class="math">\(w_j\)</span> of the
asset long or short, depending on the sign of <span class="math">\(w_j\)</span>.
The (long term) expected return of your strategy is
</p>
<div class="math">$$
\mu = \sum_{1\le j \le J} \pi_j w_j m_j,
$$</div>
<p>
and the (long term) variance of your returns is
</p>
<div class="math">$$
\sigma^2 = \sum_{1\le j \le J} \pi_j w_j^2 \left(s_j^2 + m_j^2\right) - \mu^2.
$$</div>
<p>Note that you can directly work with these equations.
For example, it is relatively easy to show that you can
maximize your signal-noise ratio, <span class="math">\(\mu/\sigma\)</span>, by taking
</p>
<div class="math">$$
w_j = c \frac{m_j}{s_j^2 + m_j^2},
$$</div>
<p>
for some constant <span class="math">\(c\)</span> chosen to achieve some long term volatility target.
However, some of the analysis we might like to perform (are the weights
different for different <span class="math">\(j\)</span>? should we do this at all? <em>etc.</em>) is
awkward here because every test has to be derived from scratch.</p>
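<p>To make the algebra concrete, here is a small numerical sketch (in Python rather than R, with made-up values for the <span class="math">\(\pi_j\)</span>, <span class="math">\(m_j\)</span> and <span class="math">\(s_j\)</span>) computing the moments above and the claimed optimal weights:</p>

```python
import math

def strategy_moments(pi, m, s, w):
    """Long-term mean and variance of a state-dependent timing strategy."""
    mu = sum(p * wj * mj for p, wj, mj in zip(pi, w, m))
    alpha2 = sum(p * wj**2 * (sj**2 + mj**2)
                 for p, wj, sj, mj in zip(pi, w, s, m))
    return mu, alpha2 - mu**2

# Hypothetical two-state example; all numbers are made up.
pi = [0.5, 0.5]    # state probabilities pi_j
m = [1.0, -0.5]    # conditional mean returns m_j
s = [4.0, 6.0]     # conditional volatilities s_j

# Claimed optimum (up to scale): w_j proportional to m_j / (s_j^2 + m_j^2).
w_opt = [mj / (sj**2 + mj**2) for mj, sj in zip(m, s)]
mu, var = strategy_moments(pi, m, s, w_opt)
snr_opt = mu / math.sqrt(var)
```

<p>Rescaling <code>w_opt</code> by a constant <span class="math">\(c\)</span> scales both <span class="math">\(\mu\)</span> and <span class="math">\(\sigma\)</span> by <span class="math">\(c\)</span>, so the signal-noise ratio is unchanged and <span class="math">\(c\)</span> is free to hit a volatility target.</p>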
<h2>Flatten it!</h2>
<p>This is a textbook case for "flattening", whereby we turn a conditional
portfolio problem into an unconditional one.
Let <span class="math">\(y_{i,j} = \chi_{f_i = z_j} x_i\)</span> be the product of the indicator
for being in the <span class="math">\(j\)</span>th state, and the returns <span class="math">\(x_i\)</span>. Let <span class="math">\(w\)</span> be the
<span class="math">\(J\)</span>-vector of your portfolio weights <span class="math">\(w_j\)</span>. The return of your
strategy on the <span class="math">\(i\)</span>th period is <span class="math">\(y_{i,\cdot} w\)</span>.
Letting <span class="math">\(Y\)</span> be the matrix whose <span class="math">\(i,j\)</span>th element is <span class="math">\(y_{i,j}\)</span>,
you can perform naive Markowitz on the sample <span class="math">\(Y\)</span> to estimate <span class="math">\(w\)</span>.</p>
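<p>As a minimal sketch of the flattening trick (in Python, on hypothetical returns and states, not the data used below): the columns of <span class="math">\(Y\)</span> are never simultaneously nonzero, so the sample covariance picks up off-diagonal entries equal to <span class="math">\(-\bar{y}_j \bar{y}_k\)</span>:</p>

```python
# Flattening with hypothetical data: build the matrix Y whose (i, j) entry is
# the return x_i times the indicator of state j, then run naive Markowitz
# (Sigma^{-1} mu) on its columns.  J = 2 states here; numbers are made up.
x = [1.2, -0.8, 0.5, 2.0, -1.5, 0.7]   # returns x_i
f = [0, 1, 0, 1, 1, 0]                 # observed state index f_i
J = 2

Y = [[xi if fi == j else 0.0 for j in range(J)] for xi, fi in zip(x, f)]

n = len(x)
mu = [sum(row[j] for row in Y) / n for j in range(J)]
# Columns of Y are never simultaneously nonzero, so the off-diagonal
# covariance entries are simply -mu_j * mu_k.
Sigma = [[sum(row[j] * row[k] for row in Y) / n - mu[j] * mu[k]
          for k in range(J)] for j in range(J)]

# Naive Markowitz weights Sigma^{-1} mu via Cramer's rule (2x2 case).
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
w = [(Sigma[1][1] * mu[0] - Sigma[0][1] * mu[1]) / det,
     (Sigma[0][0] * mu[1] - Sigma[1][0] * mu[0]) / det]
```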
<p>But now you can easily perform inference: to see whether there
is any "there there", you can compute the squared sample Sharpe ratio
of the sample Markowitz portfolio, then essentially use Hotelling's
<span class="math">\(T^2\)</span> test.
More interesting, however, is whether there is any <em>additional</em>
gains to be had from market timing beyond the buy-and-hold
strategy.
This can be couched as the following portfolio optimization problem:
</p>
<div class="math">$$
\max_{w: g^{\top} \Sigma w = 0} \frac{w^{\top}\mu}{\sqrt{w^{\top}\Sigma w}},
$$</div>
<p>
where <span class="math">\(g\)</span> is some portfolio which we would like our portfolio to have
no correlation to. Here the elements of the vector <span class="math">\(\mu\)</span> are <span class="math">\(\pi_j m_j\)</span>,
and the covariance <span class="math">\(\Sigma\)</span> is <span class="math">\(\operatorname{diag}\left(d\right) - \mu\mu^{\top},\)</span>
where <span class="math">\(d_j = \pi_j \left(s_j^2 + m_j^2\right).\)</span>
To test whether market timing beats buy-and-hold, take <span class="math">\(g\)</span> to be the
vector of all ones, and then test the signal-noise ratio of the resultant
portfolio.
(<em>n.b.</em> This test is agnostic as to whether buy-and-hold long is better than buy-and-hold short!)
That test is actually a "spanning test", and can be performed
by using the delta method, as I outlined in section 4.2 of my paper
on the <a href="https://arxiv.org/abs/1312.0557">distribution of the Markowitz portfolio</a>.</p>
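<p>For intuition, the constrained optimum has a closed-form direction: take the Markowitz portfolio and remove its component along <span class="math">\(g\)</span>, i.e. <span class="math">\(w \propto \Sigma^{-1}\mu - \left(g^{\top}\mu / g^{\top}\Sigma g\right) g\)</span>, which satisfies <span class="math">\(g^{\top}\Sigma w = 0\)</span>. A dependency-free sketch in Python (hypothetical <span class="math">\(\mu\)</span> and <span class="math">\(\Sigma\)</span>; this is only the point estimate, not the delta-method inference from the paper):</p>

```python
# Hedged Markowitz sketch: w proportional to
#   Sigma^{-1} mu - (g' mu / g' Sigma g) g,
# which has zero covariance with the hedge portfolio g.

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def solve2(A, b):
    """Solve a 2x2 linear system A w = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

mu = [0.4, -0.05]                       # hypothetical mean vector
Sigma = [[2.0, -0.02], [-0.02, 1.5]]    # hypothetical covariance
g = [1.0, 1.0]                          # hedge against buy-and-hold

w_mk = solve2(Sigma, mu)                # unconstrained Markowitz direction
gSg = sum(gi * si for gi, si in zip(g, mat_vec(Sigma, g)))
gmu = sum(gi * mi for gi, mi in zip(g, mu))
w_hedged = [wi - (gmu / gSg) * gi for wi, gi in zip(w_mk, g)]
```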
<h2>Conditional Markowitz</h2>
<p>In the conditional Markowitz procedure we force the <span class="math">\(s_j^2\)</span> to be equal, while
allowing the <span class="math">\(m_j\)</span> to vary. To estimate this we construct the <span class="math">\(J\)</span>-vector <span class="math">\(f_i\)</span> of indicator
functions, then perform a linear regression of <span class="math">\(x_i\)</span> on <span class="math">\(f_i\)</span>.
Pooling the residuals of this in-sample fit, we then compute the estimate of
<span class="math">\(s_{\cdot}^2\)</span>. Note that the conditional Markowitz portfolio now has
<span class="math">\(w_j\)</span> simply proportional to (our estimate of) <span class="math">\(m_j\)</span>, since the variance
is fixed.</p>
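<p>A toy sketch of these conditional estimates (Python, made-up data): the regression on state indicators yields the per-state means, residuals are pooled into one variance estimate, and the weights come out proportional to the <span class="math">\(\hat{m}_j\)</span>:</p>

```python
# Conditional Markowitz sketch on made-up data: per-state means, pooled
# residual variance, and weights proportional to the estimated means.
x = [1.2, -0.8, 0.5, 2.0, -1.5, 0.7, 0.4, 1.1]   # returns (made up)
f = [0, 1, 0, 1, 1, 0, 1, 0]                     # state indices, J = 2
J = 2

# Regressing x on the state indicators just yields the per-state means.
m_hat = [sum(xi for xi, fi in zip(x, f) if fi == j) /
         sum(1 for fi in f if fi == j) for j in range(J)]
resid = [xi - m_hat[fi] for xi, fi in zip(x, f)]
s2_pooled = sum(r * r for r in resid) / (len(x) - J)  # pooled variance
w = [mj / s2_pooled for mj in m_hat]                  # proportional to m_j
```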
<p>To test for presence of an effect one uses an MGLH test, like the Hotelling-Lawley trace.
Now, however,
the test for market timing ability beyond buy-and-hold is not via a spanning test.
The spanning test outlined in section 4.5 of
<a href="https://arxiv.org/abs/1312.0557">the asymptotic Markowitz paper</a>
only tests against other static portfolios on the assets, but in this case
there is only a single asset, the market.
To have zero correlation to the buy-and-hold portfolio one would have
to hold zero dollars of the market.
To test ability beyond buy-and-hold, one should use a regression test
for equality of the regression betas, in this case equivalent to testing
equality of all the <span class="math">\(m_j\)</span>. That is, an 'ANOVA'.</p>
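<p>For reference, the one-way ANOVA statistic being described can be sketched as follows (Python, with made-up per-state returns):</p>

```python
# One-way ANOVA F statistic for equality of state means, on made-up
# returns grouped by state.
groups = [[1.2, 0.5, 0.7, 1.1], [-0.8, 2.0, -1.5, 0.4]]
k = len(groups)                              # number of states
n = sum(len(g) for g in groups)              # total observations
grand = sum(sum(g) for g in groups) / n      # grand mean
means = [sum(g) / len(g) for g in groups]    # per-state means

ss_between = sum(len(g) * (m - grand)**2 for g, m in zip(groups, means))
ss_within = sum((x - m)**2 for g, m in zip(groups, means) for x in g)
F = (ss_between / (k - 1)) / (ss_within / (n - k))
```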
<h2>Let's try it</h2>
<p>Here I demonstrate the idea with some toy data.
The 'market' in this case are the monthly simple returns of the Market
portfolio, taken from the Fama French data.
I have added the risk-free rate back to the market returns as they
were published by <a href="https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html">Ken French</a>,
since we may hold the Market long or short.</p>
<p>For features I compute the 6 month rolling mean return, and the 12 month volatility
of Market returns.
The mean computation is a bit odd, since these are simple returns, not
geometric returns, and so they do not telescope.
I lag both of these computations by two months, then align them with Market.
The two month lag is equivalent to lagging the feature one month minus an epsilon,
and is 'causal' in the sense that one could observe the features prior to making
a trade decision.</p>
<p>I then binarize both these variables, comparing the mean to <span class="math">\(1\%\)</span> per month to define the market as 'bear' or 'bull',
and comparing the volatility to <span class="math">\(4\%\)</span> per square root month to define the environment as 'high vol'
or 'low vol'. The product of these two give us a feature with four states.
The odd cutoffs were chosen to give approximately equal <span class="math">\(\pi_j\)</span>.
Here I load the data and compute the feature. </p>
<div class="highlight"><pre><span></span><span class="c1"># devtools::install_github('shabbychef/aqfb_data')</span>
<span class="kn">library</span><span class="p">(</span>aqfb.data<span class="p">)</span>
data<span class="p">(</span>mff4<span class="p">)</span>
<span class="kp">suppressMessages</span><span class="p">({</span>
<span class="kn">library</span><span class="p">(</span>fromo<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tidyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>magrittr<span class="p">)</span>
<span class="p">})</span>
df <span class="o"><-</span> <span class="kt">data.frame</span><span class="p">(</span>mkt<span class="o">=</span>mff4<span class="o">$</span>Mkt<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>mean06<span class="o">=</span><span class="kp">as.numeric</span><span class="p">(</span>fromo<span class="o">::</span>running_mean<span class="p">(</span>Mkt<span class="p">,</span><span class="m">6</span><span class="p">,</span>min_df<span class="o">=</span><span class="m">6L</span><span class="p">)),</span>
vol12<span class="o">=</span><span class="kp">as.numeric</span><span class="p">(</span>fromo<span class="o">::</span>running_sd<span class="p">(</span>Mkt<span class="p">,</span><span class="m">12</span><span class="p">,</span>min_df<span class="o">=</span><span class="m">12L</span><span class="p">)))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>vola<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>dplyr<span class="o">::</span>lag<span class="p">(</span>vol12<span class="p">,</span><span class="m">2</span><span class="p">)</span> <span class="o">>=</span> <span class="m">4</span><span class="p">,</span><span class="s">'hivol'</span><span class="p">,</span><span class="s">'lovol'</span><span class="p">),</span>
bear<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>dplyr<span class="o">::</span>lag<span class="p">(</span>mean06<span class="p">,</span><span class="m">2</span><span class="p">)</span> <span class="o">>=</span> <span class="m">1</span><span class="p">,</span><span class="s">'bull'</span><span class="p">,</span><span class="s">'bear'</span><span class="p">))</span> <span class="o">%>%</span>
dplyr<span class="o">::</span>filter<span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>vola<span class="p">),</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>bear<span class="p">))</span> <span class="o">%>%</span>
tidyr<span class="o">::</span>unite<span class="p">(</span>feature<span class="p">,</span>bear<span class="p">,</span>vola<span class="p">,</span>remove<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
</pre></div>
<p>Here are plots of the distribution of Market returns in each of the four states
of the feature. On the top are the high volatility states; bear and bull are denoted
by different colors. The violin plots show the distribution, while jittered points
give some indication of the location of outliers.</p>
<div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">1234</span><span class="p">)</span>
ph <span class="o"><-</span> df <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>x<span class="o">=</span>feature<span class="p">,</span>y<span class="o">=</span>Mkt<span class="p">,</span>color<span class="o">=</span>bear<span class="p">))</span> <span class="o">+</span>
geom_violin<span class="p">()</span> <span class="o">+</span> geom_jitter<span class="p">(</span>alpha<span class="o">=</span><span class="m">0.4</span><span class="p">,</span>width<span class="o">=</span><span class="m">0.3</span><span class="p">,</span>height<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept<span class="o">=</span><span class="m">0</span><span class="p">,</span>linetype<span class="o">=</span><span class="m">2</span><span class="p">,</span>alpha<span class="o">=</span><span class="m">0.5</span><span class="p">)</span> <span class="o">+</span>
coord_flip<span class="p">()</span> <span class="o">+</span>
facet_grid<span class="p">(</span>vola<span class="o">~</span><span class="m">.</span><span class="p">,</span>space<span class="o">=</span><span class="s">'free'</span><span class="p">,</span>scales<span class="o">=</span><span class="s">'free'</span><span class="p">)</span> <span class="o">+</span>
labs<span class="p">(</span>y<span class="o">=</span><span class="s">'monthly market returns (pct)'</span><span class="p">,</span>
x<span class="o">=</span><span class="s">'feature'</span><span class="p">,</span>
color<span class="o">=</span><span class="s">'bear/bull market?'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'market returns for different states'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/market_timing_violins-1.png" title="plot of chunk violins" alt="plot of chunk violins" width="1000px" height="500px" /></p>
<p>We clearly see higher volatility in the <code>hivol</code> case, but it is hard to
get a sense of how the mean differs in the four cases.
Here I tabulate the mean and standard deviation of returns
for each of the four states, and then compute the quasi Markowitz portfolio
defined as <span class="math">\(m_j/ \left(s_j^2 + m_j^2\right)\)</span>.
There is some momentum effect with higher Markowitz weights in bull
markets, and a low-vol effect due to autocorrelated heteroskedasticity.</p>
<div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>knitr<span class="p">)</span>
df <span class="o">%>%</span>
group_by<span class="p">(</span>feature<span class="p">)</span> <span class="o">%>%</span>
summarize<span class="p">(</span>muv<span class="o">=</span><span class="kp">mean</span><span class="p">(</span>Mkt<span class="p">),</span>sdv<span class="o">=</span>sd<span class="p">(</span>Mkt<span class="p">),</span>count<span class="o">=</span>n<span class="p">())</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
mutate<span class="p">(</span><span class="sb">`quasi markowitz`</span><span class="o">=</span>muv <span class="o">/</span> <span class="p">(</span>sdv<span class="o">^</span><span class="m">2</span> <span class="o">+</span> muv<span class="o">^</span><span class="m">2</span><span class="p">))</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`mean ret`</span><span class="o">=</span>muv<span class="p">,</span><span class="sb">`sd ret`</span><span class="o">=</span>sdv<span class="p">)</span> <span class="o">%>%</span>
knitr<span class="o">::</span>kable<span class="p">()</span>
</pre></div>
<table>
<thead>
<tr>
<th align="left">feature</th>
<th align="right">mean ret</th>
<th align="right">sd ret</th>
<th align="right">count</th>
<th align="right">quasi markowitz</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">bear_hivol</td>
<td align="right">0.772454</td>
<td align="right">7.90546</td>
<td align="right">273</td>
<td align="right">0.012243</td>
</tr>
<tr>
<td align="left">bear_lovol</td>
<td align="right">0.681096</td>
<td align="right">4.10370</td>
<td align="right">228</td>
<td align="right">0.039360</td>
</tr>
<tr>
<td align="left">bull_hivol</td>
<td align="right">1.114962</td>
<td align="right">5.03885</td>
<td align="right">262</td>
<td align="right">0.041864</td>
</tr>
<tr>
<td align="left">bull_lovol</td>
<td align="right">1.006189</td>
<td align="right">3.38457</td>
<td align="right">328</td>
<td align="right">0.080704</td>
</tr>
</tbody>
</table>
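<p>One can check the last column directly; for instance, a quick sketch (in Python) of the <code>bear_hivol</code> row using the tabulated mean and standard deviation:</p>

```python
# Check the quasi-Markowitz weight m / (s^2 + m^2) for the bear_hivol row,
# using the mean and standard deviation tabulated above.
m, s = 0.772454, 7.90546
w = m / (s**2 + m**2)
# w is approximately 0.012243, matching the tabulated value.
```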
<p>Now let's perform inference.
Luckily I already coded all of the tests we will need here in <code>SharpeR</code>.
For flattening, take the product of the Market returns and the dummy
0/1 variables for the feature.
I then feed them to <code>as.sropt</code>, which computes and displays:
the Sharpe ratio of the Markowitz portfolio;
the "Sharpe Ratio Information Criterion" of
<a href="https://arxiv.org/abs/1602.06186">Paulsen and Sohl</a>, which
is unbiased for the out-of-sample performance;
the 95 percent confidence bounds on the optimal Signal-Noise ratio;
the Hotelling <span class="math">\(T^2\)</span> statistic and associated <span class="math">\(p\)</span>-value.</p>
<div class="highlight"><pre><span></span><span class="kp">suppressMessages</span><span class="p">({</span>
<span class="kn">library</span><span class="p">(</span>SharpeR<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>fastDummies<span class="p">)</span>
<span class="p">})</span>
mktsr <span class="o"><-</span> as.sr<span class="p">(</span>df<span class="o">$</span>Mkt<span class="p">,</span>ope<span class="o">=</span><span class="m">12</span><span class="p">)</span>
Y <span class="o"><-</span> df <span class="o">%>%</span>
dummy_columns<span class="p">(</span>select_columns<span class="o">=</span><span class="s">'feature'</span><span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>y_bull_hivol<span class="o">=</span>Mkt <span class="o">*</span> feature_bull_hivol<span class="p">,</span>
y_bear_hivol<span class="o">=</span>Mkt <span class="o">*</span> feature_bear_hivol<span class="p">,</span>
y_bull_lovol<span class="o">=</span>Mkt <span class="o">*</span> feature_bull_lovol<span class="p">,</span>
y_bear_lovol<span class="o">=</span>Mkt <span class="o">*</span> feature_bear_lovol<span class="p">)</span> <span class="o">%>%</span>
select<span class="p">(</span>matches<span class="p">(</span><span class="s">'^y_(bull|bear)_(hi|lo)vol$'</span><span class="p">))</span>
sstar <span class="o"><-</span> as.sropt<span class="p">(</span>Y<span class="p">,</span>ope<span class="o">=</span><span class="m">12</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>sstar<span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span> SR/sqrt(yr) SRIC/sqrt(yr) 2.5 % 97.5 % T^2 value Pr(>T^2)
Sharpe 0.740 0.696 0.504 0.927 49.9 6.9e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
</pre></div>
<p>We compute the Sharpe ratio of the sample Markowitz
portfolio to be <span class="math">\(0.740471 \mbox{yr}^{-1/2}\)</span>.
Compare this to the Sharpe ratio of the long market
portfolio, which we compute to be around
<span class="math">\(0.586302 \mbox{yr}^{-1/2}\)</span>.
We now perform the spanning test.
This is via the <code>as.del_sropt</code> function,
where we feed in portfolios to hedge against.
We display the in-sample Sharpe statistic,
confidence intervals on the population quantity,
and the <span class="math">\(F\)</span> statistic and <span class="math">\(p\)</span> value.</p>
<div class="highlight"><pre><span></span>spansr <span class="o"><-</span> as.del_sropt<span class="p">(</span>Y<span class="p">,</span>G<span class="o">=</span><span class="kt">matrix</span><span class="p">(</span><span class="kp">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">4</span><span class="p">),</span>nrow<span class="o">=</span><span class="m">1</span><span class="p">),</span>ope<span class="o">=</span><span class="m">12</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>spansr<span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span> SR/sqrt(yr) 2.5 % 97.5 % F value Pr(>F)
Sharpe 0.452 0.21 0.64 6.01 0.00046 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
</pre></div>
<p>We estimate the Sharpe of the hedged portfolio to be
<span class="math">\(0.452268 \mbox{yr}^{-1/2}\)</span>.
It is worth pointing out the subadditivity of the SNR here.
If you have two uncorrelated assets, the
Signal-Noise ratio of the optimal portfolio on those
assets is the square root of the sum of squared SNRs
of the assets. Generalizing to <span class="math">\(k\)</span> independent assets,
the optimal SNR is the Euclidean length of the vector
whose elements are the SNRs of the assets.
In this case we observe
</p>
<div class="math">$$
\sqrt{0.452268^2 + 0.586302^2} \approx 0.740471,
$$</div>
<p>
which was the Sharpe of the unhedged timing portfolio.
The gains beyond buy-and-hold seem modest indeed;
one would require very patient investors to prove
out this strategy in real trading.</p>
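<p>A quick sanity check of that quadrature identity (Python, using the rounded values quoted above):</p>

```python
import math

# The hedged and buy-and-hold Sharpe ratios combine in quadrature to
# (approximately) the Sharpe of the unhedged timing portfolio.
sr_hedged = 0.452268   # hedged portfolio, per sqrt(yr)
sr_market = 0.586302   # long market portfolio, per sqrt(yr)
sr_combined = math.sqrt(sr_hedged**2 + sr_market**2)
# sr_combined is approximately 0.740471.
```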
<p><code>SharpeR</code> does not compute the portfolio weights.
So here I use <code>MarkowitzR</code> to compute and
display the weights of the unhedged Markowitz portfolio
and the Markowitz portfolio hedged against buy-and-hold.
The first should have weights proportional to the
quasi Markowitz weights shown above.</p>
<div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>MarkowitzR<span class="p">)</span>
bare <span class="o"><-</span> mp_vcov<span class="p">(</span>Y<span class="p">)</span>
kable<span class="p">(</span>bare<span class="o">$</span>W<span class="p">,</span>caption<span class="o">=</span><span class="s">'unhedged portfolio'</span><span class="p">)</span>
</pre></div>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Intercept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">y_bull_hivol</td>
<td align="right">0.043938</td>
</tr>
<tr>
<td align="left">y_bear_hivol</td>
<td align="right">0.012850</td>
</tr>
<tr>
<td align="left">y_bull_lovol</td>
<td align="right">0.084632</td>
</tr>
<tr>
<td align="left">y_bear_lovol</td>
<td align="right">0.041337</td>
</tr>
</tbody>
</table>
<div class="highlight"><pre><span></span>vsbh <span class="o"><-</span> mp_vcov<span class="p">(</span>Y<span class="p">,</span>Gmat<span class="o">=</span><span class="kt">matrix</span><span class="p">(</span><span class="kp">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">4</span><span class="p">),</span>nrow<span class="o">=</span><span class="m">1</span><span class="p">))</span>
kable<span class="p">(</span>vsbh<span class="o">$</span>W<span class="p">,</span>caption<span class="o">=</span><span class="s">'hedged portfolio'</span><span class="p">)</span>
</pre></div>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Intercept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">y_bull_hivol</td>
<td align="right">0.012287</td>
</tr>
<tr>
<td align="left">y_bear_hivol</td>
<td align="right">-0.018801</td>
</tr>
<tr>
<td align="left">y_bull_lovol</td>
<td align="right">0.052981</td>
</tr>
<tr>
<td align="left">y_bear_lovol</td>
<td align="right">0.009686</td>
</tr>
</tbody>
</table>
<p>The hedged portfolio has negative or near-zero
weights in bear markets, and generally smaller holdings
in high volatility environments, as expected.
Note that we have achieved zero correlation to buy-and-hold
without apparently having zero mean weights.
In reality our portfolio weights have zero
volatility-weighted mean.</p>
<p>All of this analysis was via the "flattening trick".
I realize I do not have good tools in place to perform
the spanning test in the conditional Markowitz
formulation.
Of course, R has tools for the ANOVA test, but
they will not report the effect size in units
like the Sharpe, so it is hard to interpret
economic significance.
However, I can easily compute the conditional Markowitz
portfolio weights, which I tabulate below.
Note that the assumption of equal volatility makes
the portfolio weights proportional to the estimated
mean returns.</p>
<div class="highlight"><pre><span></span><span class="c1"># conditional markowitz. </span>
featfit <span class="o"><-</span> mp_vcov<span class="p">(</span>X<span class="o">=</span><span class="kp">as.matrix</span><span class="p">(</span>df<span class="o">$</span>Mkt<span class="p">),</span>
feat<span class="o">=</span>df <span class="o">%>%</span>
dummy_columns<span class="p">(</span>select_columns<span class="o">=</span><span class="s">'feature'</span><span class="p">)</span> <span class="o">%>%</span>
select<span class="p">(</span>matches<span class="p">(</span><span class="s">'^feature_(bull|bear)_(hi|lo)vol$'</span><span class="p">))</span> <span class="o">%>%</span>
<span class="kp">as.matrix</span><span class="p">(),</span>
fit.intercept<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
kable<span class="p">(</span><span class="kp">t</span><span class="p">(</span>featfit<span class="o">$</span>W<span class="p">),</span>caption<span class="o">=</span><span class="s">'conditional Markowitz unhedged portfolio'</span><span class="p">)</span>
</pre></div>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">as.matrix(df$Mkt)1</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">feature_bull_lovol</td>
<td align="right">0.035191</td>
</tr>
<tr>
<td align="left">feature_bull_hivol</td>
<td align="right">0.038995</td>
</tr>
<tr>
<td align="left">feature_bear_hivol</td>
<td align="right">0.027016</td>
</tr>
<tr>
<td align="left">feature_bear_lovol</td>
<td align="right">0.023821</td>
</tr>
</tbody>
</table>
<h3>Caveats</h3>
<p>I feel it is worthwhile to point out this is a toy analysis:
the data go back to the late 1920s, which was a far different trading environment;
we ignore any trading frictions and assume you can freely short or lever the Market;
the feature is highly autocorrelated so investors are unlikely to see the long-term
benefit of this timing portfolio, <em>etc.</em>
In any case, don't take investing advice from a blogpost.</p>
<h2>Further work</h2>
<p>There is a non-parametric analogue of the flattening trick used here that
applies to the case of market timing with a single continuous feature,
which I hope to present in a future blog post.</p>
Conditional Portfolios with Feature Flattening2019-06-19T21:04:21-07:002019-06-19T21:04:21-07:00Steven E. Pavtag:www.gilgamath.com,2019-06-19:/portfolio-flattening.html<h2>Conditional Portfolios</h2>
<p>When I first started working at a quant fund I tried to read
about portfolio theory. (Beyond, you know, "<em>Hedge Funds for Dummies</em>.")
I learned about various objectives and portfolio constraints,
including the Markowitz portfolio, which felt very natural.
Markowitz solves the mean-variance optimization problem, as
well as the Sharpe maximization problem, namely
</p>
<div class="math">$$
\operatorname{argmax}_w \frac{w^{\top}\mu}{\sqrt{w^{\top} \Sigma w}}.
$$</div>
<p>
This is solved, up to scaling, by the Markowitz portfolio <span class="math">\(\Sigma^{-1}\mu\)</span>.</p>
<p>When I first read about the theory behind Markowitz, I
did not read anything about where <span class="math">\(\mu\)</span> and <span class="math">\(\Sigma\)</span> come from.
I assumed the authors I was reading were talking about the
vanilla sample estimates of the mean and covariance,
though the theory does not require this.</p>
<p>There are some problems with the Markowitz portfolio.
For us, as a small quant fund, the most pressing issue
was that holding the Markowitz portfolio based on the
historical mean and covariance was not a good look.
You don't get paid "2 and twenty" for computing some
long term averages.</p>
<!-- PELICAN_END_SUMMARY -->
<p>Rather than holding an <em>unconditional</em> portfolio,
we sought to construct a <em>conditional</em> one,
conditional on some "features".
(I now believe this topic falls under the rubric of "Tactical Asset
Allocation".)
We stumbled on two simple methods for adapting
Markowitz theory to accept conditioning information:
Conditional Markowitz, and "Flattening".</p>
<h2>Conditional Markowitz</h2>
<p>Suppose you observe some <span class="math">\(l\)</span> vector of features, <span class="math">\(f_i\)</span> prior
to the time you have to allocate into <span class="math">\(p\)</span> assets to enjoy
returns <span class="math">\(x_i\)</span>. Assume that the returns are linear in the features,
but the covariance is a long term average. That is
</p>
<div class="math">$$
E\left[x_i \left|f_i\right.\right] = B f_i,\quad\mbox{Var}\left(x_i \left|f_i\right.\right) = \Sigma.
$$</div>
<p>Note that Markowitz theory never really said how to estimate
mean …</p><h2>Conditional Portfolios</h2>
<p>When I first started working at a quant fund I tried to read
about portfolio theory. (Beyond, you know, "<em>Hedge Funds for Dummies</em>.")
I learned about various objectives and portfolio constraints,
including the Markowitz portfolio, which felt very natural.
Markowitz solves the mean-variance optimization problem, as
well as the Sharpe maximization problem, namely
</p>
<div class="math">$$
\operatorname{argmax}_w \frac{w^{\top}\mu}{\sqrt{w^{\top} \Sigma w}}.
$$</div>
<p>
This is solved, up to scaling, by the Markowitz portfolio <span class="math">\(\Sigma^{-1}\mu\)</span>.</p>
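<p>As a concrete sketch (in Python/numpy rather than the R used elsewhere on this blog, with made-up returns), the plug-in version of this portfolio is just a linear solve:</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 1000, 4
# hypothetical daily returns: n observations of p assets
X = 0.001 + 0.01 * rng.standard_normal((n, p))

mu = X.mean(axis=0)              # vanilla sample mean
Sigma = np.cov(X, rowvar=False)  # vanilla sample covariance

# the Markowitz portfolio, up to scaling: Sigma^{-1} mu
w = np.linalg.solve(Sigma, mu)

# in-sample Sharpe of this portfolio; no other w does better in-sample
sharpe = (w @ mu) / np.sqrt(w @ Sigma @ w)
```

<p>Nothing here depends on using the sample estimates; any estimates of <span class="math">\(\mu\)</span> and <span class="math">\(\Sigma\)</span> can be plugged in.</p>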
<p>When I first read about the theory behind Markowitz, I
did not read anything about where <span class="math">\(\mu\)</span> and <span class="math">\(\Sigma\)</span> come from.
I assumed the authors I was reading were talking about the
vanilla sample estimates of the mean and covariance,
though the theory does not require this.</p>
<p>There are some problems with the Markowitz portfolio.
For us, as a small quant fund, the most pressing issue
was that holding the Markowitz portfolio based on the
historical mean and covariance was not a good look.
You don't get paid "2 and twenty" for computing some
long term averages.</p>
<!-- PELICAN_END_SUMMARY -->
<p>Rather than holding an <em>unconditional</em> portfolio,
we sought to construct a <em>conditional</em> one,
conditional on some "features".
(I now believe this topic falls under the rubric of "Tactical Asset
Allocation".)
We stumbled on two simple methods for adapting
Markowitz theory to accept conditioning information:
Conditional Markowitz, and "Flattening".</p>
<h2>Conditional Markowitz</h2>
<p>Suppose you observe some <span class="math">\(l\)</span>-vector of features, <span class="math">\(f_i\)</span>, prior
to the time you have to allocate into <span class="math">\(p\)</span> assets to enjoy
returns <span class="math">\(x_i\)</span>. Assume that the returns are linear in the features,
but the covariance is a long term average. That is
</p>
<div class="math">$$
E\left[x_i \left|f_i\right.\right] = B f_i,\quad\mbox{Var}\left(x_i \left|f_i\right.\right) = \Sigma.
$$</div>
<p>Note that Markowitz theory never really said how to estimate
mean returns, and thus the conditional expectation here can be used directly
in the Markowitz portfolio definition.
Thus the conditional Markowitz portfolio, conditional
on observing <span class="math">\(f_i\)</span> is simply <span class="math">\(\Sigma^{-1} B f_i\)</span>. Another way of viewing this
is to estimate the "Markowitz coefficient", <span class="math">\(W=\Sigma^{-1} B\)</span> and just multiply
this by <span class="math">\(f_i\)</span> when it is observed.</p>
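<p>A minimal numpy sketch of this procedure on simulated data (the sizes and noise levels are hypothetical): estimate <span class="math">\(B\)</span> by multivariate least squares, <span class="math">\(\Sigma\)</span> from the residuals, then form the Markowitz coefficient <span class="math">\(W=\Sigma^{-1}B\)</span>:</p>

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, l = 2000, 4, 2
F = rng.standard_normal((n, l))               # observed features, n x l
B_true = 0.001 * rng.standard_normal((p, l))  # hypothetical true coefficients
# returns are linear in the features, plus noise with fixed covariance
X = F @ B_true.T + 0.01 * rng.standard_normal((n, p))

# estimate B by multivariate least squares of returns on features
B_hat, *_ = np.linalg.lstsq(F, X, rcond=None)  # l x p; B_hat.T estimates B
resid = X - F @ B_hat
Sigma_hat = np.cov(resid, rowvar=False)        # conditional covariance estimate

# the "Markowitz coefficient": W = Sigma^{-1} B, a p x l matrix
W = np.linalg.solve(Sigma_hat, B_hat.T)

f_new = rng.standard_normal(l)  # feature observed at allocation time
w = W @ f_new                   # conditional Markowitz portfolio
```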
<p>I have written about inference on the <a href="https://arxiv.org/abs/1312.0557">conditional Markowitz</a>
portfolio: via the MGLH tests one can test essentially whether <span class="math">\(W\)</span> is all
zeros, or test the total effect size. However, the conditional Markowitz
procedure is, like the unconditional procedure, subject to the
<a href="https://arxiv.org/abs/1409.5936">Cramer Rao portfolio bounds</a> in the
'obvious' way: increasing the number of fit coefficients faster
than the signal-noise ratio can cause degraded out-of-sample
performance.</p>
<h2>The Flattening Trick</h2>
<p>The other approach for adding conditional information is slicker.
When I first reinvented it, I called it the "flattening trick".
I assumed it was well established in the folklore of the quant community,
but I have only found one reference to it, a
<a href="https://www.researchgate.net/publication/5184999_Dynamic_Portfolio_Selection_by_Augmenting_the_Asset_Space">paper by Brandt and Santa-Clara</a>,
where they refer to it as "augmenting the asset space". </p>
<p>The idea is as follows: in the conditional Markowitz procedure
we ended with a matrix <span class="math">\(W\)</span> such that, conditional on <span class="math">\(f_i\)</span> we would
hold portfolio <span class="math">\(W f_i\)</span>. Why not just start with the assumption that
you seek a portfolio that is linear in <span class="math">\(f_i\)</span> and optimize the <span class="math">\(W\)</span>?
Note that the return you experience by holding <span class="math">\(W f_i\)</span> is exactly
</p>
<div class="math">$$
x_i^{\top} W f_i = \operatorname{trace}\left(x_i^{\top} W f_i\right) = \operatorname{trace}\left(f_i x_i^{\top} W\right) =
\operatorname{vec}^{\top}\left(x_i f_i^{\top}\right) \operatorname{vec}\left(W\right),
$$</div>
<p>
where <span class="math">\(\operatorname{vec}\)</span> is the vectorization operator that takes a matrix
to a vector columnwise. I called this "flattening," but maybe it's more like
"unravelling".</p>
<p>Now note that the optimization problem you are trying to solve is
to find the vector <span class="math">\(\operatorname{vec}\left(W\right)\)</span>, with pseudo-returns of
<span class="math">\(y_i = \operatorname{vec}\left(x_i f_i^{\top}\right)\)</span>.
You can simply construct these pseudo-returns <span class="math">\(y_i\)</span>
from your historical data and feed them into an unconditional portfolio process.
You can use unconditional Markowitz for this, or any other unconditional procedure.
Then take the results of the unconditional process and unflatten them back to <span class="math">\(W\)</span>.</p>
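<p>A numpy sketch of the whole trick on simulated data (all sizes hypothetical), building the pseudo-returns as columnwise flattenings of the outer products <span class="math">\(x_i f_i^{\top}\)</span>:</p>

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, l = 2000, 3, 2
F = rng.standard_normal((n, l))          # features
X = 0.01 * rng.standard_normal((n, p))   # hypothetical asset returns

# pseudo-returns: y_i is the columnwise flattening of the outer product
Y = np.stack([np.outer(X[i], F[i]).ravel(order="F") for i in range(n)])

# feed the pseudo-returns into unconditional Markowitz
mu_y = Y.mean(axis=0)
Sig_y = np.cov(Y, rowvar=False)
v = np.linalg.solve(Sig_y, mu_y)         # flattened portfolio, a p*l vector

# unflatten back to the p x l matrix W
W = v.reshape((p, l), order="F")

# sanity check of the flattening identity on one observation:
# holding W f_i earns exactly the pseudo-return y_i . vec(W)
i = 0
assert np.isclose(X[i] @ W @ F[i], Y[i] @ v)
```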
<p>Note that even when you use unconditional Markowitz on the flattened problem,
you will not regain the <span class="math">\(W\)</span> from conditional Markowitz. The reason is that
we are essentially allowing the covariance of returns to vary with our features
as well, which was not possible in conditional Markowitz.
In practice we often found that the flattening trick had slightly worse
out-of-sample performance than conditional Markowitz when used on the same
data, which we broadly attributed to overfitting.
In conditional Markowitz we would estimate the <span class="math">\(p \times l\)</span> matrix <span class="math">\(B\)</span> and the
<span class="math">\(p \times p\)</span> matrix <span class="math">\(\Sigma\)</span>, to arrive at <span class="math">\(p \times l\)</span> matrix <span class="math">\(W\)</span>.
In flattening plus unconditional Markowitz you estimate a <span class="math">\(pl\)</span> vector of
means, and the <span class="math">\(pl \times pl\)</span> matrix of covariance to arrive at the <span class="math">\(p \times l\)</span> matrix <span class="math">\(W\)</span>.</p>
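<p>To make the parameter counts concrete (the sizes here are hypothetical):</p>

```python
p, l = 10, 4
# conditional Markowitz: B is p x l, Sigma is symmetric p x p
cond = p * l + p * (p + 1) // 2
# flattening: a p*l mean vector plus a symmetric (p*l) x (p*l) covariance
flat = p * l + (p * l) * (p * l + 1) // 2
# the flattened problem fits many more parameters, hence more overfitting
```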
<p>To mitigate the overfitting, it is fairly easy to add sparsity to the
flattening trick. If you wish to force an element of <span class="math">\(W\)</span> to be zero,
because you think a certain feature should have no bearing on your holdings of
a certain asset, you can just elide it from the flattening pseudo returns.
Moreover, if you feel that a certain feature should only have, say, a positive
influence on your holdings of a particular asset, you can directly impose
that positivity constraint in the pseudo portfolio optimization problem.
Because you are solving directly for elements of <span class="math">\(W\)</span>, this is much easier
than in conditional Markowitz where <span class="math">\(W\)</span> is the product of two matrices.</p>
<p>Flattening is a neat trick. You should consider it the next time you're
allocating assets tactically.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>No Parity like a Risk Parity.2019-06-09T22:53:04-07:002019-06-09T22:53:04-07:00Steven E. Pavtag:www.gilgamath.com,2019-06-09:/risk-parity.html<h2>Portfolio Selection and Exchangeability</h2>
<p>Consider the problem of <em>portfolio selection</em>, where you observe
some historical data on <span class="math">\(p\)</span> assets, say <span class="math">\(n\)</span> days worth in an <span class="math">\(n\times p\)</span>
matrix, <span class="math">\(X\)</span>, and then are required to construct a (dollarwise)
portfolio <span class="math">\(w\)</span>.
You can view this task as a function <span class="math">\(w\left(X\right)\)</span>.
There are a few different kinds of <span class="math">\(w\)</span> function: Markowitz,
equal dollar, Minimum Variance, Equal Risk Contribution ('Risk Parity'),
and so on.</p>
<p>How are we to choose among these competing approaches?
Their supporters can point to theoretical underpinnings,
but these often seem a bit shaky even from a distance.
Usually evidence is provided in the form of backtests
on the historical returns of some universe of assets.
It can be hard to generalize from a single history,
and these backtests rarely offer theoretical justification
for the differential performance in methods.</p>
<!-- PELICAN_END_SUMMARY -->
<p>One way to consider these different methods of portfolio
construction is via the lens of <em>exchangeability</em>.
Roughly speaking, how does the function <span class="math">\(w\left(X\right)\)</span> react
under certain systematic changes in <span class="math">\(X\)</span> that "shouldn't" matter.
For example, suppose that the ticker changed on
one stock in your universe. Suppose you order the columns of
<span class="math">\(X\)</span> alphabetically, so now you must reorder your <span class="math">\(X\)</span>.
Assuming no new data has been observed, shouldn't
<span class="math">\(w\left(X\right)\)</span> simply reorder its output in the same way?</p>
<p>Put another way, suppose a method <span class="math">\(w\)</span> systematically
overweights the first element of the universe
(this seems more like a bug than a feature),
and you observe backtests over the 2000's on
U.S. equities where <code>AAPL</code> happened to be the
first stock in the universe. Your <span class="math">\(w\)</span> might
seem to outperform other methods for no good reason.</p>
<p>Equivariance to order is a kind of exchangeability condition.
The 'right' kind of <span class="math">\(w\)</span> is 'order …</p><h2>Portfolio Selection and Exchangeability</h2>
<p>Consider the problem of <em>portfolio selection</em>, where you observe
some historical data on <span class="math">\(p\)</span> assets, say <span class="math">\(n\)</span> days worth in an <span class="math">\(n\times p\)</span>
matrix, <span class="math">\(X\)</span>, and then are required to construct a (dollarwise)
portfolio <span class="math">\(w\)</span>.
You can view this task as a function <span class="math">\(w\left(X\right)\)</span>.
There are a few different kinds of <span class="math">\(w\)</span> function: Markowitz,
equal dollar, Minimum Variance, Equal Risk Contribution ('Risk Parity'),
and so on.</p>
<p>How are we to choose among these competing approaches?
Their supporters can point to theoretical underpinnings,
but these often seem a bit shaky even from a distance.
Usually evidence is provided in the form of backtests
on the historical returns of some universe of assets.
It can be hard to generalize from a single history,
and these backtests rarely offer theoretical justification
for the differential performance in methods.</p>
<!-- PELICAN_END_SUMMARY -->
<p>One way to consider these different methods of portfolio
construction is via the lens of <em>exchangeability</em>.
Roughly speaking, how does the function <span class="math">\(w\left(X\right)\)</span> react
under certain systematic changes in <span class="math">\(X\)</span> that "shouldn't" matter.
For example, suppose that the ticker changed on
one stock in your universe. Suppose you order the columns of
<span class="math">\(X\)</span> alphabetically, so now you must reorder your <span class="math">\(X\)</span>.
Assuming no new data has been observed, shouldn't
<span class="math">\(w\left(X\right)\)</span> simply reorder its output in the same way?</p>
<p>Put another way, suppose a method <span class="math">\(w\)</span> systematically
overweights the first element of the universe
(this seems more like a bug than a feature),
and you observe backtests over the 2000's on
U.S. equities where <code>AAPL</code> happened to be the
first stock in the universe. Your <span class="math">\(w\)</span> might
seem to outperform other methods for no good reason.</p>
<p>Equivariance to order is a kind of exchangeability condition.
The 'right' kind of <span class="math">\(w\)</span> is 'order exchangeable'.
Other examples come from considering rotations or basketization.
Suppose that today your universe consists of stocks A and B,
which you can hold long or short,
but tomorrow you can only buy basket C which is equal dollars long in A and B,
and basket D which is equal dollars long A and short B.
Tomorrow you can achieve the same holdings that you wanted today,
but by holding the baskets.
Your portfolio function should be exchangeable with respect to
this transformation, suggesting you hold the same equivalent position.</p>
<p>In math, let <span class="math">\(Q\)</span> be an invertible <span class="math">\(p\times p\)</span> matrix. We will consider
what should happen if returns are transformed by <span class="math">\(Q^{\top}\)</span>. Exchangeability
holds when
</p>
<div class="math">$$
w\left(X Q\right) = Q^{-1}w\left(X\right).
$$</div>
<p>
If this holds for all invertible <span class="math">\(Q\)</span> then the <span class="math">\(w\)</span> satisfies the
exchangeability condition.
Some <span class="math">\(w\)</span> might maintain the above relationship for some kinds
of <span class="math">\(Q\)</span>, leading to weaker forms of exchangeability.
Here we name them with the class of <span class="math">\(Q\)</span>:</p>
<ul>
<li>A <span class="math">\(w\)</span> satisfies the 'order exchangeability' property if it is exchangeable
for all permutation matrices <span class="math">\(Q\)</span>;</li>
<li>'leverage exchangeability' if it is exchangeable
for all diagonal <span class="math">\(Q\)</span>;</li>
<li>'rotational exchangeability' if it is exchangeable
for all orthogonal <span class="math">\(Q\)</span>.</li>
</ul>
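<p>These conditions are easy to check numerically. A numpy sketch (on hypothetical data) confirming that sample Markowitz is exchangeable even for a generic invertible <span class="math">\(Q\)</span>, while equal dollar is not:</p>

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 500, 4
X = 0.01 * rng.standard_normal((n, p))   # hypothetical returns history

def markowitz(X):
    # sample Markowitz rule: w(X) = Sigma^{-1} mu from sample estimates
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    return np.linalg.solve(Sigma, mu)

Q = rng.standard_normal((p, p))  # a generic (almost surely invertible) Q

# exchangeability: w(X Q) = Q^{-1} w(X)
assert np.allclose(markowitz(X @ Q), np.linalg.solve(Q, markowitz(X)))

# equal dollar fails: it returns 1/p regardless, which is not Q^{-1} (1/p)
ed = np.ones(p) / p
assert not np.allclose(ed, np.linalg.solve(Q, ed))
```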
<p>Leverage exchangeability is illustrated by considering what would happen
if each asset was replaced by, say, a 2x or 3x levered version of the same asset.</p>
<p>One consequence of exchangeability is that "only returns matter".
That is if we exchange returns <span class="math">\(x\mapsto Q^{\top}x\)</span>, and portfolio
<span class="math">\(w\mapsto Q^{-1}w\)</span>, then the returns achieved by that portfolio map
to <span class="math">\(x^{\top}w \mapsto x^{\top}Q Q^{-1}w = x^{\top}w\)</span>.
The returns you achieve are the same under the transformation.
This dependence of <span class="math">\(w\)</span> on returns was a key assumption in my
<a href="https://arxiv.org/abs/1409.5936">work on portfolio quality bounds</a>.</p>
<p>It should be recognized that "only returns matter" is questionable
in practical portfolio construction, since the real world often
imposes constraints (long only, max concentration), exhibits
different costs and frictions for different assets, and
contains other oddities like tax implications, and so on.</p>
<p>Constraints in particular complicate the general definition of
exchangeability because in the transformation by <span class="math">\(Q\)</span> the
original constraints should also be translated.
In some cases, say where the constraint is an upper bound on risk,
the constraint definition is identical under the transformation.
However, the image of a long-only constraint under a general
linear transformation by <span class="math">\(Q\)</span> will not in general still be a
long-only constraint.
In a long-only world, we can perhaps only expect
order- or (positive) leverage exchangeability,
and not the general form.</p>
<p>Setting aside the issues with constraints, it is still useful,
I think, to consider the <em>objectives</em>
of portfolio construction techniques with respect to
exchangeability, inasmuch as they have them.</p>
<p>For example, the "one over N" (or "equal dollar", "Talmudic", <em>etc.</em>) rule
clearly does not satisfy general exchangeability, nor leverage exchangeability.
The Equal Risk Contribution portfolio, which we will describe below,
also fails exchangeability.
The Markowitz Portfolio, however, does satisfy exchangeability:
</p>
<div class="math">$$
\Sigma^{-1}\mu \mapsto Q^{-1}\Sigma^{-1}Q^{-\top}Q^{\top}\mu = Q^{-1}\Sigma^{-1}\mu,
$$</div>
<p>
as needed.</p>
<p>In fact, "equal dollar" seems not so much an objective as a constraint
of the portfolio allocation. There is no objective beyond perhaps
"make it seem like we are doing something with client money."
The same complaint will apply to ERC.
In fact, you <em>can</em> express Markowitz, Mean Variance, ERC
(and I believe equal dollar) as similar
<a href="https://www.grahamcapital.com/Equal%20Risk%20Contribution%20April%202019.pdf">optimization problems with risk constraints</a>.
However, the objectives do look a lot like make-work.</p>
<h2>Equal Risk Contribution</h2>
<p>The set-up for Equal Risk Contribution portfolio, or Risk Parity, is as follows:
define the risk of a portfolio <span class="math">\(w\)</span> as the standard deviation of returns,
<span class="math">\(r = \sqrt{w^{\top}\Sigma w}\)</span>. This function is homogeneous of degree 1, meaning
that if you positively rescale your whole portfolio by <span class="math">\(k\)</span>, the risk scales
by <span class="math">\(k\)</span>. That is if you map <span class="math">\(w \mapsto k w\)</span> then
<span class="math">\(\sqrt{w^{\top}\Sigma w} \mapsto k \sqrt{w^{\top}\Sigma w}\)</span> for positive <span class="math">\(k\)</span>.</p>
<p>Using Euler's Homogeneous function theorem, we can express the risk as
</p>
<div class="math">$$r = w^{\top} \nabla_{w}r = w^{\top} \frac{\Sigma w}{\sqrt{w^{\top}\Sigma w}}.$$</div>
<p>
The theory behind Risk Parity then says because of this equation,
the vector <span class="math">\(w \odot \frac{\Sigma w}{\sqrt{w^{\top}\Sigma w}}\)</span> is the
"risk in each asset," where <span class="math">\(\odot\)</span> is the Hadamard (elementwise) multiplication.
This is very tempting because the sum of the elements of this
vector is exactly <span class="math">\(r\)</span> by Euler's Theorem.
The Equal Risk Portfolio is the one such that each element of
<span class="math">\(w \odot \frac{\Sigma w}{\sqrt{w^{\top}\Sigma w}}\)</span> is the same.
It has "equal risk in each asset".</p>
<p>However, I can see no principled reason to view this vector
as the risk in each asset.
By definition it happens to be the marginal contribution
to risk from each asset due to a proportional change in
holdings. That is, it is equal to <span class="math">\(\nabla_{\log(w)}r\)</span>,
and expresses how risk would change under a small
proportional change in weight in your portfolio.
However, it is clearly not the risk in each asset because
it can contain negative elements!
If you hold an asset that diversifies (<em>i.e.</em> has
negative correlation with) existing holdings, then
increasing your allocation can decrease risk.
The fact that the elements of this vector sum to
the total risk is also not convincing:
one could just as easily say that each asset has
<span class="math">\(r / p\)</span> risk in it, and capture the same property.</p>
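<p>Both complaints are easy to see in a small numpy example (the covariance and weights are made up): the contributions sum to <span class="math">\(r\)</span> by Euler's theorem, yet the diversifying asset gets a negative "risk":</p>

```python
import numpy as np

# hypothetical covariance; asset 3 is negatively correlated with the others
Sigma = np.array([[ 0.040,  0.020, -0.015],
                  [ 0.020,  0.040, -0.010],
                  [-0.015, -0.010,  0.040]])
w = np.array([0.4, 0.4, 0.2])       # a long-only portfolio

r = np.sqrt(w @ Sigma @ w)          # total risk
contrib = w * (Sigma @ w) / r       # the "risk contribution" vector

assert np.isclose(contrib.sum(), r)  # elements sum to total risk
assert contrib.min() < 0             # but the diversifier's "risk" is negative
```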
<p>As mentioned above, the risk contribution vector does not
satisfy an exchangeability condition.
Taking <span class="math">\(x\mapsto Q^{\top}x\)</span> and assuming exchangeability,
<span class="math">\(w\mapsto Q^{-1}w\)</span>, then <span class="math">\(r \mapsto r\)</span> and
</p>
<div class="math">$$
w \odot \frac{\Sigma w}{r} \mapsto
Q^{-1} w \odot \frac{Q^{\top}\Sigma w}{r}.
$$</div>
<p>
That is, if <span class="math">\(w\)</span> was the ERC portfolio, then <span class="math">\(Q^{-1}w\)</span> is not the
ERC in transformed space.</p>
<p>You can confirm this in code, which I have
lifted from the <code>riskParityPortfolio</code>
<a href="https://cran.r-project.org/web/packages/riskParityPortfolio/vignettes/RiskParityPortfolio.html">vignette</a>.
The ERC is not exchangeable for general <span class="math">\(Q\)</span> or orthogonal <span class="math">\(Q\)</span>,
but is for diagonal <span class="math">\(Q\)</span>. We check them here:</p>
<div class="highlight"><pre><span></span><span class="kp">suppressMessages</span><span class="p">({</span>
<span class="kn">library</span><span class="p">(</span>riskParityPortfolio<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>mvtnorm<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tibble<span class="p">)</span>
<span class="p">})</span>
risk <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>w<span class="p">,</span>Sigma<span class="p">)</span> <span class="p">{</span> <span class="kp">sqrt</span><span class="p">(</span><span class="kp">as.numeric</span><span class="p">(</span>w <span class="o">%*%</span> Sigma <span class="o">%*%</span> w<span class="p">))</span> <span class="p">}</span>
riskcon <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>w<span class="p">,</span>Sigma<span class="p">)</span> <span class="p">{</span>
Sw <span class="o"><-</span> Sigma <span class="o">%*%</span> w
<span class="kp">as.numeric</span><span class="p">(</span>w <span class="o">*</span> <span class="p">(</span>Sw<span class="p">)</span> <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span><span class="kp">as.numeric</span><span class="p">(</span>w <span class="o">%*%</span> Sw<span class="p">)))</span>
<span class="p">}</span>
<span class="c1"># from the excellent vignette:</span>
<span class="c1"># generate synthetic data</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">42</span><span class="p">)</span>
N <span class="o"><-</span> <span class="m">5</span>
V <span class="o"><-</span> <span class="kt">matrix</span><span class="p">(</span>rnorm<span class="p">(</span>N<span class="o">*</span><span class="p">(</span>N<span class="m">+50</span><span class="p">)),</span> ncol <span class="o">=</span> N<span class="p">)</span>
Sigma <span class="o"><-</span> cov<span class="p">(</span>V<span class="p">)</span>
portfolio <span class="o"><-</span> riskParityPortfolio<span class="p">(</span>Sigma<span class="o">=</span>Sigma<span class="p">)</span>
<span class="c1"># print('check general exchangeability\n')</span>
Q <span class="o"><-</span> rWishart<span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">50</span><span class="p">,</span>Sigma<span class="o">=</span><span class="kp">diag</span><span class="p">(</span>N<span class="p">))</span>
<span class="kp">dim</span><span class="p">(</span>Q<span class="p">)</span> <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span>N<span class="p">,</span>N<span class="p">)</span>
knitr<span class="o">::</span>kable<span class="p">(</span>tibble<span class="p">(</span>start<span class="o">=</span>riskcon<span class="p">(</span>portfolio<span class="o">$</span>w<span class="p">,</span>Sigma<span class="p">),</span>
Q_trans<span class="o">=</span>riskcon<span class="p">(</span><span class="kp">solve</span><span class="p">(</span>Q<span class="p">,</span>portfolio<span class="o">$</span>w<span class="p">),</span><span class="kp">t</span><span class="p">(</span>Q<span class="p">)</span> <span class="o">%*%</span> Sigma <span class="o">%*%</span> Q<span class="p">)))</span>
</pre></div>
<table>
<thead>
<tr>
<th align="right">start</th>
<th align="right">Q_trans</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">0.076136</td>
<td align="right">0.076276</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.078032</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.070880</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.077396</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.078098</td>
</tr>
</tbody>
</table>
<div class="highlight"><pre><span></span><span class="c1"># print('check orthogonal exchangeability\n')</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">123</span><span class="p">)</span>
B <span class="o"><-</span> rWishart<span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">50</span><span class="p">,</span>Sigma<span class="o">=</span><span class="kp">diag</span><span class="p">(</span>N<span class="p">))</span>
<span class="kp">dim</span><span class="p">(</span>B<span class="p">)</span> <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span>N<span class="p">,</span>N<span class="p">)</span>
Q <span class="o"><-</span> <span class="kp">eigen</span><span class="p">(</span>B<span class="p">)</span><span class="o">$</span>vectors
knitr<span class="o">::</span>kable<span class="p">(</span>tibble<span class="p">(</span>start<span class="o">=</span>riskcon<span class="p">(</span>portfolio<span class="o">$</span>w<span class="p">,</span>Sigma<span class="p">),</span>
Q_trans<span class="o">=</span>riskcon<span class="p">(</span><span class="kp">solve</span><span class="p">(</span>Q<span class="p">,</span>portfolio<span class="o">$</span>w<span class="p">),</span><span class="kp">t</span><span class="p">(</span>Q<span class="p">)</span> <span class="o">%*%</span> Sigma <span class="o">%*%</span> Q<span class="p">)))</span>
</pre></div>
<table>
<thead>
<tr>
<th align="right">start</th>
<th align="right">Q_trans</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">0.076136</td>
<td align="right">0.046762</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.021432</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.226237</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.082872</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.003379</td>
</tr>
</tbody>
</table>
<div class="highlight"><pre><span></span><span class="c1"># print('check leverage exchangeability\n')</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">17</span><span class="p">)</span>
Q <span class="o"><-</span> <span class="kp">diag</span><span class="p">(</span>runif<span class="p">(</span>N<span class="p">,</span>min<span class="o">=</span><span class="m">0.5</span><span class="p">,</span>max<span class="o">=</span><span class="m">2.0</span><span class="p">))</span>
knitr<span class="o">::</span>kable<span class="p">(</span>tibble<span class="p">(</span>start<span class="o">=</span>riskcon<span class="p">(</span>portfolio<span class="o">$</span>w<span class="p">,</span>Sigma<span class="p">),</span>
Q_trans<span class="o">=</span>riskcon<span class="p">(</span><span class="kp">solve</span><span class="p">(</span>Q<span class="p">,</span>portfolio<span class="o">$</span>w<span class="p">),</span><span class="kp">t</span><span class="p">(</span>Q<span class="p">)</span> <span class="o">%*%</span> Sigma <span class="o">%*%</span> Q<span class="p">)))</span>
</pre></div>
<table>
<thead>
<tr>
<th align="right">start</th>
<th align="right">Q_trans</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">0.076136</td>
<td align="right">0.076136</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.076136</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.076136</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.076136</td>
</tr>
<tr>
<td align="right">0.076136</td>
<td align="right">0.076136</td>
</tr>
</tbody>
</table>
<h2>The Symmetric Square Root</h2>
<p>One of the reasons I wanted to write this post was to draw attention
to the symmetric square root, which we typically do not
use for portfolio construction but which is useful for
risk decomposition. We can express the risk of a portfolio as
</p>
<div class="math">$$
r = \| \Sigma^{1/2} w \|_2,
$$</div>
<p>
where <span class="math">\(\Sigma^{1/2}\)</span> is any matrix square root of <span class="math">\(\Sigma\)</span>.
Then the elements of
<span class="math">\(\Sigma^{1/2} w\)</span> would seem to decompose the risk of your
portfolio, in a squared error sense.
That is, the elements of <span class="math">\(\Sigma^{1/2} w\)</span>, <em>when squared</em>,
sum to the risk squared.
That vector may contain negative elements, but this
does not affect the square sum.
We can just square the elements of
<span class="math">\(\Sigma^{1/2} w\)</span>,
and claim we have "decomposed risk".
Whether this is a useful decomposition, or has
any real meaning, is debatable.
We can check if this is an exchangeable function.</p>
<p>If you use the Cholesky square root, this
risk decomposition does not satisfy
order exchangeability! This clearly seems like
a bad way to express risk.
If, however, you use the symmetric square
root, then the decomposition is
exchangeable with respect to reordering,
relevering, and even to rotation, but
perhaps not to general transformation by
<span class="math">\(Q\)</span>. Under an orthogonal <span class="math">\(Q\)</span> we have
<span class="math">\(\Sigma^{1/2} \mapsto Q^{\top}\Sigma^{1/2}Q\)</span> and
so if
<span class="math">\(w\mapsto Q^{-1}w\)</span>, then
<span class="math">\(\Sigma^{1/2} w \mapsto Q^{\top}\Sigma^{1/2}w\)</span>.</p>
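<p>A numpy sketch (with a hypothetical covariance) of this decomposition: compute the symmetric square root by eigendecomposition, check that the squared elements sum to the squared risk, and check that reordering the assets just reorders the decomposition:</p>

```python
import numpy as np

def sym_sqrt(Sigma):
    # the symmetric positive-definite square root, via eigendecomposition
    vals, vecs = np.linalg.eigh(Sigma)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 4))
Sigma = A.T @ A / 50                 # hypothetical covariance matrix
w = rng.standard_normal(4)

d = sym_sqrt(Sigma) @ w
# squared elements sum to the squared risk
assert np.isclose((d ** 2).sum(), w @ Sigma @ w)

# order exchangeability: a permutation Q just permutes the decomposition
P = np.eye(4)[[2, 0, 3, 1]]          # a permutation matrix
d_perm = sym_sqrt(P.T @ Sigma @ P) @ np.linalg.solve(P, w)
assert np.allclose(d_perm, P.T @ d)
```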
<p>Again it is not clear this is a meaningful
decomposition of risk.
Whether it is or not, I am not aware of this
definition being used to construct an ERC
portfolio, though I suspect it is only a matter
of time.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>fromo 0.2.0 (2019-01-13, Steven E. Pav, tag:www.gilgamath.com,2019-01-13:/fromo-two.html)<p>I recently pushed version 0.2.0 of my <code>fromo</code> package to
<a href="https://cran.r-project.org/package=fromo">CRAN</a>.
This package implements (relatively) fast, numerically robust
computation of moments via <code>Rcpp</code>.
<!-- PELICAN_END_SUMMARY --></p>
<p>The big changes in this release are:</p>
<ul>
<li>Support for weighted moment estimation.</li>
<li>Computation of running moments over windows defined
by time (or some other increasing index), rather
than vector index.</li>
<li>Some modest improvements in speed for the 'dangerous'
use cases (no checking for <code>NA</code>, no weights, <em>etc.</em>)</li>
</ul>
<p>The time-based running moments are supported via the <code>t_running_*</code> operations,
and we support means, standard deviation, skew, kurtosis, centered and
standardized moments and cumulants, z-score, Sharpe, and t-stat. The
idea is that your observations are associated with some increasing
index, which you can think of as the observation time, and you wish
to compute moments over a fixed time window. To bloat the API, the
times from which you 'look back' can optionally be something other
than the time indices of the input, so the input and output size
can be different.</p>
<p>Some example uses might be:</p>
<ul>
<li>Compute the volatility of an asset's returns over the previous 6 months,
on every trade day.</li>
<li>Compute the total monthly sales of a company at month ends.</li>
</ul>
<p>Because the API also allows you to use weights as implicit time deltas, you can
also do weird and unadvisable things like compute the Sharpe of an asset
over the last 1 million shares traded.</p>
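<p>To make the time-windowed semantics concrete, here is a minimal sketch (in Python for brevity; this illustrates the idea only, and is not fromo's API, which does the computation in a single pass in C++):</p>

```python
import numpy as np

def t_running_mean(x, times, window, lb_times=None):
    """Mean of observations with time in the half-open window (t - window, t].

    lb_times lets you 'look back' from times other than the input's own
    timestamps, so the input and output sizes can differ.
    """
    x = np.asarray(x, dtype=float)
    times = np.asarray(times, dtype=float)
    lb = times if lb_times is None else np.asarray(lb_times, dtype=float)
    out = np.full(len(lb), np.nan)
    for i, t in enumerate(lb):
        mask = (times > t - window) & (times <= t)
        if mask.any():
            out[i] = x[mask].mean()
    return out

vals = t_running_mean([1, 2, 3, 4], times=[0, 1, 2, 10], window=2.5)
# -> [1.0, 1.5, 2.0, 4.0]: the last observation sits alone in its window
```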
<p>Speed improvements come from my random walk through C++ design idioms.
I also implemented a 'swap' procedure for the running standard deviation
which incorporates a Welford's method addition and removal into a single
step. I do not believe that Welford's method is the fastest algorithm
for a summarizing moment computation: probably a two pass solution to
compute the mean first, then the centered moments is faster. However,
for the case of <em>running</em> moments computations, Welford's method
probably is the fastest. </p>
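<p>The 'swap' idea can be sketched in a few lines (Python for illustration; this is the textbook Welford update and downdate, not fromo's C++ implementation):</p>

```python
import numpy as np

def welford_running_sd(x, window):
    """Running sd over a trailing window via Welford-style updates.

    Each step adds the incoming observation and removes the one that fell
    out of the window, updating the mean and the centered sum of squares
    (m2) in O(1) per observation.
    """
    n, mean, m2 = 0, 0.0, 0.0
    out = np.full(len(x), np.nan)
    for i, xi in enumerate(x):
        # add the new observation
        n += 1
        d = xi - mean
        mean += d / n
        m2 += d * (xi - mean)
        # remove the observation leaving the window
        if i >= window:
            xo = x[i - window]
            d = xo - mean
            mean -= d / (n - 1)
            m2 -= d * (xo - mean)
            n -= 1
        if n >= 2:
            out[i] = np.sqrt(m2 / (n - 1))
    return out

sds = welford_running_sd(np.arange(10.0), window=3)
# every full window of consecutive integers has sample sd 1.0
```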
<p>Here is an example of the speedups seen in the 0.2.0 release. Again,
I am cherry picking the running standard deviation computation, but I
believe most methods have seen at least some modest improvements in speed.
I compute the base <code>sd</code> of the data just as a baseline to compare times
in two different versions. Here are timings under the 0.2.0 code:</p>
<div class="highlight"><pre><span></span><span class="kp">options</span><span class="p">(</span>width<span class="o">=</span><span class="m">180</span><span class="p">)</span>
<span class="kp">options</span><span class="p">(</span>digits<span class="o">=</span><span class="m">2</span><span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>fromo<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>microbenchmark<span class="p">)</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">1234</span><span class="p">)</span>
x <span class="o"><-</span> rnorm<span class="p">(</span><span class="m">1e5</span><span class="p">)</span>
fromo_running_sd <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> <span class="p">{</span>
running_sd3<span class="p">(</span>x<span class="p">,</span>window<span class="o">=</span><span class="m">1000</span><span class="p">,</span>na_rm<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span>restart_period<span class="o">=</span><span class="m">10000L</span><span class="p">)</span>
<span class="p">}</span>
<span class="kp">gc</span><span class="p">()</span>
</pre></div>
<div class="highlight"><pre><span></span> used (Mb) gc trigger (Mb) max used (Mb)
Ncells 4.2e+06 225 7.0e+06 372 8.2e+06 439
Vcells 5.6e+07 430 1.8e+08 1358 1.8e+08 1358
</pre></div>
<div class="highlight"><pre><span></span><span class="c1"># compute sd(x) as a reference </span>
microbenchmark<span class="p">(</span>sd<span class="p">(</span>x<span class="p">),</span>
fromo_running_sd<span class="p">(</span>x<span class="p">),</span>
times<span class="o">=</span><span class="m">1000L</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="n">Unit</span><span class="o">:</span> <span class="n">microseconds</span>
<span class="n">expr</span> <span class="n">min</span> <span class="n">lq</span> <span class="n">mean</span> <span class="n">median</span> <span class="n">uq</span> <span class="n">max</span> <span class="n">neval</span> <span class="n">cld</span>
<span class="n">sd</span><span class="o">(</span><span class="n">x</span><span class="o">)</span> <span class="mi">303</span> <span class="mi">313</span> <span class="mi">338</span> <span class="mi">326</span> <span class="mi">350</span> <span class="mi">545</span> <span class="mi">1000</span> <span class="n">a</span>
<span class="n">fromo_running_sd</span><span class="o">(</span><span class="n">x</span><span class="o">)</span> <span class="mi">1427</span> <span class="mi">1478</span> <span class="mi">1813</span> <span class="mi">1618</span> <span class="mi">1955</span> <span class="mi">44866</span> <span class="mi">1000</span> <span class="n">b</span>
</pre></div>
<p>Under the old 0.1.3 code:</p>
<div class="highlight"><pre><span></span>Unit<span class="o">:</span> microseconds
expr min lq mean median uq max neval cld
sd<span class="p">(</span>x<span class="p">)</span> <span class="m">457</span> <span class="m">473</span> <span class="m">498</span> <span class="m">488</span> <span class="m">508</span> <span class="m">767</span> <span class="m">1000</span> a
fromo_running_sd<span class="p">(</span>x<span class="p">)</span> <span class="m">4183</span> <span class="m">4523</span> <span class="m">5168</span> <span class="m">4730</span> <span class="m">5623</span> <span class="m">66529</span> <span class="m">1000</span> b
</pre></div>
<p>I would call this a win, except you might interpret it as meaning the old code
was crap. Also, I worry about the maximum time taken, which suggests some
kind of lurking boo-boo.</p>Twelve Dimensional Chess is Stupid (2018-10-16, Steven, tag:www.gilgamath.com,2018-10-16:/twelve_dimensional_chess.html)<p>Chess and the Curse of Dimensionality</p><p>I cringe when I hear the term "Twelve Dimensional Chess" used as a metaphor.
Certainly Twelve Dimensional Chess would be hard to visualize, and would present
far more possible moves than regular two dimensional chess. However, high dimensional
Chess suffers from a Curse of Dimensionality as the number of squares
grows so quickly that play becomes uninteresting. In fact, I suspect that strategies
exist which effectively guarantee a draw in sufficiently high dimensions.</p>
<!-- PELICAN_END_SUMMARY -->
<p>Consider a Queen attacking a King in our stupid old Two Dimensional Chess. The
Queen can cover or attack seven of the nine squares available to the King, as
shown below:</p>
<p><img src="https://www.gilgamath.com/figure/twelve_dimensional_chess_showqueen-1.png" title="plot of chunk showqueen" alt="plot of chunk showqueen" width="500px" height="300px" /></p>
<p>If the two remaining squares are occupied, or do not exist because the King is
against the boundary, the Queen can give checkmate.</p>
<p>However, in higher dimensional Chess, the Queen attacks a far smaller proportion of the squares adjacent to a King.
In dimension <span class="math">\(d\)</span> the number attacked or covered is <span class="math">\(2^{d+1} - 1\)</span>.
For <span class="math">\(d=12\)</span>, the number of squares attacked is <span class="math">\(8191\)</span>.
However, the King has <span class="math">\(3^{12}=531441\)</span> squares in his neighborhood when not against the boundary.
The queen covers only about <span class="math">\(1.54\%\)</span> of these squares, so you would need <span class="math">\(65\)</span> Queens to give checkmate.
A non-losing strategy seems to be: </p>
<blockquote>
<p>Move your King away from the boundary on your first move, and keep away
from the boundary.
Your opponent cannot promote enough pawns to give checkmate.</p>
</blockquote>
<p>Actually, I do not know the rules of high dimensional Chess, and I have assumed the players start with eight pieces and eight pawns.
Maybe the number of pawns is linear, or even exponential in the dimension.
Even so, it will take over <span class="math">\(300\)</span> moves to promote the <span class="math">\(64\)</span> pawns to Queens.
Moreover, assuming one's opponent could muster such an army without any losses,
assembling such a large number of Queens in place to achieve checkmate might be tricky.
Each Queen attacks at most <span class="math">\(1.86\)</span> million squares.
Again, this would be hard to visualize during play, but there are
<span class="math">\(2.2\)</span> <em>billion</em> internal squares (<em>i.e.</em> those not touching a boundary),
and some <span class="math">\(68.7\)</span> billion in total.
Which means that even if your opponent has dozens of Queens on the board,
each Queen can attack only a small fraction of the available squares.
You could move your King largely at random without coming under attack.</p>
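<p>These counts are easy to verify. A quick sketch (the 1.86 million figure is reproduced here as the upper bound of 7 attacked squares on each of the <span class="math">\(\left(3^{12}-1\right)/2\)</span> lines through the Queen's square, which is my reading of how that number arises):</p>

```python
# check the counting claims for d-dimensional chess on a side-8 board
d = 12
covered = 2 ** (d + 1) - 1            # squares the Queen covers near the King
neighborhood = 3 ** d                 # squares in the King's neighborhood
queens = -(-neighborhood // covered)  # ceiling division: Queens for checkmate
# upper bound: 7 squares on each of the (3^d - 1)/2 lines through a square
max_attacked = 7 * (3 ** d - 1) // 2
internal = 6 ** d                     # squares not touching the boundary
total = 8 ** d

print(covered, neighborhood, queens)  # 8191 531441 65
print(max_attacked, internal, total)  # 1860040 2176782336 68719476736
```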
<p>So "Twelve Dimensional Chess" as a metaphor for a situation requiring great foresight or strategy in the face of many possible decisions is flawed.
Instead, it is a more apt metaphor for a very lonely random walk punctuated by infrequent interactions with others you can easily dodge.</p>
<p>See also <a href="https://www.johndcook.com/blog/2018/07/19/3d-chess-knight-moves/">John D. Cook on 3D Knight Moves</a>.</p>
R in Finance 2018 (2018-06-01, Steven, tag:www.gilgamath.com,2018-06-01:/rfin2018.html)<p>Review of R in Finance 2018 conference</p><!-- image: rfin2018_cover.png image_alt: Review of R in Finance 2018. -->
<p>2018 marks the tenth year of R in Finance. Once again, here is my biased and incomplete take on
the proceedings. </p>
<!-- PELICAN_END_SUMMARY -->
<h3>Day One, Morning Lightning Round</h3>
<ol>
<li>
<p>Yu Li started the conference with a lightning talk on whether click and visit data on
Morningstar's website was predictive of fund flows, using ordinary linear regression,
when the usual explanatory variables (beta, momentum, <em>etc.</em>) are taken into account. The
early results looked inconclusive. (<em>n.b.</em>: if visits and clicks are dependent on market
movement, there will be complicated interactions with momentum that would have to be
controlled.) Another takeaway is that your actions on the internet are a potential
gold mine (well, data mine) for <em>someone</em>.</p>
</li>
<li>
<p>Daniel McKellar used a graph theory view of companies to compare
geographical effects to sector and industry effects. The target metric is something called
'modularity'. (My fear is that this metric is defined in a trivially gameable way, but this
could be due to my general paranoia.) Turns out that country clustering gives higher modularity
than sector or industry clustering (yay), but then clustering by country <em>and</em> sector gives
lower modularity than just by country (uh, uhoh). There followed a linear regression model
of the correlation matrix entries to check for clustering effects. While people are trained
to digest linear regression models (and perhaps this is the way to go with upper management),
I hope there are more advanced techniques for covariance cluster analysis.</p>
</li>
<li>
<p>Jonathan Regenstein presented a shiny page that performs Fama French decomposition on a
portfolio that the user enters. He bemoaned the weird choice of distribution channel for
Ken French's data (zipped CSV with header junk). I have packaged a few of the
datasets into a <a href="https://github.com/shabbychef/aqfb_data">data package for my book</a>.
But I have been thinking there should be a canonical data package that contains all the FF data
(it only updates once a year, and could be made programmatic).</p>
</li>
</ol>
<h2>Day One, Morning Talks</h2>
<p>Kris Boudt gave a talk on a new kind of regularized covariance estimator. The idea is
to combine the row subselection of MCD with a kind of shrinkage to deal with the 'fat data'
problem (more columns than rows). The motto appears to be 'shrink when needed.'
He presented the 'C-step' theorem which underpins their
algorithm: suppose you subselect some rows of data, compute sample mean and
covariance, then define <a href="https://en.wikipedia.org/wiki/Mahalanobis_distance">Mahalanobis distances</a>
on that sample mean and covariance. Then if you pick another subsample of the data
that has smaller sum of Mahalanobis distances than your original sample, then
than new subset will have smaller covariance determinant. The implication then is to
just compute a Mahalanobis function, then take the subset of the smallest Mahalanobis
distances and iterate. This shows that the algorithm converges, and determinant
decreases at each step. (I confirmed with Kris that the objective is not convex,
so this method falls into local minima; to find a global minimum, you have to employ
some tricks.)
He followed up with
some examples showing how the algo works; toy data experiments confirm it helps
to have clairvoyance on the population outlier rate. His example computing
the minimum variance portfolio triggered me, as I am not convinced covariance shrinkage
should be used for portfolio construction. </p>
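<p>The C-step iteration, as described, is simple enough to sketch. The following Python toy (my own invented data and subset size, not Kris's code, and omitting the multiple random starts a real MCD implementation uses to dodge local minima) shows the determinant-decreasing loop:</p>

```python
import numpy as np

rng = np.random.default_rng(42)
# toy data: 200 inliers and 20 gross outliers in 3 dimensions
x = np.vstack([rng.normal(0.0, 1.0, size=(200, 3)),
               rng.normal(6.0, 1.0, size=(20, 3))])
h = 150                        # rows to keep in each subsample

idx = rng.choice(len(x), size=h, replace=False)   # random starting subset
last_det = np.inf
for _ in range(100):
    sub = x[idx]
    mu = sub.mean(axis=0)
    cov = np.cov(sub, rowvar=False)
    det = np.linalg.det(cov)
    if det >= last_det - 1e-12:
        break                  # determinant stopped decreasing: converged
    last_det = det
    # Mahalanobis distances of every row under the subset's (mu, cov)
    z = x - mu
    d2 = np.einsum('ij,jk,ik->i', z, np.linalg.inv(cov), z)
    idx = np.argsort(d2)[:h]   # keep the h closest rows and iterate
```

<p>By the C-step theorem the determinant is non-increasing along this iteration, and on well-separated toy data like this the surviving subset is dominated by the inliers.</p>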
<hr>
<p>Majeed Simaan gave a talk on 'Rational Explanations for Rule-of-Thumb Practices in
Asset Allocation.' He seemed to be looking for conditions under which one would prefer
the Global Minimum Variance portfolio, the Mean Variance Portfolio, or the Naive
Allocation Portfolio. These are based on the likely estimation error. My interpretation
is that if, for example, the true Markowitz portfolio weights (err, MVP in Majeed's
terminology) are widely dispersed, you are less likely to make a deleterious estimation error
in constructing the sample Markowitz portfolio. These rules of thumb are translated
into more easily digestible forms which can be tested. I look forward to the paper.</p>
<h2>Keynote</h2>
<p>Norm Matloff gave a keynote on 'Parallel Computation for the Rest of Us'. He notes
that there are a number of paradigms for parallelizing computation in R, with
varying levels of abstraction and sophistication. However, it is (still) the case
that using these packages requires some knowledge of hardware (<em>e.g.</em> caching)
and how the computations are parallelized. There is no 'silver bullet' that automatically
parallelizes computations. He described the two key design paradigms of the <code>partools</code> package:
Leave It There (<em>i.e.</em> bring computation to the data, and leave the data there), and
Software Alchemy (try to automatically convert regular problems into approximately
equivalent Embarrassingly Parallel problems, and solve those instead). </p>
<hr>
<h2>Day One, Afternoon Talks</h2>
<p>Matthew Ginley gave a talk about forecasting rare events under monotonicity constraints. I don't
think I can do his technique justice, but he seemed to start with a density estimator, then
estimate the proportion of rare events at values of the independent variable (or 'feature')
near where the rare events were observed, then he constructed a monotonic regression on those
values. My notes say I should look up ROSE (random over sampling examples) and
BART (Bayesian Additive Regression Trees), and that
the 'usual' metrics you might use to score performance (like MSE, or 0/1 loss) may give
counterintuitive results.</p>
<hr>
<p>Rainer Hirk discussed multivariate ordinal regression models using
<a href="https://cran.r-project.org/web/packages/mvord/index.html"><code>mvord</code></a>. He presented some problems around credit rating data from the big three (Moody's, Fitch, S&amp;P):
how are the ratings related to each other, how do they change over time, can they be explained
by independent variables (features of the rated companies), and so on. Ordinal regression
apparently works as a latent variable with some thresholds to determine the classes.
The <code>mvord</code> package can handle all the questions he threw at it.</p>
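<p>That latent-variable mechanism is easy to picture with a generic ordered-model sketch (a cartoon of ordinal regression in general, with invented coefficients and cutpoints, not mvord's multivariate formulation):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, -0.5])            # invented coefficients
cutpoints = np.array([-1.0, 0.0, 1.5])  # 3 thresholds -> 4 ordered classes
x = rng.normal(size=(8, 2))             # features of the rated companies
latent = x @ beta + rng.normal(size=8)  # unobserved latent score
ratings = np.searchsorted(cutpoints, latent)  # observed class: 0, 1, 2 or 3
```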
<h3>Lightning Round</h3>
<ol>
<li>
<p>Wilmer Pineda had some of the prettiest slides I saw all day, but I had a difficult time
understanding his talk, I'm afraid.</p>
</li>
<li>
<p>Neil Hwang talked about 'bipartite block models'. The idea seemed that you might have a
bipartite graph with edges defining some commonality between nodes, and you want to detect
communities among them. (This reminds me of some work I did in my short stint in the film
industry on trying to detect similarities between actors and films based on the 'appeared-in'
definition of edges.)</p>
</li>
<li>
<p>Glenn Schultz gave an advertorial for <a href="http://bondlab.io">bondlab</a>, which appears to be a
bond pricing package of the same name, connected to data from their web site. I think if
you work in fixed income, you'll want to take a look at this.</p>
</li>
<li>
<p>Dirk Hugen gave a talk on using R in Postgres via the PL/R extension. This is a nice trick if
you use Postgres: you can basically ship R UDFs into the database and run them there. I am
always a fan of having the DB do the work that my desktop cannot. </p>
</li>
</ol>
<h2>Talk</h2>
<p>Michael Gordy, from the Federal Reserve, discussed 'spectral backtests'. Apparently banks
produce 1 day ahead forecasts of their profit and loss every night.
The banks also track what their actual PnL is (weird, right?), which are then translated
into quantiles under their forecasts.
The question is whether the forecasts are any good, and whether that can be quantified
without dichotomizing the data. (For example, by looking at
the proportion of actuals at or above the 0.05 upper tail of loss forecasts.)
I didn't follow the transform, but he took an integral with some weighting function, and
out popped some hypothesis tests. Go check out the
<a href="https://doi.org/10.17016/FEDS.2018.021">paper</a>, and apparently there is a package coming as
well.</p>
<h3>Lightning Round</h3>
<ol>
<li>
<p>Mario Annau gave a progress report on <code>hdf5r</code>, which provides HDF5 file support for R.
HDF5 is still probably the best multi-language high performance data format, and this package
was apparently rewritten for performance by cutting out some C++ middleman code.
The roadmap for this package includes <code>dplyr</code> support, which would be a welcome feature.</p>
</li>
<li>
<p>David Smith gave a talk promoting the new Azure backend for the
<a href="https://cran.r-project.org/web/packages/foreach/index.html"><code>foreach</code></a> package.</p>
</li>
<li>
<p>Stephen Bronder gave a progress report on porting Stan to GPUs. Matrix operations like
inversion and Cholesky factorization are hard to parallelize, but they are coming to Stan.</p>
</li>
<li>
<p>Xin Chen presented the
<a href="https://github.com/chenx26/glmGammaNet"><code>glmGammaNet</code></a> package
to perform Elastic Net (L1 and L2 regularized) regression, but with Gamma distributed data,
which is appropriate for non-negative errors.</p>
</li>
<li>
<p>JJ Lay discussed multilevel Monte Carlo simulations for
stochastic volatility and interest rate modeling. Apparently he achieved a ten-thousand-fold
reduction in runtime (!) over a serial computation by parallelizing in this way.
</li>
</ol>
<h2>Talks</h2>
<p>Michael Kane gave a talk on an analysis of cryptocurrency pair prices from the
Bittrex market.
(The first part was a hilarious tour through the shady contraband-for-bitcoin market.)
He used SVD to approximate the returns of <span class="math">\(p=290\)</span> currency pairs down
to dimension <span class="math">\(d\)</span>.
He used Frobenius norm of error of this approximation, plus a <span class="math">\(d/\sqrt{p}\)</span> regularization
to optimize <span class="math">\(d\)</span>, finding that <span class="math">\(d\approx 2.5\)</span> was
consistent with his sample, fluctuating perhaps to 4 at times. My interpretation
is that people think of cryptocurrencies as Bitcoin and also-rans, although perhaps
there is a numeraire effect in there. As Michael put it, despite the
variety of coins, their returns are not well differentiated.</p>
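<p>The flavor of that procedure can be sketched as follows (toy factor-model data; the exact form and scaling of the <span class="math">\(d/\sqrt{p}\)</span> penalty in the talk was not given, so the normalization of the error term here is my guess):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# fake "returns": p series driven by k common factors plus noise
t, p, k = 500, 40, 2
signal = rng.normal(size=(t, k)) @ rng.normal(size=(k, p))
r = signal + 0.1 * rng.normal(size=(t, p))

u, s, vt = np.linalg.svd(r, full_matrices=False)

def objective(d):
    # relative Frobenius error of the rank-d approximation of r,
    # plus a d / sqrt(p) penalty on the dimension
    approx = (u[:, :d] * s[:d]) @ vt[:d]
    err = np.linalg.norm(r - approx) / np.linalg.norm(r)
    return err + d / np.sqrt(p)

best_d = min(range(1, 11), key=objective)
```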
<hr>
<p>William Foote gave a presentation in the form of a shiny dashboard, instead of
a slideshow, on the topic of, I think, shipping metals and metals prices.
A fair amount of time was spent showing the source code for the shiny page,
rather than demo'ing the page. If you are in the business of shipping Copper,
Aluminium or Nickel around, you would definitely want a dashboard like this,
but it is not clear how to interpret all the plots (contours of correlation
in an animation?) or 'drive' actions from the dashboard.</p>
<h3>Lightning Round</h3>
<ol>
<li>
<p>Justin Shea discussed Hamilton's working paper, "Why you should never use the
Hodrick-Prescott filter." (I <em>love</em> unambiguous paper titles!) He implemented
Hamilton's suggested replacement for the HP filter for detrending time series, in the
<a href="https://cran.r-project.org/web/packages/neverhpfilter/index.html"><code>neverhpfilter</code></a> package, which implements this regression, returning a <code>glm</code> object.</p>
</li>
<li>
<p>Thomas Zakrzewski talked about using a <a href="https://en.wikipedia.org/wiki/Q-Gaussian_distribution">'Q-Gaussian' distribution</a>
(apparently it generalizes Gaussian, <span class="math">\(t\)</span>, and bounded Gaussian-like symmetric distributions)
in Merton's model for probability of default.</p>
</li>
<li>
<p>Paul Laux talked about inferring the cost of insuring against small
and large market movements from the returns of VIX futures and delta-neutral
SPX straddles. He looked at these inferred costs around news announcement
dates (jobs reports, FOMC meetings) versus other times, and found that the
costs were significantly non-zero around news dates.</p>
</li>
<li>
<p>Hernando Cortina gave a talk for <a href="https://justcapital.com/">Just Capital</a>,
which is apparently a non-profit, established by Paul Tudor Jones, that
analyzes companies based on ESG criteria (Environment, Social, Governance). He
created quintile portfolios based on these rankings, and found that, over a
1 year out-of-sample period, the "most socially responsible" quintile outperformed the
least responsible. (Years ago I worked at a fund that tried to build a SRI
vehicle for a client, without much luck.)
He then tried to decompose the 'alpha' in the responsible quintile in terms of the ingredients
in their Just companies index.</p>
</li>
</ol>
<h2>Keynote</h2>
<p>J. J. Allaire gave a talk on Machine Learning with TensorFlow. TensorFlow
is apparently a numerical computing library, which is hardware independent and open source,
running on CPU, GPU or TPUs (if you can find one). It defines a data flow
process which is executed in C++, and reminds me somewhat of Spark.
The Models that are built are language-independent (again, like Spark's MLLib).
He then talked about Deep Learning, which, if I understood correctly, is
just neural nets with <em>lots</em> of layers ('deep'), like hundreds maybe. These
are good for 'perceptual-like' tasks, but maybe not so much for other areas (uhh, finance?).
Apparently Deep Learning has become more popular now because we now have
the computational resources and the massive amounts of data to train such huge
neural nets, some of which have millions of coefficients. (I imagine if you could
analyze how a human brain recognizes digits, say, it would involve thousands
of neurons; encoding them all in ten thousand parameters seems about right.)
Deep Learning still has problems: the models are not interpretable, can be
fooled by adversarial examples, and require lots of data and computational power.</p>
<p>He introduced Rstudio's <a href="tensorflow.rstudio.com">tensorflow packages</a>, including the
<a href="https://cran.r-project.org/web/packages/keras/index.html"><code>keras</code></a> package. The package gives you access to a plethora of layer types you might
want to put in a Deep Learning model, some appropriate for, say, graphical or image
learning, or time series or language processing <em>etc.</em> You do need a fair amount
of domain knowledge to create a good collection of layers, and apparently lots of
experimentation is required. (I'm predicting that Deep MetaLearning will be the big thing
in ten years when we have more data and computational power.)
He ran through example uses of TensorFlow from R: classification of images, weather forecasting,
fraud detection, <em>etc.</em> The package ecosystem here seems ready for use.
For more info, do check out
<a href="https://www.manning.com/books/deep-learning-with-r">Deep Learning with R</a>, or the more
theoretical book, <a href="http://www.deeplearningbook.org/">Deep Learning</a>.</p>
<hr>
<h2>Day Two</h2>
<h3>Lightning Round</h3>
<ol>
<li>
<p><a href="https://quantstrattrader.wordpress.com/">Ilya Kipnis</a> started the day by
describing some technical-based strategies on VIX ETNs. It's apparently hard
to do worse
than <a href="https://www.marketwatch.com/story/xiv-trader-ive-lost-4-million-3-years-of-work-and-other-peoples-money-2018-02-06">buy and hold XIV</a>.
I talked to Ilya after the conference, and he tells me he has "skin in the game,"
so this is not just another bunch of quant farts on a blog.</p>
</li>
<li>
<p>Matt Dancho gave a talk on
<a href="https://cran.r-project.org/web/packages/tibbletime/index.html"><code>tibbletime</code></a>,
which provides a time-aware layer over <code>tibble</code> objects, with 'collapse by time'
operators (which act like groupings, I think, but are applied in tandem with <code>group_by</code>),
a 'rollify' operator which (naively) applies functions at each point in time over a fixed window,
time subselection
operations and more. He also mentioned <a href="https://github.com/DavisVaughan/flyingfox"><code>flyingFox</code></a>
which uses <code>reticulate</code> to communicate with the Quantopian <code>zipline</code> package. <code>zipline</code>,
while rather weak compared to what most quant shops will develop in-house, is the <em>only</em>
open source backtesting engine that I know. It is good to see this is coming to R.
(I should note this package seems similar to the
<a href="https://cran.r-project.org/web/packages/tsibble/index.html"><code>tsibble</code></a> package.)</p>
</li>
<li>
<p>Carson Sievert talked about <code>dashR</code>, a not-yet-released package for using
<a href="http://dash.plot.ly"><code>dash</code></a>, which is Python's latest attempt to replicate <code>shiny</code> (<code>pyxley</code> having suffered an early death,
apparently). I suppose someone will find this useful, but I was not convinced by Carson's
arguments in favor of this approach: easy switching between Python and R, and the ability
to quickly import new React components. <em>If</em> the syntax of this framework were much easier
to think about than <code>shiny</code>, it would certainly win some converts, but I believe reactive
programming is just hard to reason about. At this point many users have learned to
embrace the weirdness of <code>shiny</code> and will be unlikely to defect.</p>
</li>
<li>
<p>Michael Kapler talked about interactively exploring seasonality patterns in R
with the <code>rtsviz</code> package. This seems to be a package with a shiny page that
can quickly give you a view of the seasonality of your time series data.</p>
</li>
<li>
<p>Bernhard Pfaff introduced the
<a href="https://github.com/bpfaff/rbtc"><code>rbtc</code></a> package. This wraps the bitcoin API for looking at the blockchain.
This is complementary to the <code>Rbitcoin</code> and <code>coindeskr</code> packages, which
seem to provide <em>pricing information</em>. Expect more from this package in the
coming year (perhaps the ability to <em>mine</em> coins, or define your own wallet.)</p>
</li>
</ol>
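<p>The 'rollify' operator from Matt Dancho's lightning talk can be sketched in a few lines of base R. This is only an illustration of the concept of lifting a summary function to a rolling-window version; it is not the actual <code>tibbletime::rollify</code> implementation, and the name <code>rollify_naive</code> is my own.</p>

```r
# a naive 'rollify': lift a summary function to a rolling-window version
# (base R illustration only, not the real tibbletime::rollify)
rollify_naive <- function(f, window) {
  function(x) {
    n <- length(x)
    out <- rep(NA_real_, n)
    # apply f over each trailing window of the given width
    for (i in seq.int(window, length.out = max(0, n - window + 1))) {
      out[i] <- f(x[(i - window + 1):i])
    }
    out
  }
}
roll_mean_3 <- rollify_naive(mean, window = 3)
roll_mean_3(c(1, 2, 3, 4, 5))
# NA NA 2 3 4
```

The package version plays nicely with <code>mutate</code> and handles edge cases; this sketch just shows the shape of the operator.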
<h2>Talks</h2>
<p>Eran Raviv gave a talk about combining forecasts using the
<a href="https://cran.r-project.org/web/packages/ForecastComb/index.html"><code>ForecastComb</code></a> package. As an example he showed a few different forecast methods
applied to a time series of UK electricity supply. The
package supports <em>many</em> different methods of combining forecasts:
simple averaging; OLS combination (which outperforms simple averaging,
but might not be <em>convex</em> in the forecasts, sometimes extrapolating
from them); trivial methods like median, trimming, <em>etc.</em> ;
accuracy based methods, like inverse Rank, inverse RMSE, Eigenvector approach;
regression based methods: OLS, LAD, CLS, subset regressions.
There are also summary and plotting functions. If you are combining
forecasts, this is the package to use.</p>
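<p>To make the convexity remark concrete, here is a base R sketch (simulated data, not the <code>ForecastComb</code> API) comparing simple averaging with OLS combination; the fitted OLS weights need not be positive, nor sum to one:</p>

```r
set.seed(1234)
n <- 200
y  <- cumsum(rnorm(n))              # the series being forecast
f1 <- y + rnorm(n, sd = 0.5)        # an unbiased but noisy forecast
f2 <- 0.8 * y + rnorm(n, sd = 0.2)  # a biased but less noisy forecast
# simple averaging: fixed convex weights of one half each
avg_fc <- (f1 + f2) / 2
# OLS combination: regress realized values on the individual forecasts
ols <- lm(y ~ f1 + f2)
coef(ols)  # the weights need not be positive, nor sum to one
c(avg_mse = mean((y - avg_fc)^2), ols_mse = mean(residuals(ols)^2))
# in sample, the OLS combination's MSE can be no larger than the average's
```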
<hr>
<p>Leopoldo Catania motivated the
<a href="https://cran.r-project.org/web/packages/eDMA/index.html"><code>eDMA</code></a> (efficient dynamic model averaging)
package by looking at predicting cryptocurrency returns using
some predictive features: technical features on the returns
themselves and macroeconomic features. The model under consideration
looks like the setup for a Kalman Filter, with a linear model
where the coefficients change under an AR(1) model, but instead
somehow summarized by a 'forgetting factor'. A consequence
is that, somehow, you have to perform linear regressions on all
subsets, using multiple forgetting factors, and maybe evaluate
them all on a rolling basis. The good news is that this package is
fairly efficient, using <code>Rcpp</code> and <code>RcppArmadillo</code>, and is
perhaps 50 times faster than the
<a href="https://cran.r-project.org/web/packages/dma/index.html"><code>dma</code></a> package, but still it takes around an hour to run a regression
with 18 features and 500 rows. And the results were hard for
me to interpret, and seemed to be worse than the benchmark method
under MSE metric. (And the claim that predictability 'increased over time'
could possibly be attributed to the longer time series?)</p>
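<p>For a feel of the 'forgetting factor' device, here is a minimal recursive least squares with exponential forgetting in base R. This is my sketch of the general idea only; the <code>eDMA</code> package does dynamic model <em>averaging</em> over many such models, which this does not attempt.</p>

```r
# recursive least squares with forgetting factor lambda in (0, 1]:
# lambda = 1 is ordinary RLS; smaller lambda discounts older data faster
rls_forget <- function(X, y, lambda = 0.99, delta = 100) {
  p <- ncol(X)
  beta <- rep(0, p)
  P <- delta * diag(p)  # large initial 'covariance' (diffuse start)
  for (t in seq_len(nrow(X))) {
    x <- X[t, ]
    k <- (P %*% x) / as.numeric(lambda + t(x) %*% P %*% x)  # gain vector
    beta <- beta + as.numeric(k) * as.numeric(y[t] - sum(x * beta))
    P <- (P - k %*% t(x) %*% P) / lambda
  }
  beta
}
set.seed(101)
X <- cbind(1, rnorm(500))
y <- X %*% c(1, 2) + rnorm(500, sd = 0.1)
rls_forget(X, y, lambda = 0.995)  # should be near c(1, 2)
```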
<h2>Talks</h2>
<p>Guanhao Feng gave a talk on "Deep Learning Alpha". As I understand it,
the motivation was that there is a veritable "zoo" of factors and factor models
(see Harvey & Liu (2016)), but factors are typically defined oddly.
That is, most factor returns are defined to be relatively robust to how
you would define the purported anomaly ('size', 'momentum', <em>etc.</em>),
and are rebalanced annually. The speaker, I think, was looking to
use Deep Learning to 'automatically' define factors which would be less
subject to our lame human ideas of what factors should look like.
(The speaker noted that you cannot use ML 'directly' to forecast cross
sectional returns because of imbalanced data and missing values: not all
features are defined at all times, not all stocks exist at all times, there
are mergers and acquisitions, <em>etc.</em>) I think I missed the part where
the model was compared to Fama French 3 or 5 factor models.</p>
<hr>
<p>Xiao Qiao gave an interesting talk on <em>correlated</em> idiosyncratic volatility shocks.
The idea is that idiosyncratic volatility has cross-sectional correlation (called,
"TVV" for Time Varying Vol), as well as autocorrelation ("VIN" (not <em>that</em> VIN), for
Volatility INnovations.) He built what he called a 'Dynamic Factor Correlation' model,
which generalizes Bollerslev's CCC and Engle & Kelly's DECO models. He found that
there <em>is</em> a significant cross-sectional correlation of GARCH residuals (TVV), then
built portfolios based on sorts (two sorts, if I recall), and showed that the
lowest quintile portfolios outperformed the highest quintile. The interpretation
from the speaker was roughly that
high VIN securities are a kind of 'insurance' against vol spikes, and
high TVV securities pay out when vol is high in general.
(There was also a "Lake Volbegone" effect, where <em>all</em> the portfolios had above-average
excess returns, but Stephen Rush pointed out this was likely due to the difference
between simple averaging and value averaging.) My notes tell me to look up
Ang <em>et al.</em> (2006) and Herskovic <em>et al.</em> (2014).</p>
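<p>The sort-based construction is a standard recipe; here is a toy single-sort version in base R (simulated data and an invented characteristic, not the speaker's double sorts):</p>

```r
set.seed(42)
nstock <- 500
char <- rnorm(nstock)                           # some sorting characteristic
ret  <- -0.02 * char + rnorm(nstock, sd = 0.1)  # returns weakly related to it
# assign each stock to a quintile bucket on the characteristic
q <- cut(char, quantile(char, probs = seq(0, 1, 0.2)),
         labels = FALSE, include.lowest = TRUE)
# equal-weighted quintile portfolio returns
port <- tapply(ret, q, mean)
port
port[1] - port[5]  # the low-minus-high spread
```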
<h2>Keynote</h2>
<p>Li Deng gave a talk on using AI in finance. Li drove the AI effort at Microsoft before joining
Citadel. While he couldn't be terribly specific about what he is doing now, he gave a good
overview of the history of AI, including its successes in perceptual tasks. He was also
fairly honest about the challenges of using AI in finance: low (I would say, "very low")
signal-noise ratio compared to perceptual tasks, nonstationarity and adversarial landscape, and
the heterogeneity of big data. My guess is that the first of those is the biggest
problem, while the third is an engineering, or model design, challenge.</p>
<h3>Lightning Round</h3>
<ol>
<li>
<p>Keven Bluteau gave a talk on sentiment analysis. (I think I approached him after the
conference, and after two drinks, and told him I enjoyed his <em>talk about hdf5</em>. Oooops! Sorry!
You don't really look like Mario!) This was one of a slew of talks about sentiment, which
also came around when my computer decided to remount its filesystem read-only. (ack!)</p>
</li>
<li>
<p>Samuel Borms talked about the
<a href="https://cran.r-project.org/web/packages/sentometrics/index.html"><code>sentometrics</code></a> package for computing and aggregating textual sentiment.</p>
</li>
<li>
<p>Kyle Balkissoon gave a short talk on using weather data to create weather-based
signals on companies (I feel like this idea time traveled from the 60's), as well as
building text-based signals on companies. The latter is, as Kyle noted, fairly difficult,
(as is <em>any</em> signal construction) unless you can really represent what one company is
over time. (In addition to the Ship of Theseus argument, splits and mergers complicate
the picture, and they complicate our understanding of textual data about companies. I suspect
that everyone at the conference who uses CRSP data just sweeps this under the rug, which would be
worth the price of admission.)</p>
</li>
<li>
<p>Petra Bakosova, from Hull Tactical, gave an impromptu talk on seasonal effects, which includes calendar-based
effects (month boundaries, January effect, weekend effect, sell in May), as well
as 'announcement' dates (FOMC, and maybe earnings announcements?). Building several seasonal strategies,
she found that many had higher Sharpe than Buy and Hold around announcements (this seems odd if
they are long only), but lower overall returns because the capital is not deployed at all
times. (On the other hand, if the seasonal strategies could 'share' capital with other kinds of
strategies, maybe it would all work out.)</p>
</li>
<li>
<p>Che Guan gave a talk on using Machine Learning for 'digital' (or you might say, 'crypto')
currency predictions, using technical factors on the coin returns as well as macroeconomic
features. I would like to see his results compared and contrasted with those of Leopoldo Catania,
who seemed to target the same application with different methods.</p>
</li>
</ol>
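<p>For readers unfamiliar with bag-of-words sentiment, here is a toy base R scorer with an invented four-word lexicon; real packages like <code>sentometrics</code> use large curated lexicons and far more careful tokenization:</p>

```r
# toy bag-of-words sentiment: (+1 per positive word, -1 per negative) / total
pos_words <- c("gain", "growth", "strong", "beat")
neg_words <- c("loss", "weak", "miss", "decline")
sentiment <- function(text) {
  # crude tokenization: split on anything that is not a letter
  words <- tolower(strsplit(text, "[^a-zA-Z]+")[[1]])
  (sum(words %in% pos_words) - sum(words %in% neg_words)) / length(words)
}
sentiment("Strong growth this quarter, earnings beat estimates")
# positive score
sentiment("Weak demand led to a decline in revenue")
# negative score
```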
<hr>
<h2>Afternoon Talks</h2>
<p>David Ardia gave a talk about sparse forecasting using news-based sentiment.
The motivating problem was forecasting economic growth. In Europe, this
is apparently done by 'ESI', which is some kind of average of survey responses.
Can this be automated, sped up, even improved by text-based forecasts? David
pursued a penalized least squares approach. The recipe is:
classify texts by topic (economic, labor, government, <em>etc</em>), and choose a subset of topics;
using multiple lexicons (lexica?) compute the sentiment of each text at time <span class="math">\(t\)</span>;
aggregate across topics to obtain a bunch of topic-based sentiments;
get some time series aggregated values (a little hazy here);
take a linear combination to get the best forecasts.</p>
<p>Using Germany as an example, he looked at news from LexisNexis from the mid 90's to 2016,
filtered articles by geography, topic, article size, applied bag of words sentiment
calculation (I think these are 'bivalent' indicators) using 3 lexicons, collapsed by
lexicon, and then looked at sentiment by time and topic. The takeaway was that the sentiment
indicator seemed to capture the same dynamics as ESI, but perhaps reacted more
quickly to the Great Financial Collapse. He also found that <em>combining</em> the sentiment
indicator and ESI improved forecasts.</p>
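<p>The last step, a penalized linear combination of many topic-sentiment series, can be sketched with closed-form ridge regression in base R. This is my stand-in with simulated sentiment indicators; David's actual penalty and features differ.</p>

```r
set.seed(99)
n <- 120; k <- 30                  # months and topic-lexicon sentiment series
S <- matrix(rnorm(n * k), n, k)    # standardized sentiment indicators
beta_true <- c(rep(0.5, 3), rep(0, k - 3))  # only a few topics matter
growth <- S %*% beta_true + rnorm(n, sd = 0.5)
# ridge regression in closed form: (S'S + lambda I)^{-1} S'y,
# shrinking the coefficients on the many noisy series toward zero
ridge_fit <- function(S, y, lambda) {
  solve(crossprod(S) + lambda * diag(ncol(S)), crossprod(S, y))
}
beta_hat <- ridge_fit(S, growth, lambda = 10)
round(head(beta_hat, 5), 2)  # the first three coefficients dominate
```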
<hr>
<p>Dries Cornilly gave a really nice talk on the
<a href="https://github.com/cdries/rTrawl"><code>rTrawl</code></a> package for modeling High Frequency Financial Time Series.
In the setup he presented some stylized facts of high frequency returns
data.
He plotted the autoregressive coefficient for returns in an AR(1) model at different
observation frequencies. It exhibits an odd dip to around -0.2 or so at
a period of around 1 second, but is otherwise around zero. Why the dip?
He then also plotted the variance of returns divided by T versus the observation
frequency, which goes from around 0.05 down to zero. Again, why?
He outlined some of the approaches to the problem, then described the
integer valued Lévy processes and 'trawl' processes. From what I understood,
you first generate some finite set of points in some space, one dimension
representing time. Then you imagine sweeping across time and computing the
sum of all points within the 'wake' of your sweep. In fact, you don't have to imagine the sweep,
he showed animations of the sweep. The Lévy processes have like a constant wake, while
the trawl processes are supposed to evoke a fisherman with a finite sized net
from which the 'fish' escape. He also described a combination of the
Lévy and trawl processes, which is like, uhh, a weird net, I guess. Anyway, the
<code>rTrawl</code> package apparently supports computing these things, as well as
estimating the parameters from an observed series. The parameters would be, I think,
the generating process for the points (err, 'fish'), and maybe the size of the 'net'
or something.
(I don't really know how we transitioned from SPX trades to fish, but it worked.)
The kicker at the end is that the combined trawl processes have closed form
AR(1) coefficients and variance, so he showed the plots from the beginning of the
talk along with the values from the trawl fit, and they match very well! </p>
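<p>As I understood the construction, a toy trawl-like process can be simulated in base R: points arrive as a Poisson process, each stays in the 'net' for a random time, and the process counts the points currently in the net. The exponential 'net' shape here is my assumption, and none of this is the <code>rTrawl</code> parameterization.</p>

```r
set.seed(7)
tmax <- 100
# points ('fish') arrive as a Poisson process with rate 5 ...
arrivals <- cumsum(rexp(2000, rate = 5))
arrivals <- arrivals[arrivals < tmax]
# ... and each escapes the net after an exponential time with rate 1
departures <- arrivals + rexp(length(arrivals), rate = 1)
# the process counts the points currently inside the net at time t
trawl_count <- function(t) sum(arrivals <= t & departures > t)
tt <- seq(0, tmax, by = 0.1)
xt <- vapply(tt, trawl_count, numeric(1))
mean(xt)  # should hover near arrival rate / escape rate = 5
```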
<p>Luis Damiano gave a talk on Hierarchical Hidden Markov Models in High-Frequency
Stock Markets. I think the idea was to create a Hidden Markov Model on stocks
("bullish", "bearish"), but then have another level of hidden Markov Models
on top of that (thus "hierarchical"). He backtested this system on a couple
of stocks over a short time period, but the story out of sample seemed
inconclusive (in contrast to a 2009 article by Tayal he referenced). As a side
note, apparently the github page for this project has some L1 and L2 tick data
that you can play along with.</p>
<hr>
<h2>Intermission</h2>
<p>So, this happened:</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Are you looking to attend an all-male conference in <a href="https://twitter.com/hashtag/DataScience?src=hash&ref_src=twsrc%5Etfw">#DataScience</a>? <a href="https://twitter.com/hashtag/rfinance2018?src=hash&ref_src=twsrc%5Etfw">#rfinance2018</a> has got you covered! 🧑🏻🙋🏻♂️🧓🏻👱🏻♂️🤵🏻👨🏻🧔🏻👴🏻👨🏻💼<br>100% male committee, 100% male speakers, no Code of Conduct. Yes, this is 2018! 📆 <a href="https://t.co/EfhR1QhwWj">https://t.co/EfhR1QhwWj</a> <a href="https://twitter.com/hashtag/BinderFullofMen?src=hash&ref_src=twsrc%5Etfw">#BinderFullofMen</a> 👬 <a href="https://t.co/NLbS31y43V">pic.twitter.com/NLbS31y43V</a></p>— Women in ML/DS (@wimlds) <a href="https://twitter.com/wimlds/status/1002597607468761088?ref_src=twsrc%5Etfw">June 1, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>This tweet went out on the first day of the conference, and there was a pile-on on twitter.
(It looks like I picked the right year not to give a talk!).
While I have to admit this tweet was very effective at drawing attention to the problem,
drawing attention is only a step towards solving a problem.
The organizers were able, on short notice, to get a talk from two members of "R Ladies", one
of whom I believe was attending the conference anyway, to talk solutions. They suggested
a Code of Conduct for the conference, which makes sense: the conference draws people from
different backgrounds and cultures, and it is better to make explicit how they should
be expected to behave. Moreover, if it increases attendance and improves audience diversity,
I am all for it. The two ladies also made a call to action to the audience for us to
proactively seek diversity. This is a much larger conversation than is appropriate
for this review (and nobody reads this blog for a conversation), but I do hope
to see positive changes in diversity and inclusion at the conference, but also in the
industry as a whole.</p>
<hr>
<h3>Lightning Round</h3>
<ol>
<li>
<p>Phillip Guerra talked about 'autotrading', which appears to be his terminology for
taking a backtest to market. Phillip is an anesthesiologist who moonlights in asset management.
One reason I love this conference is its big tent approach to speakers, who range from
academics to industry to independents.</p>
</li>
<li>
<p>Bryan Lewis gave a talk about Stat Arb Something Something, based on some
offhand comments he made at the conference a few years back about how you could
just quickly throw together a stat arb strategy. The idea is to find groups of
stocks with cointegration relationships, and trade in expectation of a reversal
when they diverge from the relationship. Bryan filled in some of the details to this general
sketch. One of the problems is there is a huge number of combinations of assets
to check for cointegration, and the classical tests do not scale well
(in terms of coverage, I believe) to large numbers of time series. Bryan talked
about using spectral clustering on the regularized covariance of returns
to get candidate sets of assets, then use a Bayesian approach to cointegration.
The reference I am to check is a 2002 paper by
<a href="http://www.carolalexander.org/publish/download/JournalArticles/PDFs/RIBF_16_65-90.pdf">Alexander, Giblin and Weddington.</a></p>
</li>
</ol>
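<p>For the classical (Engle-Granger style) flavor of the idea Bryan was improving on, here is a base R sketch on simulated data: regress one series on the other, and check that the spread mean-reverts. This is not Bryan's spectral clustering or Bayesian machinery, just the two-step textbook version.</p>

```r
set.seed(314)
n <- 1000
common <- cumsum(rnorm(n))             # a shared random-walk component
x <- common + rnorm(n, sd = 0.5)
y <- 2 * common + rnorm(n, sd = 0.5)   # cointegrated with x
# step one: estimate the cointegrating relationship by OLS
fit <- lm(y ~ x)
spread <- residuals(fit)
# step two: the spread should mean-revert; as a quick check, its AR(1)
# coefficient should sit well below one (a unit root would be near one)
ar1 <- unname(coef(lm(spread[-1] ~ spread[-n]))[2])
ar1  # near zero here, since the spread is close to white noise
```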
<h2>Talks</h2>
<p>I was starting to think that cryptocurrency talks would outpace total meme count
this year, then Stephen Rush killed it for team meme with his talk on
Currency Risk and Information Diffusion. The motivating idea for the talk
is that
information moves from currency markets to equity markets at different speeds.
Can we analyze that speed, and figure out why it is faster or slower for
some companies?
The speaker computed VPIN from second-resolution NYSE TAQ data, then downsampled
to daily frequency,
used CRSP daily data for around 20 years on about 17K firms,
then built a linear model for returns of each firm taking into account some future information.
The normalized regression coefficients then give you some idea of the 'price adjustment'
which is basically a measure of the <em>inefficiency</em> of each stock. The speaker
found that VPIN, Size, Turnover and Analyst's coverage had negative effects
on this price adjustment (<em>i.e.</em> are indicative of higher efficiency), while
Institutional Ownership has a positive effect (lower efficiency). This latter
factor is associated with a significant alpha, on the order of around 6% annualized
for the top decile.</p>
<hr>
<p>Jasen Mackie gave a talk on 'Round Turn Trade Simulation'. This seemed to be
related to the idea of random portfolios, but focused on computing <em>e.g.</em>
the expected maximum drawdown of a trading strategy by sampling from its
'round turn' trades (that is, positions which are opened then closed, presumably
defined in a LIFO sense). Using the <code>blotter</code> object, the speaker extracted
some stylized facts of the trading strategy: duration of these trades,
ratio of long to short, maybe position sizes, <em>etc.</em> Then random realizations
were drawn with similar properties. I guess you can think of this as a kind
of bootstrap of the backtest. I suppose the autocorrelation of trades would
be much trickier to establish (and I suspect would have a <em>huge</em> influence on
maximum drawdown). </p>
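<p>The bootstrap flavor of this, as I understand it, looks something like the following base R sketch (simulated per-trade P&amp;L standing in for trades extracted from a <code>blotter</code> object; an i.i.d. resample, so trade autocorrelation is ignored):</p>

```r
set.seed(2018)
# per-trade P&L from some backtest; a simulated stand-in here
trade_pnl <- rnorm(250, mean = 0.1, sd = 1)
max_drawdown <- function(pnl) {
  equity <- cumsum(pnl)
  max(cummax(equity) - equity)  # worst peak-to-trough drop
}
# bootstrap: resample the trades with replacement, recompute the drawdown
boot_dd <- replicate(2000, max_drawdown(sample(trade_pnl, replace = TRUE)))
quantile(boot_dd, c(0.5, 0.95))  # median and tail drawdown estimates
```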
<hr>
<p>Thomas Harte closed the conference with a talk on
"Pricing Derivatives When Prices Are Not Observable". This is a bit different than
incomplete markets. He built a linear model for private equity returns based
on some factors, then used that linear model somehow as a proxy in a Rubinstein-type
lattice pricing scheme. From these Thomas was able to price certain options on
private equity firms (say, a leveraged buyout fund).</p>
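<p>For reference, the lattice machinery itself is textbook; here is a minimal Cox-Ross-Rubinstein binomial pricer for a vanilla European call in base R. This covers only the plain-vanilla piece with an observable underlying price, which is exactly the part Thomas could not take for granted.</p>

```r
# Cox-Ross-Rubinstein binomial pricing of a European call
crr_call <- function(S0, K, r, sigma, tau, nsteps = 200) {
  dt <- tau / nsteps
  u  <- exp(sigma * sqrt(dt)); d <- 1 / u
  p  <- (exp(r * dt) - d) / (u - d)  # risk-neutral up probability
  # terminal stock prices and call payoffs at the leaves
  ST <- S0 * u^(0:nsteps) * d^(nsteps:0)
  V  <- pmax(ST - K, 0)
  # roll the values back through the lattice, discounting at each step
  for (i in nsteps:1) {
    V <- exp(-r * dt) * (p * V[2:(i + 1)] + (1 - p) * V[1:i])
  }
  V
}
crr_call(S0 = 100, K = 100, r = 0.01, sigma = 0.2, tau = 1)
# close to the Black-Scholes value of about 8.43
```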
<hr>
<p>This was another great conference, and I hope to be back next year. If you have
anything to add, feel free to comment.</p>
<h2>Another Confidence Limit for the Markowitz Signal Noise Ratio</h2>
<p><em>Steven, 2018-03-28</em></p>
<p>Another confidence limit on the Signal Noise ratio of the Markowitz portfolio.</p>
<p>In a <a href="bad-cis">previous blog post</a>, I looked at two asymptotic confidence intervals
for the Signal-Noise ratio of the sample Markowitz portfolio, finding that
they generally did not give nominal type I rates even for large sample sizes
(50 years of daily data). In a <a href="markowitz-cov-elliptical">followup post</a>, I
looked at the covariance of some elements of the Markowitz portfolio, finding
that they seemed to be nearly normal for modest sample sizes. However, in that
post, I used the 'TAS' transform to a <span class="math">\(t\)</span> variate, and found, again, that large
sample sizes were required to pass the eyeball test in a Q-Q plot. </p>
<p>Here I mash up those two ideas to construct another confidence limit for the
Signal Noise ratio. So I take the asymptotic covariance in the Markowitz
portfolio elements, and use them with the TAS transform to get a confidence
limit. (You can get the gory details around equation (52) of version 5 of
my <a href="https://arxiv.org/abs/1312.0557">paper</a>, which also contains the
simulations below.)</p>
<p>Here we examine all three of those confidence limits, finding that none
of them achieve near nominal type I rates. Again, I let <span class="math">\(p\)</span> be the number of
assets, <span class="math">\(n\)</span> the number of days observed, and <span class="math">\(\zeta\)</span> the population maximal
Signal Noise ratio. Here I am observing multivariate normal returns, so
the kurtosis factor is not used here. I sweep
across different values of these parameters, each time performing 10000
simulations, computing the sample Markowitz portfolio, and its Signal Noise
ratio.</p>
<!-- PELICAN_END_SUMMARY -->
<div class="highlight"><pre><span></span><span class="kp">suppressMessages</span><span class="p">({</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tidyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tibble<span class="p">)</span>
<span class="c1"># https://cran.r-project.org/web/packages/doFuture/vignettes/doFuture.html</span>
<span class="kn">library</span><span class="p">(</span>doFuture<span class="p">)</span>
registerDoFuture<span class="p">()</span>
plan<span class="p">(</span>multiprocess<span class="p">)</span>
<span class="p">})</span>
<span class="c1"># one simulation of n periods of data on p assets with true optimal</span>
<span class="c1"># SNR of (the vector of) pzeta</span>
onesim <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">)</span> <span class="p">{</span>
pmus <span class="o"><-</span> pzeta <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span>p<span class="p">)</span>
<span class="c1"># simulate an X: too slow.</span>
<span class="c1">#X <- matrix(rnorm(n*p,mean=pmus[1],sd=1),ncol=p)</span>
<span class="c1">#smu1 <- colMeans(X)</span>
<span class="c1">#ssig <- ((n-1)/n) * cov(X)</span>
<span class="c1"># this is faster:</span>
smu1 <span class="o"><-</span> rnorm<span class="p">(</span>p<span class="p">,</span>mean<span class="o">=</span>pmus<span class="p">[</span><span class="m">1</span><span class="p">],</span>sd<span class="o">=</span><span class="m">1</span> <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span>n<span class="p">))</span>
ssig <span class="o"><-</span> rWishart<span class="p">(</span><span class="m">1</span><span class="p">,</span>df<span class="o">=</span>n<span class="m">-1</span><span class="p">,</span>Sigma<span class="o">=</span><span class="kp">diag</span><span class="p">(</span><span class="m">1</span><span class="p">,</span>ncol<span class="o">=</span>p<span class="p">,</span>nrow<span class="o">=</span>p<span class="p">))</span> <span class="o">/</span> n <span class="c1"># sic n</span>
<span class="kp">dim</span><span class="p">(</span>ssig<span class="p">)</span> <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span>p<span class="p">,</span>p<span class="p">)</span>
smus <span class="o"><-</span> <span class="kp">outer</span><span class="p">(</span>smu1<span class="p">,</span>pmus <span class="o">-</span> pmus<span class="p">[</span><span class="m">1</span><span class="p">],</span>FUN<span class="o">=</span><span class="s">'+'</span><span class="p">)</span>
smps <span class="o"><-</span> <span class="kp">solve</span><span class="p">(</span>ssig<span class="p">,</span>smus<span class="p">)</span>
szeta <span class="o"><-</span> <span class="kp">sqrt</span><span class="p">(</span><span class="kp">colSums</span><span class="p">(</span>smus <span class="o">*</span> smps<span class="p">))</span>
psnr <span class="o"><-</span> pmus <span class="o">*</span> <span class="kp">as.numeric</span><span class="p">(</span><span class="kp">colSums</span><span class="p">(</span>smps<span class="p">)</span> <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span><span class="kp">colSums</span><span class="p">(</span>smps<span class="o">^</span><span class="m">2</span><span class="p">)))</span>
<span class="kp">cbind</span><span class="p">(</span>pzeta<span class="p">,</span>szeta<span class="p">,</span>psnr<span class="p">)</span>
<span class="p">}</span>
<span class="c1"># do that many times.</span>
repsim <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>nrep<span class="p">,</span>zetas<span class="p">,</span>n<span class="p">,</span>p<span class="p">)</span> <span class="p">{</span>
foo <span class="o"><-</span> <span class="kp">replicate</span><span class="p">(</span>nrep<span class="p">,</span>onesim<span class="p">(</span>pzeta<span class="o">=</span>zetas<span class="p">,</span>n<span class="p">,</span>p<span class="p">))</span>
baz <span class="o"><-</span> <span class="kp">aperm</span><span class="p">(</span>foo<span class="p">,</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">3</span><span class="p">,</span><span class="m">2</span><span class="p">))</span>
<span class="kp">dim</span><span class="p">(</span>baz<span class="p">)</span> <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span>nrep <span class="o">*</span> <span class="kp">length</span><span class="p">(</span>zetas<span class="p">),</span><span class="kp">dim</span><span class="p">(</span>foo<span class="p">)[</span><span class="m">2</span><span class="p">])</span>
<span class="kp">colnames</span><span class="p">(</span>baz<span class="p">)</span> <span class="o"><-</span> <span class="kp">colnames</span><span class="p">(</span>foo<span class="p">)</span>
<span class="kp">invisible</span><span class="p">(</span><span class="kp">as.data.frame</span><span class="p">(</span>baz<span class="p">))</span>
<span class="p">}</span>
manysim <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>nrep<span class="p">,</span>zetas<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>nnodes<span class="o">=</span><span class="m">7</span><span class="p">)</span> <span class="p">{</span>
<span class="kr">if</span> <span class="p">(</span>nrep <span class="o">></span> <span class="m">4</span><span class="o">*</span>nnodes<span class="p">)</span> <span class="p">{</span>
<span class="c1"># do in parallel.</span>
nper <span class="o"><-</span> <span class="kp">table</span><span class="p">(</span><span class="m">1</span> <span class="o">+</span> <span class="p">((</span><span class="m">0</span><span class="o">:</span><span class="p">(</span>nrep<span class="m">-1</span><span class="p">)</span> <span class="o">%%</span> nnodes<span class="p">)))</span>
retv <span class="o"><-</span> foreach<span class="p">(</span>i<span class="o">=</span><span class="m">1</span><span class="o">:</span>nnodes<span class="p">,</span><span class="m">.</span>export <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">'zetas'</span><span class="p">,</span><span class="s">'n'</span><span class="p">,</span><span class="s">'p'</span><span class="p">,</span><span class="s">'repsim'</span><span class="p">,</span><span class="s">'onesim'</span><span class="p">))</span> <span class="o">%dopar%</span> <span class="p">{</span>
repsim<span class="p">(</span>nrep<span class="o">=</span>nper<span class="p">[</span>i<span class="p">],</span>zetas<span class="o">=</span>zetas<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">)</span>
<span class="p">}</span> <span class="o">%>%</span>
bind_rows<span class="p">()</span>
<span class="p">}</span> <span class="kr">else</span> <span class="p">{</span>
retv <span class="o"><-</span> repsim<span class="p">(</span>nrep<span class="o">=</span>nrep<span class="p">,</span>zetas<span class="o">=</span>zetas<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">)</span>
<span class="p">}</span>
retv
<span class="p">}</span>
<span class="c1"># actually do it many times.</span>
ope <span class="o"><-</span> <span class="m">252</span>
zetasq <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="o">/</span><span class="m">8</span><span class="p">,</span><span class="m">1</span><span class="o">/</span><span class="m">4</span><span class="p">,</span><span class="m">1</span><span class="o">/</span><span class="m">2</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">4</span><span class="p">)</span> <span class="o">/</span> ope
zeta <span class="o"><-</span> <span class="kp">sqrt</span><span class="p">(</span>zetasq<span class="p">)</span>
params <span class="o"><-</span> tidyr<span class="o">::</span>crossing<span class="p">(</span>tibble<span class="o">::</span>tribble<span class="p">(</span><span class="o">~</span>n<span class="p">,</span><span class="m">100</span><span class="p">,</span><span class="m">200</span><span class="p">,</span><span class="m">400</span><span class="p">,</span><span class="m">800</span><span class="p">,</span><span class="m">1600</span><span class="p">,</span><span class="m">3200</span><span class="p">,</span><span class="m">6400</span><span class="p">,</span><span class="m">12800</span><span class="p">),</span>
tibble<span class="o">::</span>tribble<span class="p">(</span><span class="o">~</span>p<span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">8</span><span class="p">,</span><span class="m">16</span><span class="p">),</span>
tibble<span class="o">::</span>tribble<span class="p">(</span><span class="o">~</span>kurty<span class="p">,</span><span class="m">1</span><span class="p">))</span>
nrep <span class="o"><-</span> <span class="m">10000</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">2356</span><span class="p">)</span>
<span class="kp">system.time</span><span class="p">({</span>
results <span class="o"><-</span> params <span class="o">%>%</span>
group_by<span class="p">(</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
summarize<span class="p">(</span>sims<span class="o">=</span><span class="kt">list</span><span class="p">(</span>manysim<span class="p">(</span>nrep<span class="o">=</span>nrep<span class="p">,</span>zetas<span class="o">=</span>zeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">)))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
tidyr<span class="o">::</span>unnest<span class="p">()</span>
<span class="p">})</span>
</pre></div>
<div class="highlight"><pre><span></span> user system elapsed
52.427 26.368 18.172
</pre></div>
<p>Here I collect the simulations together, computing the three confidence limits and
then the empirical type I rates. I plot them below. </p>
<div class="highlight"><pre><span></span><span class="c1"># the nominal rate:</span>
typeI <span class="o"><-</span> <span class="m">0.05</span>
<span class="c1"># invert the TAS function</span>
anti_tas <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> <span class="p">{</span> x <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span><span class="m">1</span> <span class="o">+</span> x<span class="o">^</span><span class="m">2</span><span class="p">)</span> <span class="p">}</span>
<span class="c1"># confidence intervals and coverage:</span>
cires <span class="o"><-</span> results <span class="o">%>%</span>
mutate<span class="p">(</span>kurty<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>bit1 <span class="o">=</span> <span class="p">(</span>kurty<span class="o">*</span>pzeta<span class="o">^</span><span class="m">2</span> <span class="o">+</span> <span class="m">1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="m">1</span> <span class="o">-</span> p<span class="p">),</span>
bit2 <span class="o">=</span> <span class="p">(</span><span class="m">3</span> <span class="o">*</span> kurty <span class="o">-</span> <span class="m">1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span>pzeta<span class="o">^</span><span class="m">2</span><span class="o">/</span><span class="m">4</span><span class="p">)</span> <span class="o">+</span> <span class="m">1</span><span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>lam1<span class="o">=</span>pzeta<span class="o">*</span><span class="kp">sqrt</span><span class="p">((</span><span class="m">2+3</span><span class="o">*</span><span class="p">(</span>kurty<span class="m">-1</span><span class="p">))</span><span class="o">/</span><span class="p">(</span><span class="m">4</span><span class="o">*</span>n<span class="p">)),</span>
lamp<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span><span class="m">1</span> <span class="o">+</span> kurty<span class="o">*</span>pzeta<span class="o">^</span><span class="m">2</span><span class="p">)</span><span class="o">/</span><span class="kp">sqrt</span><span class="p">(</span>n<span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>tpart<span class="o">=</span>qt<span class="p">(</span>typeI<span class="p">,</span>df<span class="o">=</span>p<span class="m">-1</span><span class="p">,</span>ncp<span class="o">=</span>szeta<span class="o">/</span>lam1<span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>ci_add <span class="o">=</span> szeta <span class="o">+</span> <span class="p">((</span>bit1 <span class="o">+</span> bit2<span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="m">2</span> <span class="o">*</span> n <span class="o">*</span> pzeta<span class="p">))</span> <span class="o">+</span> qnorm<span class="p">(</span>typeI<span class="p">)</span> <span class="o">*</span> <span class="kp">sqrt</span><span class="p">(</span>bit2<span class="o">/</span>n<span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>ci_div <span class="o">=</span> szeta <span class="o">*</span> <span class="p">(</span><span class="m">1</span> <span class="o">+</span> <span class="p">((</span>bit1 <span class="o">+</span> <span class="m">3</span> <span class="o">*</span> bit2<span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="m">2</span> <span class="o">*</span> n <span class="o">*</span> pzeta <span class="o">*</span> pzeta<span class="p">))</span> <span class="o">+</span> qnorm<span class="p">(</span>typeI<span class="p">)</span> <span class="o">*</span> <span class="kp">sqrt</span><span class="p">(</span>bit2 <span class="o">/</span> <span class="p">(</span>n<span class="o">*</span>pzeta<span class="o">*</span>pzeta<span class="p">))))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>ci_tas <span class="o">=</span> pzeta <span class="o">*</span> anti_tas<span class="p">((</span>lam1 <span class="o">*</span> tpart<span class="p">)</span> <span class="o">/</span> <span class="p">(</span>lamp <span class="o">*</span> <span class="kp">sqrt</span><span class="p">(</span>p<span class="m">-1</span><span class="p">))))</span> <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
summarize<span class="p">(</span>type1_add <span class="o">=</span> <span class="kp">mean</span><span class="p">(</span>psnr <span class="o"><</span> ci_add<span class="p">),</span>
type1_div <span class="o">=</span> <span class="kp">mean</span><span class="p">(</span>psnr <span class="o"><</span> ci_div<span class="p">),</span>
type1_tas <span class="o">=</span> <span class="kp">mean</span><span class="p">(</span>psnr <span class="o"><</span> ci_tas<span class="p">))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
mutate<span class="p">(</span>zyr<span class="o">=</span><span class="kp">signif</span><span class="p">(</span>pzeta <span class="o">*</span> <span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">),</span>digits<span class="o">=</span><span class="m">2</span><span class="p">))</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`annualized SNR`</span><span class="o">=</span>zyr<span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="c1"># plot CIs:</span>
<span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
ph <span class="o"><-</span> cires <span class="o">%>%</span>
tidyr<span class="o">::</span>gather<span class="p">(</span>key<span class="o">=</span>type<span class="p">,</span>value<span class="o">=</span>type1<span class="p">,</span>matches<span class="p">(</span><span class="s">'^type1_'</span><span class="p">))</span> <span class="o">%>%</span>
mutate<span class="p">(</span>type<span class="o">=</span>case_when<span class="p">(</span><span class="m">.</span><span class="o">$</span>type<span class="o">==</span><span class="s">'type1_add'</span> <span class="o">~</span> <span class="s">'type I rate, difference form'</span><span class="p">,</span>
<span class="m">.</span><span class="o">$</span>type<span class="o">==</span><span class="s">'type1_div'</span> <span class="o">~</span> <span class="s">'type I rate, ratio form'</span><span class="p">,</span>
<span class="m">.</span><span class="o">$</span>type<span class="o">==</span><span class="s">'type1_tas'</span> <span class="o">~</span> <span class="s">'type I rate, tas form'</span><span class="p">,</span>
<span class="kc">TRUE</span> <span class="o">~</span> <span class="s">'bad code'</span><span class="p">))</span> <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>n<span class="p">,</span>type1<span class="p">,</span>color<span class="o">=</span>type<span class="p">))</span> <span class="o">+</span>
geom_line<span class="p">()</span> <span class="o">+</span> geom_point<span class="p">()</span> <span class="o">+</span>
facet_grid<span class="p">(</span>p <span class="o">~</span> <span class="sb">`annualized SNR`</span><span class="p">,</span>scales<span class="o">=</span><span class="s">'free'</span><span class="p">,</span>labeller<span class="o">=</span>label_both<span class="p">)</span> <span class="o">+</span>
scale_x_log10<span class="p">()</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept<span class="o">=</span><span class="m">0.05</span><span class="p">,</span>linetype<span class="o">=</span><span class="m">2</span><span class="p">)</span> <span class="o">+</span>
labs<span class="p">(</span>x<span class="o">=</span><span class="s">'number of days data'</span><span class="p">,</span>
y<span class="o">=</span><span class="s">'empirical type I rates at nominal 0.05 level'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'Theoretical and empirical coverage of 0.05 CIs on SNR of Markowitz Portfolio, using some clairvoyance, normal returns.'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/new_mp_ci_ci_plots-1.png" title="plot of chunk ci_plots" alt="plot of chunk ci_plots" width="900px" height="700px" /></p>
<p>The new confidence limit, plotted in blue and called the "tas form" here, is apparently very optimistic and
much too high.
The empirical rate of type I errors is enormous, sometimes over 90%.
It should be noted that the simulations here use some amount of
'clairvoyance' on <span class="math">\(\zeta\)</span>; use of a sample estimate would further degrade them, but they are already unusable
except for unreasonably large sample sizes. So back to the drawing board.</p>
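<p>As a quick sanity check on that transform (a sketch, not part of the original analysis): the 'tas' function is tan(asin(x)), which carries the ratio of expected return to root second moment onto the signal-noise ratio, and the anti_tas function defined above inverts it, since tan(asin(x)) equals x divided by the square root of one minus x squared:</p>
<div class="highlight"><pre><span></span># sanity check (hypothetical): anti_tas inverts tan(asin(x))
tas <- function(x) { tan(asin(x)) }
anti_tas <- function(x) { x / sqrt(1 + x^2) }
x <- seq(-0.99, 0.99, by=0.01)
stopifnot(all(abs(anti_tas(tas(x)) - x) < 1e-12))
</pre></div>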
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Markowitz Portfolio Covariance, Elliptical Returns2018-03-12T22:28:31-07:002018-03-12T22:28:31-07:00Steven E. Pavtag:www.gilgamath.com,2018-03-12:/markowitz-cov-elliptical.html<p>In a <a href="bad-cis">previous blog post</a>, I looked at asymptotic confidence
intervals for the Signal to Noise ratio of the (sample) Markowitz
portfolio, finding them to be deficient. (Perhaps they are useful if
one has hundreds of thousands of days of data, but are otherwise
awful.) Those confidence intervals came from revision four of my paper
on the <a href="https://arxiv.org/abs/1312.0557">Asymptotic distribution of the Markowitz Portfolio</a>.
In that same update, I also describe, albeit in an obfuscated form,
the asymptotic distribution of the sample Markowitz portfolio for
elliptical returns. Here I check that finding empirically.
<!-- PELICAN_END_SUMMARY --></p>
<p>Suppose you observe a <span class="math">\(p\)</span> vector of returns drawn from an elliptical
distribution with mean <span class="math">\(\mu\)</span>, covariance <span class="math">\(\Sigma\)</span> and 'kurtosis factor',
<span class="math">\(\kappa\)</span>. Three times the kurtosis factor is the kurtosis of marginals
under this assumed model. It takes value <span class="math">\(1\)</span> for a multivariate normal.
This model of returns is slightly more realistic than the multivariate normal,
but it does not allow for skewness of asset returns, which is itself unrealistic.</p>
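<p>For intuition on the kurtosis factor (a sketch with assumed values, not from the original post): the marginal kurtosis of a <span class="math">\(t\)</span> distribution with <span class="math">\(\nu\)</span> degrees of freedom is <span class="math">\(3\left(\nu-2\right)/\left(\nu-4\right)\)</span>, so a target factor <span class="math">\(\kappa\)</span> corresponds to <span class="math">\(\nu = 4 + 2/\left(\kappa-1\right)\)</span>, with <span class="math">\(\nu\to\infty\)</span> recovering the normal case <span class="math">\(\kappa=1\)</span>:</p>
<div class="highlight"><pre><span></span># hypothetical check of the kurtosis factor / degrees of freedom relation
kappa_of_df <- function(df) { (df - 2) / (df - 4) }  # marginal kurtosis is 3 * kappa
df_of_kappa <- function(kappa) { 4 + 2 / (kappa - 1) }
stopifnot(abs(kappa_of_df(df_of_kappa(1.5)) - 1.5) < 1e-12)
# Monte Carlo: t with 8 df has kurtosis 3 * kappa_of_df(8) = 4.5
set.seed(1234)
x <- rt(1e6, df=8)
mean((x - mean(x))^4) / var(x)^2  # roughly 4.5
</pre></div>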
<p>Nonetheless, let <span class="math">\(\hat{\nu}\)</span> be the Markowitz portfolio built on a sample
of <span class="math">\(n\)</span> days of independent returns:
</p>
<div class="math">$$
\hat{\nu} = \hat{\Sigma}^{-1} \hat{\mu},
$$</div>
<p>
where <span class="math">\(\hat{\mu}, \hat{\Sigma}\)</span> are the regular 'vanilla' estimates
of mean and covariance. The vector <span class="math">\(\hat{\nu}\)</span> is, in a sense, over-corrected,
and we need to cancel out a square root of <span class="math">\(\Sigma\)</span> (the population value). So
we will consider the distribution of <span class="math">\(Q \Sigma^{\top/2} \hat{\nu}\)</span>, where
<span class="math">\(\Sigma^{\top/2}\)</span> is the upper triangular Cholesky factor of <span class="math">\(\Sigma\)</span>,
and where <span class="math">\(Q\)</span> is an orthogonal matrix (<span class="math">\(Q Q^{\top} = I\)</span>), and where
<span class="math">\(Q\)</span> rotates <span class="math">\(\Sigma^{-1/2}\mu\)</span> onto <span class="math">\(e_1\)</span>, the first basis vector:
</p>
<div class="math">$$
Q \Sigma^{-1/2}\mu = \zeta e_1,
$$</div>
<p>
where <span class="math">\(\zeta\)</span> is the Signal to Noise ratio of the population Markowitz
portfolio: <span class="math">\(\zeta = \sqrt{\mu^{\top}\Sigma^{-1}\mu} = \left\Vert \Sigma^{-1/2}\mu \right\Vert.\)</span>
A <a href="https://arxiv.org/abs/1803.01381">very recent paper</a> on arxiv calls
<span class="math">\(\Sigma^{-1/2}\mu\)</span> the 'Generalized Information Ratio', and I think
it may be productive to analyze this quantity.</p>
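<p>One concrete way to construct such a <span class="math">\(Q\)</span> (a sketch; the simulation code below instead chooses <span class="math">\(\mu\)</span> so that <span class="math">\(Q\)</span> can be taken as the identity) is a Householder reflector, which is orthogonal and maps <span class="math">\(v = \Sigma^{-1/2}\mu\)</span> onto <span class="math">\(\left\Vert v \right\Vert e_1\)</span>:</p>
<div class="highlight"><pre><span></span># hypothetical Householder construction of Q with Q v = ||v|| e_1
house_Q <- function(v) {
  p <- length(v)
  e1 <- c(1, rep(0, p - 1))
  u <- v - sqrt(sum(v^2)) * e1
  if (sum(u^2) < 1e-14) { return(diag(p)) }  # v already along e_1
  diag(p) - 2 * outer(u, u) / sum(u^2)
}
v <- c(3, 4, 12)   # ||v|| = 13
Q <- house_Q(v)
stopifnot(max(abs(Q %*% t(Q) - diag(3))) < 1e-12)  # orthogonal
stopifnot(max(abs(Q %*% v - c(13, 0, 0))) < 1e-12) # mapped onto e_1
</pre></div>
<p>A Householder matrix is a reflection rather than a proper rotation, but only <span class="math">\(Q Q^{\top} = I\)</span> is required here.</p>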
<p>Back to our problem, as <span class="math">\(n\)</span> gets large, we expect <span class="math">\(\hat{\mu}\)</span> and <span class="math">\(\hat{\Sigma}\)</span> to approach
their population values, in which case <span class="math">\(Q \Sigma^{\top/2} \hat{\nu}\)</span> should approach <span class="math">\(\zeta
e_1\)</span>. What I find, by the delta method, is that
</p>
<div class="math">$$
\sqrt{n}\left(Q \Sigma^{\top/2} \hat{\Sigma}^{-1}\hat{\mu} - \zeta e_1\right)
\rightsquigarrow \mathcal{N}\left(0,
\left(1+\kappa\zeta^2\right)I + \left(2\kappa - 1\right)\zeta^2 e_1 e_1^{\top}
\right).
$$</div>
<p>
Note:</p>
<ul>
<li>The true mean return of the sample Markowitz portfolio is equal to
<div class="math">$$
\mu^{\top} \hat{\Sigma}^{-1}\hat{\mu} =
\mu^{\top} \Sigma^{-1} \Sigma^{1/2} Q^{\top} Q\Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu} =
\zeta e_1^{\top} Q\Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu},
$$</div>
that is, all the expected return is due to the first element of <span class="math">\(Q\Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu}\)</span>.
The first element may have non-zero mean, but the remaining elements are
asymptotically zero mean.</li>
<li>The volatility of the sample Markowitz portfolio is equal to
<div class="math">$$
\sqrt{\hat{\mu}^{\top}\hat{\Sigma}^{-1}\Sigma\hat{\Sigma}^{-1}\hat{\mu}} =
\sqrt{\hat{\mu}^{\top}\hat{\Sigma}^{-1}\Sigma^{1/2}Q^{\top} Q \Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu}} =
\left\Vert Q \Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu} \right\Vert.
$$</div>
So the total length of the vector <span class="math">\(Q\Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu}\)</span> gives
the risk of our portfolio. </li>
<li>By means of <span class="math">\(Q\)</span> we have rotated the space such that the errors in the
elements of <span class="math">\(Q\Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu}\)</span> are asymptotically independent
(their covariance is diagonal).</li>
</ul>
<p>I learned the hard way in the previous post that 'asymptotically' can require
very large sample sizes, much bigger than practical. So here I first check
these covariances for reasonable sample sizes. I draw returns from either
multivariate normal, or multivariate <span class="math">\(t\)</span> distribution with degrees of freedom
selected to achieve a fixed value of <span class="math">\(\kappa\)</span>, the kurtosis factor. I perform
simulations with the sample ranging between 100 and 1600 days, with the
Signal Noise ratio of the population Markowitz portfolio ranging from 1/2
to 2 in annualized units, and I test universes of 4 or 16 assets. For each
choice of parameters, I perform 10K simulations. I compute the error
</p>
<div class="math">$$
\sqrt{n}\left(Q \Sigma^{\top/2} \hat{\Sigma}^{-1}\hat{\mu} - \zeta e_1\right)
\operatorname{diag}\left({\left(1+\kappa\zeta^2\right)I + \left(2\kappa - 1\right)\zeta^2 e_1 e_1^{\top}}\right)^{-1/2}.
$$</div>
<p>
I save the first and last element of that vector for each simulation. Then for
a fixed setting of the parameters, I will create a Q-Q plot of the actual
errors against Normal quantiles. We will not test independence of the elements,
but we should get a quick read on whether we have correctly expressed the
mean and covariance of <span class="math">\(Q \Sigma^{\top/2} \hat{\Sigma}^{-1}\hat{\mu}\)</span>, and
what sample size is required to reach 'asymptotically'. The simulations:</p>
<div class="highlight"><pre><span></span><span class="kp">suppressMessages</span><span class="p">({</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tidyr<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tibble<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>mvtnorm<span class="p">)</span>
<span class="c1"># https://cran.r-project.org/web/packages/doFuture/vignettes/doFuture.html</span>
<span class="kn">library</span><span class="p">(</span>doFuture<span class="p">)</span>
registerDoFuture<span class="p">()</span>
plan<span class="p">(</span>multiprocess<span class="p">)</span>
<span class="p">})</span>
<span class="c1"># one simulation of n periods of data on p assets with true optimal</span>
<span class="c1"># SNR of (the vector of) pzeta</span>
onesim <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">,</span>beta<span class="o">=</span><span class="m">0.3</span><span class="p">)</span> <span class="p">{</span>
<span class="c1"># create the population</span>
sig <span class="o"><-</span> rWishart<span class="p">(</span><span class="m">1</span><span class="p">,</span>df<span class="o">=</span><span class="m">1000</span><span class="p">,</span>Sigma<span class="o">=</span><span class="p">(</span><span class="m">1</span><span class="o">-</span><span class="kp">beta</span><span class="p">)</span><span class="o">*</span><span class="kp">diag</span><span class="p">(</span>p<span class="p">)</span><span class="o">+</span><span class="kp">beta</span><span class="p">)[,,</span><span class="m">1</span><span class="p">]</span>
<span class="c1">#mu <- rnorm(p)</span>
<span class="c1"># to simplify our lives, force Q to be the identity</span>
hasig <span class="o"><-</span> <span class="kp">chol</span><span class="p">(</span>sig<span class="p">)</span>
<span class="c1"># don't worry, we rescale it later</span>
e1 <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="kp">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span>p<span class="m">-1</span><span class="p">))</span>
mu <span class="o"><-</span> <span class="kp">t</span><span class="p">(</span>hasig<span class="p">)</span> <span class="o">%*%</span> e1
<span class="c1"># true Markowitz portfolio</span>
pwopt <span class="o"><-</span> <span class="kp">solve</span><span class="p">(</span>sig<span class="p">,</span>mu<span class="p">)</span>
<span class="c1"># true optimal squared sharpe</span>
psrsqopt <span class="o"><-</span> <span class="kp">sum</span><span class="p">(</span>pwopt <span class="o">*</span> mu<span class="p">)</span>
psropt <span class="o"><-</span> <span class="kp">sqrt</span><span class="p">(</span>psrsqopt<span class="p">)</span>
<span class="c1"># rescale mu to achieve pzeta</span>
rescal <span class="o"><-</span> <span class="p">(</span>pzeta <span class="o">/</span> psropt<span class="p">)</span>
mu <span class="o"><-</span> rescal <span class="o">*</span> mu
pwopt <span class="o"><-</span> rescal <span class="o">*</span> pwopt
psropt <span class="o"><-</span> pzeta
psrsqopt <span class="o"><-</span> pzeta<span class="o">^</span><span class="m">2</span>
<span class="c1"># now sample</span>
<span class="c1"># kurty is the kurtosis factor </span>
<span class="c1"># =1 means normal, otherwise use t</span>
<span class="kr">if</span> <span class="p">(</span>kurty<span class="o">==</span><span class="m">1</span><span class="p">)</span> <span class="p">{</span>
X <span class="o"><-</span> rmvnorm<span class="p">(</span>n<span class="p">,</span>mean<span class="o">=</span>mu<span class="p">,</span>sigma<span class="o">=</span>sig<span class="p">)</span>
<span class="p">}</span> <span class="kr">else</span> <span class="p">{</span>
<span class="c1"># marginal kurtosis of t is 3(df-2)/(df-4) = 3*kurty when df = 4 + 2/(kurty-1)</span>
df <span class="o"><-</span> <span class="m">4</span> <span class="o">+</span> <span class="p">(</span><span class="m">2</span> <span class="o">/</span> <span class="p">(</span>kurty<span class="m">-1</span><span class="p">))</span>
<span class="c1"># for a t distribution, I have to shift the sigma by df / (df-2)</span>
X <span class="o"><-</span> rmvt<span class="p">(</span>n<span class="p">,</span>delta<span class="o">=</span>mu<span class="p">,</span>sigma<span class="o">=</span><span class="p">((</span>df<span class="m">-2</span><span class="p">)</span><span class="o">/</span>df<span class="p">)</span> <span class="o">*</span> sig<span class="p">,</span>type<span class="o">=</span><span class="s">'shifted'</span><span class="p">,</span>df<span class="o">=</span>df<span class="p">)</span>
<span class="p">}</span>
smu1 <span class="o"><-</span> <span class="kp">colMeans</span><span class="p">(</span>X<span class="p">)</span>
ssig <span class="o"><-</span> <span class="p">((</span>n<span class="m">-1</span><span class="p">)</span><span class="o">/</span>n<span class="p">)</span> <span class="o">*</span> cov<span class="p">(</span>X<span class="p">)</span>
swopt <span class="o"><-</span> <span class="kp">solve</span><span class="p">(</span>ssig<span class="p">,</span>smu1<span class="p">)</span>
<span class="c1"># scale by sigma^T/2</span>
ssmp <span class="o"><-</span> hasig <span class="o">%*%</span> swopt
stat <span class="o"><-</span> <span class="kp">sqrt</span><span class="p">(</span>n<span class="p">)</span> <span class="o">*</span> <span class="p">(</span>ssmp <span class="o">-</span> pzeta <span class="o">*</span> e1<span class="p">)</span>
<span class="c1"># the claim is that this is the covariance of that thing:</span>
<span class="c1"># (1 + kurty * psrsqopt) * diag(p) + (2 * kurty - 1) * psrsqopt * outer(e1,e1)</span>
Omegd <span class="o"><-</span> <span class="p">(</span><span class="m">1</span> <span class="o">+</span> kurty <span class="o">*</span> psrsqopt<span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="m">2</span> <span class="o">*</span> kurty <span class="o">-</span> <span class="m">1</span><span class="p">)</span> <span class="o">*</span> psrsqopt <span class="o">*</span> e1
<span class="c1"># divide by the root covariance. it is diagonal</span>
adjstat <span class="o"><-</span> stat <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span>Omegd<span class="p">)</span>
<span class="c1"># pick out just the first and last values</span>
firstv <span class="o"><-</span> adjstat<span class="p">[</span><span class="m">1</span><span class="p">]</span>
lastv <span class="o"><-</span> adjstat<span class="p">[</span>p<span class="p">]</span>
<span class="kp">cbind</span><span class="p">(</span>pzeta<span class="p">,</span>firstv<span class="p">,</span>lastv<span class="p">)</span>
<span class="p">}</span>
<span class="c1"># do that many times.</span>
repsim <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>nrep<span class="p">,</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">,</span>beta<span class="o">=</span><span class="m">0.3</span><span class="p">)</span> <span class="p">{</span>
foo <span class="o"><-</span> <span class="kp">replicate</span><span class="p">(</span>nrep<span class="p">,</span>onesim<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">,</span><span class="kp">beta</span><span class="p">))</span>
baz <span class="o"><-</span> <span class="kp">aperm</span><span class="p">(</span>foo<span class="p">,</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">3</span><span class="p">,</span><span class="m">2</span><span class="p">))</span>
<span class="kp">dim</span><span class="p">(</span>baz<span class="p">)</span> <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span>nrep <span class="o">*</span> <span class="kp">length</span><span class="p">(</span>pzeta<span class="p">),</span><span class="kp">dim</span><span class="p">(</span>foo<span class="p">)[</span><span class="m">2</span><span class="p">])</span>
<span class="kp">colnames</span><span class="p">(</span>baz<span class="p">)</span> <span class="o"><-</span> <span class="kp">colnames</span><span class="p">(</span>foo<span class="p">)</span>
<span class="kp">invisible</span><span class="p">(</span><span class="kp">as.data.frame</span><span class="p">(</span>baz<span class="p">))</span>
<span class="p">}</span>
manysim <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>nrep<span class="p">,</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">,</span>beta<span class="o">=</span><span class="m">0.3</span><span class="p">,</span>nnodes<span class="o">=</span><span class="m">6</span><span class="p">)</span> <span class="p">{</span>
<span class="kr">if</span> <span class="p">(</span>nrep <span class="o">></span> <span class="m">2</span><span class="o">*</span>nnodes<span class="p">)</span> <span class="p">{</span>
<span class="c1"># do in parallel.</span>
nper <span class="o"><-</span> <span class="kp">table</span><span class="p">(</span><span class="m">1</span> <span class="o">+</span> <span class="p">((</span><span class="m">0</span><span class="o">:</span><span class="p">(</span>nrep<span class="m">-1</span><span class="p">)</span> <span class="o">%%</span> nnodes<span class="p">)))</span>
retv <span class="o"><-</span> foreach<span class="p">(</span>i<span class="o">=</span><span class="m">1</span><span class="o">:</span>nnodes<span class="p">,</span><span class="m">.</span>export <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">'pzeta'</span><span class="p">,</span><span class="s">'n'</span><span class="p">,</span><span class="s">'p'</span><span class="p">,</span><span class="s">'kurty'</span><span class="p">,</span><span class="s">'beta'</span><span class="p">,</span><span class="s">'repsim'</span><span class="p">,</span><span class="s">'onesim'</span><span class="p">))</span> <span class="o">%dopar%</span> <span class="p">{</span>
repsim<span class="p">(</span>nrep<span class="o">=</span>nper<span class="p">[</span>i<span class="p">],</span>pzeta<span class="o">=</span>pzeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">,</span>kurty<span class="o">=</span>kurty<span class="p">,</span>beta<span class="o">=</span><span class="kp">beta</span><span class="p">)</span>
<span class="p">}</span> <span class="o">%>%</span>
bind_rows<span class="p">()</span>
<span class="p">}</span> <span class="kr">else</span> <span class="p">{</span>
retv <span class="o"><-</span> repsim<span class="p">(</span>nrep<span class="o">=</span>nrep<span class="p">,</span>pzeta<span class="o">=</span>pzeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">,</span>kurty<span class="o">=</span>kurty<span class="p">,</span>beta<span class="o">=</span><span class="kp">beta</span><span class="p">)</span>
<span class="p">}</span>
retv
<span class="p">}</span>
<span class="c1"># actually do it many times.</span>
ope <span class="o"><-</span> <span class="m">252</span>
zetasq <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="o">/</span><span class="m">4</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">4</span><span class="p">)</span> <span class="o">/</span> ope
params <span class="o"><-</span> tidyr<span class="o">::</span>crossing<span class="p">(</span>tibble<span class="o">::</span>tribble<span class="p">(</span><span class="o">~</span>n<span class="p">,</span><span class="m">100</span><span class="p">,</span><span class="m">400</span><span class="p">,</span><span class="m">1600</span><span class="p">),</span>
tibble<span class="o">::</span>tribble<span class="p">(</span><span class="o">~</span>kurty<span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">8</span><span class="p">,</span><span class="m">16</span><span class="p">),</span>
tibble<span class="o">::</span>tibble<span class="p">(</span>pzeta<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>zetasq<span class="p">)),</span>
tibble<span class="o">::</span>tribble<span class="p">(</span><span class="o">~</span>p<span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">16</span><span class="p">))</span>
nrep <span class="o"><-</span> <span class="m">10000</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">1234</span><span class="p">)</span>
<span class="kp">system.time</span><span class="p">({</span>
results <span class="o"><-</span> params <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
summarize<span class="p">(</span>sims<span class="o">=</span><span class="kt">list</span><span class="p">(</span>manysim<span class="p">(</span>nrep<span class="o">=</span>nrep<span class="p">,</span>pzeta<span class="o">=</span>pzeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">,</span>kurty<span class="o">=</span>kurty<span class="p">)))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
tidyr<span class="o">::</span>unnest<span class="p">()</span>
<span class="p">})</span>
</pre></div>
<div class="highlight"><pre><span></span> user system elapsed
1086.729 25.859 208.513
</pre></div>
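<p>The chunking logic in <code>manysim</code> above can be sketched in Python (a hedged translation, not the post's actual harness; <code>one_sim</code> is a stand-in toy replication and all names here are mine):</p>

```python
from concurrent.futures import ThreadPoolExecutor
import random

def one_sim(_i):
    # stand-in for a single replication; here just a standard normal draw
    return random.gauss(0.0, 1.0)

def rep_sim(nrep):
    # run nrep replications serially and collect the results
    return [one_sim(i) for i in range(nrep)]

def many_sim(nrep, nnodes=6):
    # as in manysim(): only parallelize when there is enough work
    if nrep <= 2 * nnodes:
        return rep_sim(nrep)
    # split the nrep replications across workers as evenly as possible
    base, extra = divmod(nrep, nnodes)
    sizes = [base + (1 if i < extra else 0) for i in range(nnodes)]
    with ThreadPoolExecutor(max_workers=nnodes) as pool:
        parts = pool.map(rep_sim, sizes)
    return [x for part in parts for x in part]
```

<p>The R version farms the chunks out with <code>foreach</code>/<code>%dopar%</code>; here threads merely stand in for the worker nodes.</p>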
<p>We collect the simulations now:</p>
<div class="highlight"><pre><span></span><span class="c1"># summarize the moments:</span>
sumres <span class="o"><-</span> results <span class="o">%>%</span>
dplyr<span class="o">::</span>select<span class="p">(</span><span class="o">-</span>pzeta1<span class="p">)</span> <span class="o">%>%</span>
arrange<span class="p">(</span>firstv<span class="p">)</span> <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>firstq<span class="o">=</span>qnorm<span class="p">(</span>ppoints<span class="p">(</span><span class="kp">length</span><span class="p">(</span>firstv<span class="p">))))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
arrange<span class="p">(</span>lastv<span class="p">)</span> <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>lastq<span class="o">=</span>qnorm<span class="p">(</span>ppoints<span class="p">(</span><span class="kp">length</span><span class="p">(</span>lastv<span class="p">))))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
mutate<span class="p">(</span>zyr<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">)</span><span class="o">*</span>pzeta<span class="p">)</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`annualized SNR`</span><span class="o">=</span>zyr<span class="p">)</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`kurtosis factor`</span><span class="o">=</span>kurty<span class="p">)</span>
</pre></div>
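<p>The <code>qnorm(ppoints(n))</code> idiom used above generates the theoretical normal quantiles for a Q-Q plot. A rough Python equivalent, using only the standard library and numpy (the function names are mine):</p>

```python
from statistics import NormalDist
import numpy as np

def ppoints(n):
    # R's ppoints(n): evenly spaced probability points for Q-Q plots
    a = 0.375 if n <= 10 else 0.5
    return (np.arange(1, n + 1) - a) / (n + 1 - 2 * a)

def normal_qq(sample):
    # pair theoretical N(0,1) quantiles with the sorted sample values
    nd = NormalDist()
    q_theory = np.array([nd.inv_cdf(pr) for pr in ppoints(len(sample))])
    return q_theory, np.sort(np.asarray(sample))
```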
<p>What follows are the Q-Q plots of, first, the first element of the vector
<span class="math">\(Q \Sigma^{\top/2} \hat{\Sigma}^{-1}\hat{\mu}\)</span>, and then the last element.
We have facet columns for <span class="math">\(\zeta\)</span> and <span class="math">\(\kappa\)</span>, and facet rows for
<span class="math">\(p\)</span> and <span class="math">\(n\)</span>. To my eye, these are all fairly encouraging, with near-normal
quantiles of the standardized error, except for the <span class="math">\(n=100, p=16\)</span> case.
This suggests that larger sample sizes are required for a larger universe
of assets. There may also be issues when the kurtosis is very high, as
we see some deviations in the lower-right corner of these plots.</p>
<div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
ph <span class="o"><-</span> sumres <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>firstq<span class="p">,</span>firstv<span class="p">))</span> <span class="o">+</span>
geom_point<span class="p">()</span> <span class="o">+</span>
geom_abline<span class="p">(</span>slope<span class="o">=</span><span class="m">1</span><span class="p">,</span>intercept<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="o">+</span>
facet_grid<span class="p">(</span>p <span class="o">+</span> n <span class="o">~</span> <span class="sb">`annualized SNR`</span><span class="o">+</span><span class="sb">`kurtosis factor`</span><span class="p">,</span>scales<span class="o">=</span><span class="s">'free'</span><span class="p">,</span>labeller<span class="o">=</span>label_both<span class="p">)</span> <span class="o">+</span>
labs<span class="p">(</span>x<span class="o">=</span><span class="s">'theoretical quantiles'</span><span class="p">,</span>
y<span class="o">=</span><span class="s">'empirical quantiles'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'QQ, first element of the transformed Markowitz portfolio.'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/marko_cov_ellip_firstv_qq_plots-1.png" title="plot of chunk firstv_qq_plots" alt="plot of chunk firstv_qq_plots" width="900px" height="700px" /></p>
<div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
ph <span class="o"><-</span> sumres <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>lastq<span class="p">,</span>lastv<span class="p">))</span> <span class="o">+</span>
geom_point<span class="p">()</span> <span class="o">+</span>
geom_abline<span class="p">(</span>slope<span class="o">=</span><span class="m">1</span><span class="p">,</span>intercept<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="o">+</span>
facet_grid<span class="p">(</span>p <span class="o">+</span> n <span class="o">~</span> <span class="sb">`annualized SNR`</span><span class="o">+</span><span class="sb">`kurtosis factor`</span><span class="p">,</span>scales<span class="o">=</span><span class="s">'free'</span><span class="p">,</span>labeller<span class="o">=</span>label_both<span class="p">)</span> <span class="o">+</span>
labs<span class="p">(</span>x<span class="o">=</span><span class="s">'theoretical quantiles'</span><span class="p">,</span>
y<span class="o">=</span><span class="s">'empirical quantiles'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'QQ, last element of the transformed Markowitz portfolio.'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/marko_cov_ellip_lastv_qq_plots-1.png" title="plot of chunk lastv_qq_plots" alt="plot of chunk lastv_qq_plots" width="900px" height="700px" /></p>
<h2>Quantiles of SNR</h2>
<p>Here we are interested in the signal-noise ratio of the sample Markowitz
portfolio, which takes the value
</p>
<div class="math">$$
u = \zeta \frac{e_1^{\top} Q\Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu}}{
\left\Vert Q \Sigma^{\top/2}\hat{\Sigma}^{-1}\hat{\mu} \right\Vert}.
$$</div>
<p>
Asymptotically we can think of this as
</p>
<div class="math">$$
u = \zeta \frac{\zeta + \sigma_1 z_1}{\sqrt{\left(\zeta + \sigma_1 z_1\right)^2
+ \sigma_p^2 \left(z_2^2 + \ldots + z_p^2\right)}},
$$</div>
<p>
where
</p>
<div class="math">$$
\sigma_1 = n^{-1/2}\sqrt{\left(1+\kappa\zeta^2\right) + \left(2\kappa - 1\right)\zeta^2},\quad\mbox{and}\quad
\sigma_p = n^{-1/2}\sqrt{\left(1+\kappa\zeta^2\right)},
$$</div>
<p>
and where the <span class="math">\(z_i\)</span> are independent standard normals.</p>
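<p>This asymptotic form is easy to sample directly, which gives a quick sanity check on the expressions for <span class="math">\(\sigma_1\)</span> and <span class="math">\(\sigma_p\)</span>. A Python sketch (the function name and parameter values are illustrative):</p>

```python
import numpy as np

def approx_mp_snr(zeta, kappa, n, p, nrep=10000, seed=1234):
    # sample u = zeta (zeta + s1 z1) / sqrt((zeta + s1 z1)^2 + sp^2 sum z_j^2)
    rng = np.random.default_rng(seed)
    s1 = np.sqrt(((1 + kappa * zeta**2) + (2 * kappa - 1) * zeta**2) / n)
    sp = np.sqrt((1 + kappa * zeta**2) / n)
    z1 = rng.standard_normal(nrep)
    ztail = rng.standard_normal((nrep, p - 1))
    num = zeta + s1 * z1
    return zeta * num / np.sqrt(num**2 + sp**2 * (ztail**2).sum(axis=1))
```

<p>Note that <span class="math">\(|u| \le \zeta\)</span> by construction: the sample portfolio can never beat the population-optimal signal-noise ratio.</p>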
<p>Now consider the 'TAS' transform, defined as the Tangent of ArcSine, <span class="math">\(f_{TAS}(x) = x / \sqrt{1-x^2}\)</span>. Apply
this transformation to our SNR, with some rescaling
</p>
<div class="math">$$
f_{TAS}\left(\frac{u}{\zeta}\right) =
\frac{\zeta + \sigma_1 z_1}{\sigma_p\sqrt{z_2^2 + \ldots + z_p^2}},
$$</div>
<p>
which looks a lot like a non-central <span class="math">\(t\)</span> random variable, up to scaling.
(I used the same trick in my <a href="https://arxiv.org/abs/1409.5936">paper on portfolio quality bounds</a>.)
So write
</p>
<div class="math">$$
f_{TAS}\left(\frac{u}{\zeta}\right)
= \frac{\sigma_1}{\sigma_p \sqrt{p-1}} \frac{\frac{\zeta}{\sigma_1} + z_1}{\sqrt{z_2^2 + \ldots + z_p^2}/\sqrt{p-1}}
= \frac{\sigma_1}{\sigma_p \sqrt{p-1}} t,
$$</div>
<p>
where <span class="math">\(t\)</span> is a non-central <span class="math">\(t\)</span> random variable with <span class="math">\(p-1\)</span> degrees of freedom
and non-centrality parameter <span class="math">\(\zeta/\sigma_1\)</span>.</p>
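<p>This distributional claim can be checked by brute force: build <span class="math">\(u\)</span> from the asymptotic form, apply the rescaled tas transform, and compare against draws of a non-central <span class="math">\(t\)</span> constructed from its definition. A Python sketch with illustrative parameter values:</p>

```python
import numpy as np

def tas(x):
    # tangent of arcsine
    return x / np.sqrt(1 - x**2)

rng = np.random.default_rng(1234)
# illustrative parameters: ten years of daily data
zeta, kappa, n, p, nrep = 2 / np.sqrt(252), 4.0, 2520, 6, 50_000

sigma_1 = np.sqrt(((1 + kappa * zeta**2) + (2 * kappa - 1) * zeta**2) / n)
sigma_p = np.sqrt((1 + kappa * zeta**2) / n)

# build u from the asymptotic form
z1 = rng.standard_normal(nrep)
ztail = rng.standard_normal((nrep, p - 1))
num = zeta + sigma_1 * z1
u = zeta * num / np.sqrt(num**2 + sigma_p**2 * (ztail**2).sum(axis=1))

# rescaled tas transform of u / zeta ...
tvals = (sigma_p * np.sqrt(p - 1) / sigma_1) * tas(u / zeta)

# ... versus a noncentral t with p-1 dof and ncp zeta / sigma_1,
# simulated from its definition
tref = (zeta / sigma_1 + rng.standard_normal(nrep)) / np.sqrt(
    (rng.standard_normal((nrep, p - 1)) ** 2).sum(axis=1) / (p - 1))
```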
<p>Here I test these quantiles briefly. For one setting of <span class="math">\(n, p, \zeta, \kappa\)</span>,
I perform 50,000 simulations, compute theoretical quantiles from the
non-central <span class="math">\(t\)</span> distribution as above, and produce a Q-Q plot.</p>
<div class="highlight"><pre><span></span><span class="c1"># simulate the SNR of the sample Markowitz portfolio</span>
<span class="c1"># pzeta is the population (maximal) SNR</span>
mp_snr_sim <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">,</span>beta<span class="o">=</span><span class="m">0.3</span><span class="p">)</span> <span class="p">{</span>
<span class="c1"># create the population</span>
sig <span class="o"><-</span> rWishart<span class="p">(</span><span class="m">1</span><span class="p">,</span>df<span class="o">=</span><span class="m">1000</span><span class="p">,</span>Sigma<span class="o">=</span><span class="p">(</span><span class="m">1</span><span class="o">-</span><span class="kp">beta</span><span class="p">)</span><span class="o">*</span><span class="kp">diag</span><span class="p">(</span>p<span class="p">)</span><span class="o">+</span><span class="kp">beta</span><span class="p">)[,,</span><span class="m">1</span><span class="p">]</span>
<span class="c1"># to simplify our lives, force Q to be the identity</span>
hasig <span class="o"><-</span> <span class="kp">chol</span><span class="p">(</span>sig<span class="p">)</span>
<span class="c1"># don't worry, we rescale it later</span>
e1 <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="kp">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span>p<span class="m">-1</span><span class="p">))</span>
mu <span class="o"><-</span> <span class="kp">t</span><span class="p">(</span>hasig<span class="p">)</span> <span class="o">%*%</span> e1
<span class="c1"># true Markowitz portfolio</span>
pwopt <span class="o"><-</span> <span class="kp">solve</span><span class="p">(</span>sig<span class="p">,</span>mu<span class="p">)</span>
<span class="c1"># true optimal squared sharpe</span>
psrsqopt <span class="o"><-</span> <span class="kp">sum</span><span class="p">(</span>pwopt <span class="o">*</span> mu<span class="p">)</span>
psropt <span class="o"><-</span> <span class="kp">sqrt</span><span class="p">(</span>psrsqopt<span class="p">)</span>
<span class="c1"># rescale mu to achieve pzeta</span>
rescal <span class="o"><-</span> <span class="p">(</span>pzeta <span class="o">/</span> psropt<span class="p">)</span>
mu <span class="o"><-</span> rescal <span class="o">*</span> mu
pwopt <span class="o"><-</span> rescal <span class="o">*</span> pwopt
psropt <span class="o"><-</span> pzeta
psrsqopt <span class="o"><-</span> pzeta<span class="o">^</span><span class="m">2</span>
<span class="c1"># now sample</span>
<span class="c1"># kurty is the kurtosis factor </span>
<span class="c1"># =1 means normal, otherwise use a t distribution</span>
<span class="kr">if</span> <span class="p">(</span>kurty<span class="o">==</span><span class="m">1</span><span class="p">)</span> <span class="p">{</span>
X <span class="o"><-</span> rmvnorm<span class="p">(</span>n<span class="p">,</span>mean<span class="o">=</span>mu<span class="p">,</span>sigma<span class="o">=</span>sig<span class="p">)</span>
<span class="p">}</span> <span class="kr">else</span> <span class="p">{</span>
df <span class="o"><-</span> <span class="m">4</span> <span class="o">+</span> <span class="p">(</span><span class="m">6</span> <span class="o">/</span> <span class="p">(</span>kurty<span class="m">-1</span><span class="p">))</span>
<span class="c1"># for a t distribution, I have to shift the sigma by df / (df-2)</span>
X <span class="o"><-</span> rmvt<span class="p">(</span>n<span class="p">,</span>delta<span class="o">=</span>mu<span class="p">,</span>sigma<span class="o">=</span><span class="p">((</span>df<span class="m">-2</span><span class="p">)</span><span class="o">/</span>df<span class="p">)</span> <span class="o">*</span> sig<span class="p">,</span>type<span class="o">=</span><span class="s">'shifted'</span><span class="p">,</span>df<span class="o">=</span>df<span class="p">)</span>
<span class="p">}</span>
smu1 <span class="o"><-</span> <span class="kp">colMeans</span><span class="p">(</span>X<span class="p">)</span>
ssig <span class="o"><-</span> <span class="p">((</span>n<span class="m">-1</span><span class="p">)</span><span class="o">/</span>n<span class="p">)</span> <span class="o">*</span> cov<span class="p">(</span>X<span class="p">)</span>
swopt <span class="o"><-</span> <span class="kp">solve</span><span class="p">(</span>ssig<span class="p">,</span>smu1<span class="p">)</span>
<span class="c1"># compute the true SNR:</span>
snr <span class="o"><-</span> <span class="kp">sum</span><span class="p">(</span>mu <span class="o">*</span> swopt<span class="p">)</span> <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span><span class="kp">t</span><span class="p">(</span>swopt<span class="p">)</span> <span class="o">%*%</span> <span class="p">(</span>sig <span class="o">%*%</span> swopt<span class="p">))</span>
<span class="p">}</span>
params <span class="o"><-</span> tibble<span class="p">(</span>pzeta<span class="o">=</span><span class="m">2</span><span class="o">/</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">),</span>n<span class="o">=</span><span class="m">10</span><span class="o">*</span>ope<span class="p">,</span>p<span class="o">=</span><span class="m">6</span><span class="p">,</span>kurty<span class="o">=</span><span class="m">4</span><span class="p">)</span>
ope <span class="o"><-</span> <span class="m">252</span>
nrep <span class="o"><-</span> <span class="m">50000</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">1234</span><span class="p">)</span>
<span class="kp">system.time</span><span class="p">({</span>
results <span class="o"><-</span> params <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
summarize<span class="p">(</span>resu<span class="o">=</span><span class="kt">list</span><span class="p">(</span>tibble<span class="p">(</span>rvs<span class="o">=</span><span class="kp">replicate</span><span class="p">(</span>nrep<span class="p">,</span>mp_snr_sim<span class="p">(</span>pzeta<span class="o">=</span>pzeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">,</span>kurty<span class="o">=</span>kurty<span class="p">)))))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
unnest<span class="p">()</span>
<span class="p">})</span>
</pre></div>
<div class="highlight"><pre><span></span> user system elapsed
109.173 0.040 109.235
</pre></div>
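<p>The core of <code>mp_snr_sim</code> — fit the plug-in Markowitz portfolio, then score it against the true moments — can be sketched in Python (a simplified Gaussian-only version; the helper name is mine):</p>

```python
import numpy as np

def markowitz_snr(mu, sigma, n, rng):
    # draw n multivariate-normal returns, build the plug-in Markowitz
    # portfolio, and evaluate its SNR under the *true* moments
    X = rng.multivariate_normal(mu, sigma, size=n)
    muhat = X.mean(axis=0)
    sighat = np.cov(X, rowvar=False, bias=True)  # n-denominator, as in the post
    w = np.linalg.solve(sighat, muhat)
    return float(mu @ w / np.sqrt(w @ sigma @ w))
```

<p>By Cauchy-Schwarz this is bounded in absolute value by the population optimum <span class="math">\(\sqrt{\mu^{\top}\Sigma^{-1}\mu}\)</span>, approaching it as <span class="math">\(n\)</span> grows.</p>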
<div class="highlight"><pre><span></span><span class="c1"># invert the TAS function</span>
anti_tas <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> <span class="p">{</span> x <span class="o">/</span> <span class="kp">sqrt</span><span class="p">(</span><span class="m">1</span> <span class="o">+</span> x<span class="o">^</span><span class="m">2</span><span class="p">)</span> <span class="p">}</span>
<span class="c1"># here's a function which creates the associated quantile from the noncentral t</span>
qsnrs <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>x<span class="p">,</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="p">{</span>
e1 <span class="o"><-</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">)</span>
Omegd <span class="o"><-</span> <span class="p">(</span><span class="m">1</span> <span class="o">+</span> kurty <span class="o">*</span> pzeta<span class="o">^</span><span class="m">2</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="m">2</span> <span class="o">*</span> kurty <span class="o">-</span> <span class="m">1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span>pzeta<span class="o">^</span><span class="m">2</span><span class="p">)</span> <span class="o">*</span> e1
sigma_1 <span class="o"><-</span> <span class="kp">sqrt</span><span class="p">(</span>Omegd<span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="o">/</span> n<span class="p">)</span>
sigma_p <span class="o"><-</span> <span class="kp">sqrt</span><span class="p">(</span>Omegd<span class="p">[</span><span class="m">2</span><span class="p">]</span> <span class="o">/</span> n<span class="p">)</span>
tvals <span class="o"><-</span> qt<span class="p">(</span>x<span class="p">,</span>df<span class="o">=</span>p<span class="m">-1</span><span class="p">,</span>ncp<span class="o">=</span>pzeta <span class="o">/</span> sigma_1<span class="p">)</span>
<span class="c1"># those were t's; bring them back with tas inverse</span>
retv <span class="o"><-</span> pzeta <span class="o">*</span> anti_tas<span class="p">(</span>sigma_1 <span class="o">*</span> tvals <span class="o">/</span> <span class="p">(</span>sigma_p <span class="o">*</span> <span class="kp">sqrt</span><span class="p">(</span>p<span class="m">-1</span><span class="p">)))</span>
<span class="p">}</span>
sumres <span class="o"><-</span> results <span class="o">%>%</span>
arrange<span class="p">(</span>rvs<span class="p">)</span> <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>qvs<span class="o">=</span>qsnrs<span class="p">(</span>ppoints<span class="p">(</span><span class="kp">length</span><span class="p">(</span>rvs<span class="p">)),</span>pzeta<span class="o">=</span>pzeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">,</span>kurty<span class="o">=</span>kurty<span class="p">))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
mutate<span class="p">(</span>zyr<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">)</span><span class="o">*</span>pzeta<span class="p">)</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`annualized SNR`</span><span class="o">=</span>zyr<span class="p">)</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`kurtosis factor`</span><span class="o">=</span>kurty<span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
ph <span class="o"><-</span> sumres <span class="o">%>%</span>
mutate<span class="p">(</span>qvs<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">)</span><span class="o">*</span>qvs<span class="p">,</span>rvs<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">)</span><span class="o">*</span>rvs<span class="p">)</span> <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>qvs<span class="p">,</span>rvs<span class="p">))</span> <span class="o">+</span>
geom_point<span class="p">()</span> <span class="o">+</span>
geom_abline<span class="p">(</span>slope<span class="o">=</span><span class="m">1</span><span class="p">,</span>intercept<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="o">+</span>
facet_grid<span class="p">(</span>p <span class="o">+</span> n <span class="o">~</span> <span class="sb">`annualized SNR`</span><span class="o">+</span><span class="sb">`kurtosis factor`</span><span class="p">,</span>scales<span class="o">=</span><span class="s">'free'</span><span class="p">,</span>labeller<span class="o">=</span>label_both<span class="p">)</span> <span class="o">+</span>
labs<span class="p">(</span>x<span class="o">=</span><span class="s">'theoretical quantiles, annualized SNR'</span><span class="p">,</span>
y<span class="o">=</span><span class="s">'empirical quantiles, annualized SNR'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'QQ plot, SNR of the sample Markowitz portfolio, 10 years data.'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/marko_cov_ellip_snr_qq_plots-1.png" title="plot of chunk snr_qq_plots" alt="plot of chunk snr_qq_plots" width="900px" height="700px" /></p>
<p>This is rather unfortunate, as it suggests there is still a bug in my code,
or in my covariance approximation, or both.</p>
<p><strong>Edit</strong> I did not think to check the simulations above at longer sample
sizes. Indeed, if you assume the portfolio manager has 100 years of daily
data (!) instead of the 10 years assumed above, the approximate
distribution of the signal-noise ratio of the Markowitz portfolio given above
is reasonably accurate, as demonstrated below. So this seems to be another
instance of an 'asymptotic' result requiring an unreasonably large sample size.</p>
<div class="highlight"><pre><span></span><span class="c1"># once again, but for 100 years of daily data:</span>
params <span class="o"><-</span> tibble<span class="p">(</span>pzeta<span class="o">=</span><span class="m">2</span><span class="o">/</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">),</span>n<span class="o">=</span><span class="m">100</span><span class="o">*</span>ope<span class="p">,</span>p<span class="o">=</span><span class="m">6</span><span class="p">,</span>kurty<span class="o">=</span><span class="m">4</span><span class="p">)</span>
ope <span class="o"><-</span> <span class="m">252</span>
nrep <span class="o"><-</span> <span class="m">50000</span>
<span class="kp">set.seed</span><span class="p">(</span><span class="m">1234</span><span class="p">)</span>
<span class="kp">system.time</span><span class="p">({</span>
results <span class="o"><-</span> params <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
summarize<span class="p">(</span>resu<span class="o">=</span><span class="kt">list</span><span class="p">(</span>tibble<span class="p">(</span>rvs<span class="o">=</span><span class="kp">replicate</span><span class="p">(</span>nrep<span class="p">,</span>mp_snr_sim<span class="p">(</span>pzeta<span class="o">=</span>pzeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">,</span>kurty<span class="o">=</span>kurty<span class="p">)))))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
unnest<span class="p">()</span>
<span class="p">})</span>
</pre></div>
<div class="highlight"><pre><span></span> user system elapsed
925.532 9.680 935.352
</pre></div>
<div class="highlight"><pre><span></span>sumres <span class="o"><-</span> results <span class="o">%>%</span>
arrange<span class="p">(</span>rvs<span class="p">)</span> <span class="o">%>%</span>
group_by<span class="p">(</span>pzeta<span class="p">,</span>n<span class="p">,</span>p<span class="p">,</span>kurty<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>qvs<span class="o">=</span>qsnrs<span class="p">(</span>ppoints<span class="p">(</span><span class="kp">length</span><span class="p">(</span>rvs<span class="p">)),</span>pzeta<span class="o">=</span>pzeta<span class="p">,</span>n<span class="o">=</span>n<span class="p">,</span>p<span class="o">=</span>p<span class="p">,</span>kurty<span class="o">=</span>kurty<span class="p">))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
mutate<span class="p">(</span>zyr<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">)</span><span class="o">*</span>pzeta<span class="p">)</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`annualized SNR`</span><span class="o">=</span>zyr<span class="p">)</span> <span class="o">%>%</span>
rename<span class="p">(</span><span class="sb">`kurtosis factor`</span><span class="o">=</span>kurty<span class="p">)</span>
ph <span class="o"><-</span> sumres <span class="o">%>%</span>
mutate<span class="p">(</span>qvs<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">)</span><span class="o">*</span>qvs<span class="p">,</span>rvs<span class="o">=</span><span class="kp">sqrt</span><span class="p">(</span>ope<span class="p">)</span><span class="o">*</span>rvs<span class="p">)</span> <span class="o">%>%</span>
ggplot<span class="p">(</span>aes<span class="p">(</span>qvs<span class="p">,</span>rvs<span class="p">))</span> <span class="o">+</span>
geom_point<span class="p">()</span> <span class="o">+</span>
geom_abline<span class="p">(</span>slope<span class="o">=</span><span class="m">1</span><span class="p">,</span>intercept<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="o">+</span>
facet_grid<span class="p">(</span>p <span class="o">+</span> n <span class="o">~</span> <span class="sb">`annualized SNR`</span><span class="o">+</span><span class="sb">`kurtosis factor`</span><span class="p">,</span>scales<span class="o">=</span><span class="s">'free'</span><span class="p">,</span>labeller<span class="o">=</span>label_both<span class="p">)</span> <span class="o">+</span>
labs<span class="p">(</span>x<span class="o">=</span><span class="s">'theoretical quantiles, annualized SNR'</span><span class="p">,</span>
y<span class="o">=</span><span class="s">'empirical quantiles, annualized SNR'</span><span class="p">,</span>
title<span class="o">=</span><span class="s">'QQ plot, SNR of the sample Markowitz portfolio, 100 years data.'</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>ph<span class="p">)</span>
</pre></div>
<p><img src="https://www.gilgamath.com/figure/marko_cov_ellip_snr_qq_plots_100-1.png" title="plot of chunk snr_qq_plots_100" alt="plot of chunk snr_qq_plots_100" width="900px" height="700px" /></p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>