In a previous blog post, I looked at two asymptotic confidence intervals for the Signal-Noise ratio of the sample Markowitz portfolio, finding that they generally did not give nominal type I rates even for large sample sizes (50 years of daily data). In a followup post, I looked at the covariance of some elements of the Markowitz portfolio, finding that they seemed to be nearly normal for modest sample sizes. However, in that post, I used the 'TAS' transform to a $$t$$ variate, and found, again, that large sample sizes were required to pass the eyeball test in a Q-Q plot.

Here I mash up those two ideas to construct another confidence limit for the Signal Noise ratio. So I take the asymptotic covariance in the Markowitz portfolio elements, and use them with the TAS transform to get a confidence limit. (You can get the gory details around equation (52) of version 5 of my paper, which also contains the simulations below.)

Here we examine all three of those confidence limits, finding that none of them achieve near nominal type I rates. Again, I let $$p$$ be the number of assets, $$n$$ the number of days observed, $$\zeta$$ is the population maximal Signal Noise ratio. Here I am observing multivariate normal returns, so the kurtosis factor is not used here. I perform a number of simulations across different values of these parameters, each time performing 10000 simulations, computing the sample Markowitz portfolio, and its Signal Noise ratio.

suppressMessages({
library(dplyr)
library(tidyr)
library(tibble)
# https://cran.r-project.org/web/packages/doFuture/vignettes/doFuture.html
library(doFuture)
registerDoFuture()
plan(multiprocess)
})

# one simulation of n periods of data on p assets with true optimal
# SNR of (the vector of) pzeta
onesim <- function(pzeta,n,p) {
pmus <- pzeta / sqrt(p)
# simulate an X: too slow.
#X <- matrix(rnorm(n*p,mean=pmus[1],sd=1),ncol=p)
#smu1 <- colMeans(X)
#ssig <- ((n-1)/n) * cov(X)
# this is faster:
smu1 <- rnorm(p,mean=pmus[1],sd=1 / sqrt(n))
ssig <- rWishart(1,df=n-1,Sigma=diag(1,ncol=p,nrow=p)) / n  # sic n
dim(ssig) <- c(p,p)

smus <- outer(smu1,pmus - pmus[1],FUN='+')
smps <- solve(ssig,smus)
szeta <- sqrt(colSums(smus * smps))
psnr <- pmus * as.numeric(colSums(smps) / sqrt(colSums(smps^2)))
cbind(pzeta,szeta,psnr)
}
# do that many times.
repsim <- function(nrep,zetas,n,p) {
foo <- replicate(nrep,onesim(pzeta=zetas,n,p))
baz <- aperm(foo,c(1,3,2))
dim(baz) <- c(nrep * length(zetas),dim(foo)[2])
colnames(baz) <- colnames(foo)
invisible(as.data.frame(baz))
}
manysim <- function(nrep,zetas,n,p,nnodes=7) {
if (nrep > 4*nnodes) {
# do in parallel.
nper <- table(1 + ((0:(nrep-1) %% nnodes)))
retv <- foreach(i=1:nnodes,.export = c('zetas','n','p','repsim','onesim')) %dopar% {
repsim(nrep=nper[i],zetas=zetas,n=n,p=p)
} %>%
bind_rows()
} else {
retv <- repsim(nrep=nrep,zetas=zetas,n=n,p=p)
}
retv
}

# actually do it many times.
ope <- 252
zetasq <- c(1/8,1/4,1/2,1,2,4) / ope
zeta <- sqrt(zetasq)

params <- tidyr::crossing(tibble::tribble(~n,100,200,400,800,1600,3200,6400,12800),
tibble::tribble(~p,2,4,8,16),
tibble::tribble(~kurty,1))

nrep <- 10000

set.seed(2356)
system.time({
results <- params %>%
group_by(n,p,kurty) %>%
summarize(sims=list(manysim(nrep=nrep,zetas=zeta,n=n,p=p))) %>%
ungroup() %>%
tidyr::unnest()
})

   user  system elapsed
352.070 530.625 117.240


Here I collect the simulations together, computing the three confidence limits and then the empirical type I rates. I plot them below.

# the nominal rate:
typeI <- 0.05
# invert the TAS function
anti_tas <- function(x) { x / sqrt(1 + x^2) }

# confidence intervals and coverage:
cires <- results %>%
mutate(kurty=1) %>%
mutate(bit1 = (kurty*pzeta^2 + 1) * (1 - p),
bit2 = (3 * kurty - 1) * (pzeta^2/4) + 1) %>%
mutate(lam1=pzeta*sqrt((2+3*(kurty-1))/(4*n)),
lamp=sqrt(1 + kurty*pzeta^2)/sqrt(n)) %>%
mutate(tpart=qt(typeI,df=p-1,ncp=szeta/lam1)) %>%
mutate(ci_add = szeta + ((bit1 + bit2) / (2 * n * pzeta)) + qnorm(typeI) * sqrt(bit2/n)) %>%
mutate(ci_div = szeta * (1 + ((bit1 + 3 * bit2) / (2 * n * pzeta * pzeta)) + qnorm(typeI) * sqrt(bit2 / (n*pzeta*pzeta)))) %>%
mutate(ci_tas = pzeta * anti_tas((lam1 * tpart) / (lamp * sqrt(p-1)))) %>%
group_by(pzeta,n,p,kurty) %>%
type1_div = mean(psnr < ci_div),
type1_tas = mean(psnr < ci_tas)) %>%
ungroup() %>%
mutate(zyr=signif(pzeta * sqrt(ope),digits=2)) %>%
rename(annualized SNR=zyr)

# plot CIs:
library(ggplot2)
ph <- cires %>%
tidyr::gather(key=type,value=type1,matches('^type1_')) %>%
mutate(type=case_when(.$type=='type1_add' ~ 'type I rate, difference form', .$type=='type1_div' ~ 'type I rate, ratio form',
.\$type=='type1_tas' ~ 'type I rate, tas form',
facet_grid(p ~ annualized SNR,scales='free',labeller=label_both) +

The new confidence limit, plotted in blue and called the "tas form" here, is apparently very optimistic and much too high. The empirical rate of type I errors is enormous, sometimes over 90%. It should be noted that the simulations here use some amount of 'clairvoyance' on $$\zeta$$; use of a sample estimate would further degrade them, but they are already unusable except for unreasonably large sample sizes. So back to the drawing board.