In a previous blog post, I looked at two asymptotic confidence intervals for the Signal-Noise ratio of the sample Markowitz portfolio, finding that they generally did not give nominal type I rates even for large sample sizes (50 years of daily data). In a followup post, I looked at the covariance of some elements of the Markowitz portfolio, finding that they seemed to be nearly normal for modest sample sizes. However, in that post, I used the 'TAS' transform to a \(t\) variate, and found, again, that large sample sizes were required to pass the eyeball test in a Q-Q plot.

Here I mash up those two ideas to construct another confidence limit for the Signal Noise ratio. So I take the asymptotic covariance in the Markowitz portfolio elements, and use them with the TAS transform to get a confidence limit. (You can get the gory details around equation (52) of version 5 of my paper, which also contains the simulations below.)

Here we examine all three of those confidence limits, finding that none of them achieve near nominal type I rates. Again, I let \(p\) be the number of assets, \(n\) the number of days observed, \(\zeta\) is the population maximal Signal Noise ratio. Here I am observing multivariate normal returns, so the kurtosis factor is not used here. I perform a number of simulations across different values of these parameters, each time performing 10000 simulations, computing the sample Markowitz portfolio, and its Signal Noise ratio.

suppressMessages({
  library(dplyr)
  library(tidyr)
  library(tibble)
  # https://cran.r-project.org/web/packages/doFuture/vignettes/doFuture.html
  library(doFuture)
  registerDoFuture()
  plan(multiprocess)
})

# one simulation of n periods of data on p assets with true optimal
# SNR of (the vector of) pzeta
onesim <- function(pzeta,n,p) {
  pmus <- pzeta / sqrt(p)
  # simulate an X: too slow.
  #X <- matrix(rnorm(n*p,mean=pmus[1],sd=1),ncol=p)
  #smu1 <- colMeans(X)
  #ssig <- ((n-1)/n) * cov(X)
  # this is faster:
  smu1 <- rnorm(p,mean=pmus[1],sd=1 / sqrt(n))
  ssig <- rWishart(1,df=n-1,Sigma=diag(1,ncol=p,nrow=p)) / n  # sic n
  dim(ssig) <- c(p,p)

  smus <- outer(smu1,pmus - pmus[1],FUN='+')
  smps <- solve(ssig,smus)
  szeta <- sqrt(colSums(smus * smps))
  psnr <- pmus * as.numeric(colSums(smps) / sqrt(colSums(smps^2)))
  cbind(pzeta,szeta,psnr)
}
# do that many times.
repsim <- function(nrep,zetas,n,p) {
  foo <- replicate(nrep,onesim(pzeta=zetas,n,p))
  baz <- aperm(foo,c(1,3,2))
  dim(baz) <- c(nrep * length(zetas),dim(foo)[2])
  colnames(baz) <- colnames(foo)
  invisible(as.data.frame(baz))
}
manysim <- function(nrep,zetas,n,p,nnodes=7) {
  if (nrep > 4*nnodes) {
    # do in parallel.
    nper <- table(1 + ((0:(nrep-1) %% nnodes))) 
    retv <- foreach(i=1:nnodes,.export = c('zetas','n','p','repsim','onesim')) %dopar% {
      repsim(nrep=nper[i],zetas=zetas,n=n,p=p)
    } %>%
      bind_rows()
  } else {
    retv <- repsim(nrep=nrep,zetas=zetas,n=n,p=p)
  }
  retv
}

# actually do it many times.
ope <- 252
zetasq <- c(1/8,1/4,1/2,1,2,4) / ope
zeta <- sqrt(zetasq)

params <- tidyr::crossing(tibble::tribble(~n,100,200,400,800,1600,3200,6400,12800),
                          tibble::tribble(~p,2,4,8,16),
                          tibble::tribble(~kurty,1))

nrep <- 10000

set.seed(2356)
system.time({
results <- params %>%
  group_by(n,p,kurty) %>%
    summarize(sims=list(manysim(nrep=nrep,zetas=zeta,n=n,p=p))) %>%
  ungroup() %>%
  tidyr::unnest()
})
   user  system elapsed 
352.070 530.625 117.240 

Here I collect the simulations together, computing the three confidence limits and then the empirical type I rates. I plot them below.

# the nominal rate:
typeI <- 0.05
# invert the TAS function
anti_tas <- function(x) { x / sqrt(1 + x^2) }

# confidence intervals and coverage:
cires <- results %>%
  mutate(kurty=1) %>%
  mutate(bit1 = (kurty*pzeta^2 + 1) * (1 - p),
         bit2 = (3 * kurty - 1) * (pzeta^2/4) + 1) %>%   
    mutate(lam1=pzeta*sqrt((2+3*(kurty-1))/(4*n)),
                 lamp=sqrt(1 + kurty*pzeta^2)/sqrt(n)) %>%
    mutate(tpart=qt(typeI,df=p-1,ncp=szeta/lam1)) %>%
  mutate(ci_add = szeta + ((bit1 + bit2) / (2 * n * pzeta)) + qnorm(typeI) * sqrt(bit2/n)) %>%
  mutate(ci_div = szeta * (1 + ((bit1 + 3 * bit2) / (2 * n * pzeta * pzeta)) + qnorm(typeI) * sqrt(bit2 / (n*pzeta*pzeta)))) %>%
  mutate(ci_tas = pzeta * anti_tas((lam1 * tpart) / (lamp * sqrt(p-1)))) %>%
  group_by(pzeta,n,p,kurty) %>%
    summarize(type1_add = mean(psnr < ci_add),
              type1_div = mean(psnr < ci_div),
              type1_tas = mean(psnr < ci_tas)) %>%
  ungroup() %>%
  mutate(zyr=signif(pzeta * sqrt(ope),digits=2)) %>%
  rename(`annualized SNR`=zyr) 
# plot CIs:
library(ggplot2)
ph <- cires %>%
  tidyr::gather(key=type,value=type1,matches('^type1_')) %>%
    mutate(type=case_when(.$type=='type1_add' ~ 'type I rate, difference form',
                                                .$type=='type1_div' ~ 'type I rate, ratio form',
                                                .$type=='type1_tas' ~ 'type I rate, tas form',
                                                TRUE ~ 'bad code')) %>%
  ggplot(aes(n,type1,color=type)) +
  geom_line() + geom_point() + 
  facet_grid(p ~ `annualized SNR`,scales='free',labeller=label_both) + 
  scale_x_log10() +
  geom_hline(yintercept=0.05,linetype=2) + 
  labs(x='number of days data',
       y='empirical type I rates at nominal 0.05 level',
       title='Theoretical and empirical coverage of 0.05 CIs on SNR of Markowitz Portfolio, using some clairvoyance, normal returns.')
print(ph)

plot of chunk ci_plots

The new confidence limit, plotted in blue and called the "tas form" here, is apparently very optimistic and much too high. The empirical rate of type I errors is enormous, sometimes over 90%. It should be noted that the simulations here use some amount of 'clairvoyance' on \(\zeta\); use of a sample estimate would further degrade them, but they are already unusable except for unreasonably large sample sizes. So back to the drawing board.