Quantitative Finance

I fumbled my way through half a career as a hacker and 'quant' at two hedge funds. During much of that time I had the recurring feeling that I was woefully underinformed, that there must be some book out there that explained exactly how we were supposed to be doing what we were trying to do. I never found that book. Maybe I will have to write that book. In the meantime, I made what I thought were novel discoveries in the areas of portfolio construction and strategy evaluation:

  • I have studied the statistics of the Sharpe ratio, culminating in the SharpeR package, which supports statistical inference on the Sharpe ratio. A vignette accompanying the package details much of the theory; there was also an action-packed lightning talk at R in Finance 2012. My notes on the Sharpe ratio have snowballed into what I am calling a Short Sharpe Course.
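
    The central large-sample fact that this kind of inference builds on fits in a few lines (a numpy sketch rather than SharpeR itself; the returns below are simulated, not real): under i.i.d. returns, the sample Sharpe ratio is approximately normal with standard error sqrt((1 + SR^2/2)/n).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2520                                   # ten years of daily returns, simulated
rets = rng.normal(loc=4e-4, scale=1e-2, size=n)

sr = rets.mean() / rets.std(ddof=1)        # per-period Sharpe ratio
se = np.sqrt((1.0 + 0.5 * sr**2) / n)      # large-sample standard error
ci = (sr - 1.96 * se, sr + 1.96 * se)      # rough 95% confidence interval
```

    Annualizing multiplies both the point estimate and the standard error by sqrt(252), which is why a strategy needs a long track record before its Sharpe is statistically distinguishable from zero.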

  • Just as the Sharpe ratio is, up to scaling, the t-statistic, the Sharpe ratio of the Markowitz portfolio is, up to scaling, the Hotelling T-squared statistic. When adding linear conditional expectation to deal with garden-variety 'non-stationarity', one recovers the Hotelling-Lawley trace, which is used to estimate total effect size in a multivariate multiple linear regression (one where both the independent and dependent variables are vectors). The SharpeR package supports inference on the Signal-Noise ratio of the population Markowitz portfolio.
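
    The identity is easy to check numerically (a numpy sketch with simulated returns; SharpeR does the real inference): the squared Sharpe of the sample Markowitz portfolio equals the quadratic form mu' Sigma^-1 mu, which is Hotelling's T-squared divided by n.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 1000, 4
X = rng.normal(1e-3, 1e-2, size=(n, p))      # n periods of returns on p assets

mu = X.mean(axis=0)
sigma = np.cov(X, rowvar=False)              # sample covariance (ddof=1)

w = np.linalg.solve(sigma, mu)               # the (unscaled) Markowitz portfolio
port = X @ w
sr2 = (port.mean() / port.std(ddof=1))**2    # squared Sharpe of that portfolio
t2 = n * (mu @ w)                            # Hotelling's T-squared: n mu' Sigma^-1 mu
```

    The match is exact, not asymptotic: the sample mean and variance of the portfolio's returns are mu'w and w'Sigma w, both of which equal mu' Sigma^-1 mu.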

  • I tried to simplify multivariate estimation procedures by collecting the first and second moments together in a single matrix. Inverting this matrix via the block matrix inversion formula yields the Markowitz portfolio (as well as the squared Sharpe and the precision matrix). This was the 'weird trick' I presented at R in Finance 2014, which has since become a snowballing paper on arxiv. The statistical techniques are provided by the MarkowitzR package.
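
    A numpy sketch of the trick (MarkowitzR does this properly, with standard errors): prepend a constant 1 to each observation, form the uncentered second-moment matrix, and invert.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 3
X = rng.normal(1e-3, 1e-2, size=(n, p))  # simulated returns

Y = np.hstack([np.ones((n, 1)), X])      # each row is (1, x')
theta = (Y.T @ Y) / n                    # uncentered second-moment matrix
itheta = np.linalg.inv(theta)

w = -itheta[1:, 0]                       # the Markowitz portfolio, Sigma^-1 mu
sr2 = itheta[0, 0] - 1.0                 # the squared Sharpe, mu' Sigma^-1 mu
prec = itheta[1:, 1:]                    # the precision matrix, Sigma^-1

# check against the direct computation (ddof=0 to match the moment matrix)
mu = X.mean(axis=0)
sigma = np.cov(X, rowvar=False, ddof=0)
w_direct = np.linalg.solve(sigma, mu)
```

    Everything falls out of one matrix inverse, which is what makes the delta-method bookkeeping for standard errors so convenient.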

  • While I could perform inference on the true Markowitz portfolio, experience and intuition suggested that there was a fundamental bound on how good a portfolio one could construct. I was on a phone interview when I blurted out that I thought there was a Cramér-Rao bound on Sharpe. My intuition was apparently correct, as this approach gives a bound on expected Sharpe. If I knew much about theoretical statistics, I could probably show that the Markowitz portfolio achieves this bound asymptotically because it is built from MLEs. The Cramér-Rao bound was the subject of my enlightening talk at R in Finance 2015, and a Thalesians talk.

    This result has not been well received: on the one hand, everyone 'knows' that overfitting is a problem in quantitative strategy development, yet nobody seems to think this bound applies to them!

  • In my study of all things Sharpe, I discovered Lecoutre's lambda-prime distribution for inference using the t-statistic (which is just the Sharpe ratio up to scaling). I realized this could be expanded to multiple independent observations, and the Upsilon distribution was born.

  • I saw a number of talks on the use of higher order moments in quantitative finance at R in Finance 2014, but did not have the requisite background to digest them. As I was studying the classical approximations for probability distributions, I noticed that Roy's original argument for the 'Safety First' criterion could be expanded to include higher order moments, resulting in the 'Safety Third' paper. Rather than assuming agents wish to maximize expected utility (and I realize this is controversial), suppose they seek to minimize the probability of 'a loss,' however that is defined; then you arrive at Safety Third. The cute part is that agents could agree on all moments of a returns distribution, yet value it differently based on the term of their investment: the long term investor prefers to buy lottery tickets, sold by the short term investor. Oddly, the breakeven occurs at unit skew under the three term approximation!
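
    For concreteness, here is Roy's classical two-moment criterion in numpy (the Safety Third extension carries the higher moments; this sketch shows only the base case, with invented asset numbers): minimizing the probability of falling below a disaster level amounts to maximizing (mu - disaster)/sigma.

```python
import numpy as np
from scipy.stats import norm

disaster = -0.02                     # the 'loss' level the agent dreads
assets = {                           # hypothetical (mean, sd) of period returns
    "steady":  (0.005, 0.010),
    "lottery": (0.010, 0.040),
}

p_loss = {}
for name, (m, s) in assets.items():
    z = (m - disaster) / s           # Roy's safety-first ratio
    p_loss[name] = norm.cdf(-z)      # normal approximation to P(return < disaster)
```

    Under the two-moment approximation the steady asset wins despite its lower mean; once skew enters the expansion, agents with different horizons can rank the same distribution differently, which is the lottery-ticket trade described above.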

Probability Distributions

  • I was interested in a distribution I called 'Upsilon'. To estimate its quantiles and distribution, I took up the study of the classical estimation procedures: the Gram-Charlier, Edgeworth, and Cornish-Fisher expansions. The Gram-Charlier expansion makes the most sense to a former numerical analyst, as it consists of approximating the density in a polynomial basis, then truncating. Oddly, the main families of orthogonal polynomials take, as weighting functions, the densities of the Normal, Gamma, and Beta distributions! My efforts resulted in the PDQutils package, which is still a work in progress.
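
    A toy version of the Gram-Charlier 'A' series in numpy/scipy (PDQutils implements the real machinery; the target here is a standardized Gamma, chosen because its cumulants are known in closed form): correct the normal density with Hermite polynomial terms carrying the skew and excess kurtosis.

```python
import numpy as np
from scipy.stats import norm, gamma

k = 16.0                                # Gamma shape; standardized, skew = 2/sqrt(k)
skew, exkurt = 2 / np.sqrt(k), 6 / k

def gc_density(z):
    """Gram-Charlier 'A' series truncated after the kurtosis term."""
    he3 = z**3 - 3*z                    # probabilists' Hermite He_3
    he4 = z**4 - 6*z**2 + 3             # He_4
    return norm.pdf(z) * (1 + skew / 6 * he3 + exkurt / 24 * he4)

def true_density(z):
    """Exact density of a standardized Gamma(k) variate."""
    return np.sqrt(k) * gamma.pdf(k + np.sqrt(k) * z, k)

zs = np.linspace(-2, 2, 9)
err_gc = np.max(np.abs(gc_density(zs) - true_density(zs)))
err_norm = np.max(np.abs(norm.pdf(zs) - true_density(zs)))
```

    The corrected density still integrates to one (the Hermite terms are orthogonal to the normal weight), and on this example it beats the worst-case error of the plain normal approximation. The well-known catch: a truncated Gram-Charlier density can go negative in the tails.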

  • As a showcase for PDQutils, I developed the sadists package, which provides density, distribution, quantile, and random generation (the so-called 'dpqr' functions) for several classes of (somewhat artificial) probability distributions--those that can be easily expressed as weighted products or weighted sums of independent random variables of known distribution.

  • In my work on probability distributions, I was stymied by the fact that the moments of the log non-central chi-square distribution were apparently unknown. (A complicated, and wrong, expression for the first moment was known.) During a particularly boring meeting, I scribbled out a proof for this, essentially on an envelope. One long night of work later, I had a paper on arxiv. This was one of the most satisfying experiences in recent memory, and it proves that daydreaming in boring meetings is productive!

Chemometrics

  • My first 'real' job after my post-doc was as a research scientist at Nellcor. My task was to figure out how to improve the algorithms used by pulse oximeters (measuring weaker signals more accurately, rejecting noise, reducing false alarms, increasing true alarms, and so on) and other non-invasive monitoring instruments. As part of the design process, we needed to reduce the number of distinct (IR) wavelengths used by an instrument. This is a 'channel selection' (or, more broadly, variable selection) problem. I 'solved' it in the context of PLS by a weird continuous embedding. (The algorithm's namesake was our cat at the time.) The really kooky part was adding to the PLS algorithm the ability to also compute the derivative with respect to channel weighting. (The mildly kooky part was turning variable sparsity into a continuous measure.) This would probably have been better served (like many things) by automatic matrix differentiation. I did much of the real heavy lifting on this algorithm while serving on a jury!

  • Chemometrics is a natural area in which to apply non-negative matrix factorizations, and I spent a good deal of time playing with Lee & Seung's method. The major advantage of this algorithm is that it is so simple to implement: the iterative step is a single line in a language like Matlab or R. I modified the main algorithm to deal with weights, regularization, and so on. At some point I want to write up my notes; for now you can try to decipher the patent.
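
    The update really is that short. A numpy sketch of the plain (unweighted, unregularized) Lee & Seung multiplicative rule for the Frobenius objective ||V - WH||^2:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 40, 4
V = rng.random((m, r)) @ rng.random((r, n))   # data with exact nonnegative rank r

W = rng.random((m, r)) + 0.1                  # strictly positive starting guesses
H = rng.random((r, n)) + 0.1
eps = 1e-12                                   # guards against division by zero

err0 = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
for _ in range(1000):
    # the multiplicative updates: each one is a single line
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

    Each step multiplies by a nonnegative ratio, so nonnegativity is preserved for free, and Lee & Seung showed the objective never increases. Convergence can be slow, but the simplicity makes variants (weights, regularization) easy to graft on.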

  • The Beer-Lambert law is rightly used extensively to model absorption of light in diffuse media. It turns out, however, that if the distribution of ideal path lengths follows a sum-stable distribution, you get a modification known as the Kohlrausch-Williams-Watts model. Somehow I convinced the government to issue me a patent for this. The interesting part for me was that light absorption was a use for the Laplace transform, which I thought was only used in Sophomore math classes.
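
    The stable-path-length story can be checked by simulation (a numpy/scipy sketch with made-up units; the Lévy distribution is the one-sided stable law with index 1/2, whose Laplace transform is known in closed form):

```python
import numpy as np
from scipy.stats import levy

rng = np.random.default_rng(1)
c = 1.0                                        # Lévy scale parameter
paths = levy.rvs(scale=c, size=200_000, random_state=rng)  # random path lengths

absorb = 0.7                                   # absorption coefficient, made-up units
trans_mc = np.exp(-absorb * paths).mean()      # Beer-Lambert per path, then averaged
trans_kww = np.exp(-np.sqrt(2 * c * absorb))   # stretched exponential, exponent 1/2
```

    The Monte Carlo average matches the stretched exponential up to sampling error: averaging Beer-Lambert over random path lengths is precisely a Laplace transform of the path-length density.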

Computational Geometry

  • My thesis deals with provably good mesh generation, mostly in two dimensions. Meshing is used in finite element simulations to discretize continuous domains (e.g. airplane parts) so that physical problems can be studied (e.g. will the airplane fly?). A good mesh respects the boundaries of the domain, as well as its local scale. Adding too many elements would slow down the final simulation (which is a large linear algebra problem), while poorly shaped elements (very large obtuse angles in triangles, in particular) cause the resultant problem to be ill-conditioned, also slowing down iterative solutions. Delaunay triangulations maximize the minimum angle over all triangulations on a given set of points, and thus are a natural choice for meshing. (For triangles, bounding the minimum angle from below bounds the maximum angle from above.) My thesis gave guarantees on the minimum angle of an output Delaunay mesh in terms of the input angle.
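
    The max-min-angle property is easy to see on the smallest interesting example, a convex quadrilateral with its two possible triangulations (a scipy sketch; the coordinates are arbitrary):

```python
import numpy as np
from scipy.spatial import Delaunay

def min_angle(tri):
    """Smallest interior angle (radians) of a triangle given as a 3x2 array."""
    angs = []
    for i in range(3):
        u, v = tri[(i + 1) % 3] - tri[i], tri[(i + 2) % 3] - tri[i]
        angs.append(np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v))))
    return min(angs)

pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.2, 1.0], [0.1, 1.1]])  # a convex quad

# the only two ways to triangulate it: diagonal 0-2 or diagonal 1-3
diag_02 = [pts[[0, 1, 2]], pts[[0, 2, 3]]]
diag_13 = [pts[[0, 1, 3]], pts[[1, 2, 3]]]
best = max(min(min_angle(t) for t in T) for T in (diag_02, diag_13))

dt = Delaunay(pts)                         # Delaunay picks the max-min-angle diagonal
min_dt = min(min_angle(pts[s]) for s in dt.simplices)
```

    Delaunay recovers the better of the two diagonals; flipping to the locally better diagonal everywhere is the classical edge-flip picture behind the global max-min angle guarantee.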

  • I continued to study mesh generation in my postdoc at UCSD, resulting in very little progress other than an algorithm for meshing curved domains. I had written my research code in SML-NJ during graduate school; at UCSD, I switched to writing in C++ using the CGAL library.

  • While at UCSD, I wrote a text for an undergraduate numerical analysis course. At the time I wished I had a literate computing solution for writing books, because I loved the precision of LaTeX, but hated the organization of building and naming figures. If only I had knitr! The book is also on github.

Ceramic Engineering

  • During my undergraduate career, I studied Ceramic Engineering Science. This culminated in an embarrassing 'thesis' which described some computer simulations of Germania glasses. The relevant chemical properties to note are that Germanium sits between Silicon and Tin in the periodic table; the former binds to four Oxygen atoms, while the latter binds to six. In a glass, according to these simulations, Germanium tends to split the difference, sometimes binding to four, sometimes to six Oxygens. I will dig this up at some point.