R in Finance 2016

Fri 20 May 2016 by Steven

This is my fifth time at R in Finance. Every year I write a summary of the event, and post it on my company's internal research blog. As I am unemployed this year, you get to suffer through my recap. Some caveats: I am pretty poor at taking notes, and suffered through a migraine for the entire conference this year.

Day One, Morning

Rishi Narang, founder of T2AM, gave the opening keynote on Machine Learning. Having myself spent some 8 years pursuing ML strategies at hedge funds, I was curious if my experience was well calibrated with others'. After an introductory breakdown of ML, investing, and so on, he gave the problems of ML:

It is hard because the signal-noise ratio is very low.
It is a buzzword, and there is terminology creep.
Almost nobody using ML is successful.

Despite these, ML was still deemed to have 'great promise', although there was a great deal of implicit skepticism. So, yes, this is pretty well calibrated with my experience. The recommended fixes were: try novel data sources, and "hold your nose and invest."

Lightning Round

Bob McDonald on derivmkts, a demo package for teaching derivative pricing, but maybe also for really pricing derivatives. Somehow the greeks function wraps function calls. I love that R is so weird it supports functions like this.

Piotr Orlowski on Divergence swap rates: heavy on the math, the takeaway appears to be that you can represent higher order powers of log returns as a portfolio of variance swaps? Or that you can infer higher order moments of implied volatility by a portfolio of swaps? As tends to happen, this is really a 40 minute talk, but crammed into 6 minutes, so check out affineModelR.

Jerzy Pawlowski talked about the HighFreq package: with it, one can use high frequency data to price higher order risk premia (c.f. Amaya et al.), compute the Hurst exponent (a measure of autocorrelation) from OHLC data, observe seasonality (by time of day) of liquidity and volatility, and so on. Again, a 30 minute talk crammed into 6 minutes.

Majeed Simaan on Tracking Error Portfolios (c.f. Roll (1992)). He finds that tracking the index reduces estimation error: assume two agents using historical data, one performing MV optimization, the other tracking an index. He then simulates these to find the bias and variance. I want to follow up on this, as it is relevant to my work on the Cramer Rao portfolio bounds.

Kris Boudt discussed the problem of dividing data into nearly balanced sets. The greedy heuristic is presented as a strawman. His solution is to treat it as a block matrix problem, with a heuristic of selecting blocks based on nearly equal variance, shown to beat the greedy heuristic via simulation.
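For reference, here is a minimal sketch of what a greedy strawman might look like, assuming it is the usual greedy number-partitioning rule (largest item first, into whichever set currently has the smallest total). My notes do not record the exact formulation from the talk, so the function name and the toy data are purely illustrative.

```python
import heapq

def greedy_partition(values, k):
    """Assign each value, largest first, to the set with the smallest running total."""
    heap = [(0.0, i) for i in range(k)]  # (running total, set index)
    heapq.heapify(heap)
    sets = [[] for _ in range(k)]
    for v in sorted(values, reverse=True):
        total, i = heapq.heappop(heap)   # set with the smallest total so far
        sets[i].append(v)
        heapq.heappush(heap, (total + v, i))
    return sets

values = [8, 7, 6, 5, 4]                 # made-up data
parts = greedy_partition(values, 2)
totals = [sum(p) for p in parts]
print(parts, totals)
```

On this toy input the greedy rule ends up with unequal totals, even though a perfectly balanced split (8 + 7 versus 6 + 5 + 4) exists, which is the sort of gap a better heuristic can close.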

Break.

Brian Boonstra on derivative pricing. Open source packages handle only basic SDEs, ignore term structure, and treat calibration as an afterthought. Convertible bonds present a good test case for option pricing, apparently. Complaints with QuantLib: it ignores default intensity, does not handle discrete dividends well, features an 80's era convertible bond model, and so on. Hence the ragtop package. It uses a 2D stochastic process, with Black Scholes and jump to default dimensions, converts it to a PDE, and solves that via finite differences (a grid).

Matthew Ginley on leveraged ETF volatility simulation by nonparametric density estimation. Motivating example was three months of returns of RTY index vs. IWM vs. TNA (a 3x levered ETF). But I think I lost the plot on how to use KDE here. The trading strategy using a basket of options on LETFs was interesting, but likely has very low liquidity and huge bid-ask spreads.

Klaus Spanderen on Calibration of Heston Local SV model. He echoed the line that calibration is hard. The model in question is a mix of Heston and local SV models, created by introduction of a 'leverage' function. A good review of Feynman-Kac vs. Fokker-Planck. The latter is a 'forward equation', starting from a Dirac delta. Then a lot of options math programming stuff, then we see that the forward equation beats Monte Carlo for vanilla options (a test case), and some other stuff. So check out the RHestonSLV package.

Lunch. (Same lunch as the last four years, BTW.) (But that's fine by me.)

Day One, Afternoon

Tarek Eldin on Random pricing errors. What is pricing error? First, you have to know what Intrinsic Value and Market Value are. Pricing error is a mismatch between them. Based on a simple example, the cap-weighted portfolio appears to be 'properly priced', while equal weighted portfolios are underpriced. Scaling this toy problem up, where market values are intrinsic values plus uncorrelated errors, it appears that going short cap is profitable. This is the fundamental indexing debate. Drags Stein's estimator into the mix. There are some interesting details here that should fall out, like that the low-vol effect 'should' only work for large cap stocks, for example. It is claimed that random errors are orthogonal to EMH, and can explain cap and vol effects, among others. I am curious how much the effects claimed here are boogered by survivorship bias and other corporate action problems.

c.f. Kross, 1985, who claims that the size effect is a price effect, and Bollen, 2008, who looks at vol and cap separately.

Sanjiv Das on measuring liquidity. The key idea is to measure illiquidity by comparing the ETF to its underlying index. Only the ETF is subject to liquidity pricing errors. You can also think of illiquidity as the value of an option to exchange the ETF for the index NAV, so follow the derivative pricing. Illiquidity is then \(\left|\frac{ETF}{NAV} - 1 \right|\). Wow, you can compute that at a single point in time. Weird. It is interesting to think how this talk fits with the previous one: in the case of an ETF, the NAV inarguably is the intrinsic value. BILLIQ, the bond Illiquidity measure is correlated to returns of bond funds, the previous VIX, hedge fund returns, and so on. In all, good stuff.

c.f. Chacko, 2009, and Chacko, Das and Fan, J. Bank. Fin. 2016.
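The point-in-time illiquidity measure above is simple enough to sketch directly. This is only the absolute-deviation formula as I noted it, not the BILLIQ construction from the paper; the function name and the quotes are made up for illustration.

```python
def illiquidity(etf_price, nav):
    """Absolute relative deviation of the ETF price from its NAV, |ETF/NAV - 1|."""
    if nav <= 0:
        raise ValueError("NAV must be positive")
    return abs(etf_price / nav - 1.0)

# Hypothetical (ETF price, NAV) pairs: a discount, parity, and a premium.
quotes = [(99.50, 100.00), (100.00, 100.00), (101.00, 100.00)]
measures = [illiquidity(price, nav) for price, nav in quotes]
print(measures)
```

Note that a discount and a premium both count as illiquidity, since only the size of the gap matters.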

Ryan Hafen. Introduced the Nanex NxCore data, 6 months of 25 ms resolution data. This includes 1.25B records, w/ 47 variables. So how do you make sense of trade data, and lots of it? Try rbokeh, an R interface to the bokeh library. Discussed Tessera, a platform for scaling up R. It was billed to be flexible and scalable: use any data structure and any R code.

Lightning Round

Nidhi Aggarwal: causal impact of trading: what is the impact of automated trading on liquidity, pricing, and volatility? Studied introduction of AT in 2010. Nice use of googleVis.

Chirag Anand followed up with a talk on liquidity provisioning. The introduction of AT changed the dynamics of trading. The question is whether limit orders are still able to provide liquidity, or whether HFT have taken over this role.

Then the OneTick advertorial.

Patrick Howerter on creating an R database: programming in R vs. programming in SQL? Introduced some ideas for scaling up computations via the ff and ETLUtils packages. Check those out.

Break.

Marc Wildi on mixed frequency indicators. This uses the 'MDFA' filter to measure or estimate macroeconomic health.

Lightning Round

Silie Li, starting with an advertorial for Eagle Alpha. They used gtrendsR to download Google Trends data on a number of terms, transform and regularize, then apply principal components, and out pops a time series of something like 'sentiment'.

Doug Martin on IR-maximizing fundamental factor models. Something about 'fixing' the Fundamental Law of Active Management by allowing IR to vary over time.

c.f. Ding & Martin (2015).

Robert Franolic, with Eyes on FX. Literally. Visualizing trades of currency pairs, they tend to look like eyes.

Frank Diebold's keynote on estimating global bank network connections. First, what are the ways that institutions could be 'connected'? Perhaps they have correlated Market Risk, Credit Risk, Systemic Risk, Counterparty Risk. Somehow this comes from a variance decomposition matrix. 'Connectedness' is some function of the non-diagonal elements of this matrix. A good high level discussion.

c.f. Demirer et al.

Day Two

Lightning Round

Hsu-Liang Chen has been studying whether technical data on options prices is predictive of future equity price movement. (The theory that informed traders trade in options because of the existence of short costs in equities seems a bit weak, as the conversions market should keep the two in line with each other.) But I will surely look into his paper.

Kyle Balkissoon on a practitioner's analysis of the overnight effect. He defines the 'overnight effect' as buy at the close, sell at the open, whereas conversely the 'intraday effect' is to buy at the open, sell at the close. Uses KS test, but the samples are not independent. Ouch. Ouch.
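To make the two definitions concrete, here is a small sketch that splits daily bars into overnight (close to next open) and intraday (open to close) log returns. The bar data and the function name are invented; this is my reading of the definitions, not code from the talk.

```python
import math

def overnight_intraday(bars):
    """bars: list of (open, close) per day. Returns (overnight, intraday) log returns."""
    # Overnight: buy at yesterday's close, sell at today's open.
    overnight = [math.log(today[0] / prev[1]) for prev, today in zip(bars, bars[1:])]
    # Intraday: buy at today's open, sell at today's close.
    intraday = [math.log(c / o) for o, c in bars[1:]]
    return overnight, intraday

bars = [(100.0, 101.0), (102.0, 101.5), (101.0, 103.0)]  # made-up (open, close) prices
on, intra = overnight_intraday(bars)
close_to_close = math.log(bars[-1][1] / bars[0][1])
print(sum(on), sum(intra), close_to_close)
```

The two legs sum to the close-to-close log return, and each overnight return is paired with the intraday return of the same day, which is one way to see why treating them as two independent samples is questionable.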

Oddly, Mark Bennett gave two lightning round talks back-to-back. The first was Measuring Income Statement Sharpe Ratios using R: a capable introduction of the Sharpe ratio, then a dive into income statements, which you can load via quantmod, apparently. And buy my book.

Mark Bennett talking for Dirk Hugen on Implementation of Value Strategies using R. You can apparently create UDFs in text and pipe them to postgres via RPostgreSQL. How did I not know about that?

Matt Brigida on Community Finance Teaching Resources with R/Shiny. Apparently the Milken Institute Center for Financial Markets deploys shiny apps to educate legislators about financial issues. But also they make nice teaching tools for students.

Bernhard Pfaff on Portfolio Selection with Multiple Criteria Objectives. The optimization problem becomes, somehow, minimize \(M\) different objective functions subject to some inequality and equality constraints. We seek a Pareto optimal or 'non-dominated' solution. (I am guessing one can get stuck in local optima which are not global optima.) The GA heuristic via mco looks for solutions on the Pareto frontier. OK. The MCDM approach optimizes a positively weighted sum of the objective functions. This requires proper normalization of the objectives. For portfolio selection, consider optimizing risk, return, and transaction costs, for example, or return, vol. and dispersion of risk contributions. He considered the MultiAsset data from the FRAPO package. Some pretty visualizations, including ternary plots (which I still love from my days in material science) from ggtern. I wouldn't trust the backtesting and presented Sharpe ratios, however.

c.f. Hirschberger et al. 2015, Steuer et al. 2005 and 2013, Utz et al. 2015.
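A toy illustration of the two ideas above: filtering to the non-dominated set, and picking a portfolio by a positively weighted sum of normalized objectives. This is a brute-force sketch over a handful of hypothetical candidates, not the mco genetic algorithm nor anything from the talk; all objectives are written so that smaller is better (return enters with a negative sign).

```python
def normalize(column):
    """Rescale one objective to [0, 1] so that the weights are comparable."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in column]

def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(candidates):
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

def weighted_sum_choice(candidates, weights):
    """Index of the candidate minimizing the weighted sum of normalized objectives."""
    columns = [normalize(col) for col in zip(*candidates)]
    scores = [sum(w * col[i] for w, col in zip(weights, columns))
              for i in range(len(candidates))]
    return min(range(len(candidates)), key=scores.__getitem__)

# Hypothetical portfolios as (risk, -return, turnover); all minimized.
candidates = [(0.10, -0.05, 0.20), (0.15, -0.08, 0.30),
              (0.20, -0.07, 0.50), (0.12, -0.05, 0.25)]
front = non_dominated(candidates)
best = weighted_sum_choice(candidates, (1.0, 1.0, 1.0))
print(front, candidates[best])
```

With strictly positive weights the weighted-sum winner always lies on the Pareto frontier, but which frontier point you get depends on the weights and on the normalization, which is why the normalization step matters.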

Douglas Service on Quantitative Analysis of Dual Moving Average Indicators in Automated Trading Systems. Assume price dynamics follow an SDE, then find the Luxor closed form optimal technical strategy? Wait, what? If there is no autocorrelation, as in the geometric Brownian motion, why should a moving average strategy have any alpha? The issue is that \(\mu\) is of unknown sign and probably small, so you do not know which way the series drifts. Then a blizzard of math happens, and in summary, the Luxor strategy always has non-negative expected return.

Mental note, c.f. Peterson 2015 on formulating trading systems.

Lightning Round

Marjan Wauters: Smart beta and portfolio insurance: A happy marriage? The idea is to buy portfolio insurance to protect smart beta portfolios from short term losses. Apparently CPPI is trend following: buy on wins, sell on losses.
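The trend following claim is easy to check with a toy CPPI rule. The sketch below assumes the textbook version (risky exposure equals a multiplier times the cushion above a fixed floor, cash earns nothing, no leverage); the multiplier, the floor, and the return path are made-up numbers, not anything from the talk.

```python
def cppi_path(risky_returns, multiplier=3.0, floor_frac=0.8, start=100.0):
    """Textbook CPPI with a constant floor and a zero riskless rate (assumptions)."""
    value = start
    floor = floor_frac * start
    exposures = []
    for r in risky_returns:
        cushion = max(value - floor, 0.0)
        exposure = min(multiplier * cushion, value)  # cap at full investment
        exposures.append(exposure)
        value += exposure * r                        # the remainder sits in cash
    return value, exposures

final, exposures = cppi_path([0.05, 0.05, -0.05, -0.05])
print(final, exposures)
```

The risky exposure grows after each gain and shrinks after each loss, so the rule buys on wins and sells on losses.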

Michael Kapler: Tax Aware Backtest Framework. Taxes are complicated, you are unlikely to find closed form solutions to anything, so use a backtest. Takeaway: taxes matter.

Miller Zijie Zhu: Backtest Graphics, or backtestGraphics. Nice use of shiny for visualizing backtests (or, I suppose, real trading). I can easily imagine there are some hedge funds that really really want a dashboard like this. The package also comes with three backtest datasets, which I am tempted to use in some of my statistical work. In all, a great student presentation.

Laura Vana: Portfolio Optimization Modeling. She presented ROML, an optimization modeling language package. Nice way of easily describing portfolio optimization problems, then shipping them to a solver. This reminds me of some work I did ages ago in my day job wrapping optimizers, which was a miserable slog. Having an AML would have saved me a lot of trouble. Another nice student (?) presentation.

Ilya Kipnis: Hypothesis Driven Development: An Understandable Example. Something about momentum and levering to a target volatility. There was a very suspicious cutoff between 12 and 13 months (presumably this is the window used for the momentum signal).

Break

Mark Seligman: Controlling for Monotonicity in Random Forest Regressors. Seligman is the author of the Rborist package. He introduced cases where monotonicity of some relationship between target and feature is necessary or desired. Classical random forests do not support monotonicity, but Rborist as of 0.1.1 does. Somehow it rejects non-monotonic trees with a fixed probability, not a hardline constraint, via a rejection scheme.

c.f. Wright and Ziegler, 2016, Ishwaran, 2015.

Michael Kane: glmnetlib: A Low-level Library for Regularized Regression. But now known as pirls, an R/C++ package for penalized iteratively-reweighted least squares. He made a strong defense of linear methods on grounds of efficiency, interpretability, and flexibility. But why rewrite glmnet? It does not support arbitrary family-link combos, can not run out-of-core, and was written in Mortran. So he took Bryan Lewis' implementation of glm, wrote it in around 60 lines of R, and compared to glmnet. The code was correct, but too slow, so implement in Rcpp and Armadillo. Faster, but not fast enough. Then move to prescreening variables: some variables can be rejected prior to fitting based on a 'safe' and 'strong' condition. This talk was a highlight for me.

c.f. pirls on github.

Xiao Qiao: A Practitioner's Defense of Return Predictability. They used 20 variables from the literature for predicting SPY: price ratios, interest rates, real economic data, technical data, and others. A bunch of market timing strategies were then analyzed, and the winner selected, with a Sharpe of around \(0.8\,\mathrm{yr}^{-1/2}\) over 2001-2015. A Kitchen sink model, using all predictors but with a PCA dimensionality reduction, was compared to a Correlation Screening model, which includes variables only when they have an absolute correlation above a threshold. If you want to know more, look up the Hull ETF, HTUS.
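As I understood it, the Correlation Screening step amounts to something like the following sketch: keep a predictor only if its absolute sample correlation with the target exceeds a threshold. The predictor names, the data, and the threshold of 0.5 are all invented; a real implementation would compute the correlations on past data only.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen(predictors, target, threshold):
    """Names of predictors whose absolute correlation with the target beats the threshold."""
    return [name for name, series in predictors.items()
            if abs(pearson(series, target)) > threshold]

target = [0.01, -0.02, 0.03, 0.00, -0.01]        # made-up returns
predictors = {                                   # hypothetical signals
    "signal_a": [1.0, -2.0, 3.0, 0.0, -1.0],     # moves with the target
    "signal_b": [-1.0, 2.0, -3.0, 0.0, 1.0],     # moves against the target
    "signal_c": [1.0, 0.0, 0.0, -1.0, 0.0],      # weakly related
}
kept = screen(predictors, target, threshold=0.5)
print(kept)
```

Because the screen uses the absolute correlation, a strongly negative predictor is kept along with a strongly positive one.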

Lunch.

Day Two, Afternoon

Patrick Burns: Some Linguistics of Quantitative Finance. A nice talk about language and what we mean when we talk about ideas in Quant finance. First target is 'risk parity', second is 'variance targeting'. Some oblique discussion of these, then segue to agent based modeling (ABMs). After mucking about with a basic model, he introduced the idea of taxing market orders?

c.f. marketAgent package, only from Burns stats.

Eran Raviv: Forecast combinations in R using the ForecastCombinations package. Basic forecast combination works by taking a simple mean. Another method is basic regression of forecasts to target via OLS, or LAD, or a constrained regression (no shorting of forecasts, say), accuracy weighting, winner take all, and so on. These are all (?) supported by the package.
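Two of the schemes mentioned, the simple mean and accuracy weighting, fit in a few lines. The sketch below uses inverse mean squared error as the accuracy weight, which is one common choice and an assumption on my part; it does not use or mimic the ForecastCombinations API, and all numbers are made up.

```python
def simple_mean(forecasts):
    return sum(forecasts) / len(forecasts)

def inverse_mse_weights(past_forecasts, past_actuals):
    """Weight each model by 1/MSE on past data, normalized to sum to one.

    Assumes every model has a nonzero past error.
    """
    mses = [sum((f - a) ** 2 for f, a in zip(series, past_actuals)) / len(past_actuals)
            for series in past_forecasts]
    inverse = [1.0 / m for m in mses]
    total = sum(inverse)
    return [w / total for w in inverse]

def combine(forecasts, weights):
    return sum(w * f for w, f in zip(weights, forecasts))

past_actuals = [1.0, 2.0, 3.0]
past_forecasts = [[1.1, 2.1, 3.1],   # model A, always off by 0.1
                  [1.2, 1.8, 3.2]]   # model B, always off by 0.2
weights = inverse_mse_weights(past_forecasts, past_actuals)
new_forecasts = [4.0, 5.0]           # today's forecasts from A and B
print(simple_mean(new_forecasts), combine(new_forecasts, weights))
```

Model A's errors are half the size of model B's, so its squared error is a quarter as large and it gets four times the weight, pulling the combined forecast toward A.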

Lightning Round

Kjell Konis: Comparing Fitted Factor Models with the fit.models Package. The package provides a nice way to group together and compare different models on the same data.

Steven Pav: Madness: a package for Multivariate Automatic Differentiation. Not this guy again. blah blah blah derivatives blah blah...

Paul Teetor: Are You Trading Mean Reversion or Oscillation? The classical definition of mean reversion, the O-U process, is perhaps not tradeable, so instead look for 'oscillation'?

Pedro Alexander: Portfolio Selection with Support Vector Regression. The title says it all.

Matthew Dixon: Seasonally-Adjusted Value-at-Risk. This is a topic of interest to me. One approach is a seasonally adjusted conditional variance-covariance matrix.

Break

Bryan Lewis: R in Practice. Bryan presented some work that people are doing to marshal data in a smarter way (avoiding intermediate steps), as well as using those methods to store and share objects. The doRedis package looks like a nice way to elastically scale computations, integrating nicely with EC2.

Matt Dziubinski: Getting the most out of Rcpp: High-Performance C++ in Practice. After a brief history of physical computing trends, he talked about estimating the performance of various implementations of algorithms by simulation and timing. He really did have 150 slides for a 20 minute talk, but it was so worth it. One of the takeaways is that memory dominates computation, and aiding branch prediction, using memory wisely, and so on, will speed up tasks. Moreover, the addition of cores has come with a decrease in memory per core, so concurrency can have even worse effects than Amdahl's Law suggests, with slowdowns seen when increasing the number of cores.
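As a baseline for that last point, Amdahl's Law itself is a one-liner: with a fraction p of the work parallelizable over n cores, the speedup is 1 / ((1 - p) + p / n). The sketch below computes only this idealized bound; the memory effects that made things worse in the talk are not modeled, and the 90% parallel fraction is just an example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Idealized speedup when a fraction of the work is spread over the given cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for n in (1, 2, 10, 100):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Even in this optimistic model, a job that is 90% parallel can never run more than 10 times faster, no matter how many cores are added.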

Lightning Round

Mario Annau: h5 - An Object Oriented Interface to HDF5. Yet another HDF5 package, but the extant ones were all considered lacking, and none were actively maintained. Uses Rcpp to interface the library. This is relevant to my interests, as I used HDF5 for many years in a Matlab/Python environment. Get it on github or CRAN.

Dirk Eddelbuettel: Rblpapi Revisited: One Year Later. A progress report on the Bloomberg API package. There has been a lot of progress in the last year: Windows build, pushed to CRAN, more features.

Jason Foster: Multi-Asset Principal Component Regression using RcppParallel. The author of the roll package, which is inarguably a better implementation of rolling computations than what I did with fromo. But, yay, rolling computations!

Qiang Kou: Deep learning in MXNet. MXNet is a system for deep learning using C++ that seems very promising. Install from the drat store, or go to their website.

That's a wrap on another great year. See you all in 2017.

Gilgamath
