2018 marks the tenth year of R in Finance. Once again, here is my biased and incomplete take on the proceedings.

Day One, Morning Lightning Round

  1. Yu Li started the conference with a lightning talk on whether click and visit data on Morningstar's website were predictive of fund flows, using ordinary linear regression, once the usual explanatory variables (beta, momentum, etc.) are taken into account. The early results looked inconclusive. (n.b.: if visits and clicks depend on market movement, there will be complicated interactions with momentum that would have to be controlled for.) Another takeaway is that your actions on the internet are a potential gold mine (well, data mine) for someone.

  2. Daniel McKellar used a graph theory view of companies to compare geographical effects to sector and industry effects. The target metric is something called 'modularity'. (My fear is that this metric is defined in a trivially gameable way, but this could be due to my general paranoia.) Turns out that country clustering gives higher modularity than sector or industry clustering (yay), but then clustering by country and sector gives lower modularity than just by country (uh oh). There followed a linear regression model of the correlation matrix entries to check for clustering effects. While people are trained to digest linear regression models (and perhaps this is the way to go with upper management), I hope there are more advanced techniques for covariance cluster analysis.

  3. Jonathan Regenstein presented a shiny page that performs Fama French decomposition on a portfolio that the user enters. He bemoaned the weird choice of distribution channel for Ken French's data (zipped CSV with header junk). I have packaged a few of the datasets into a data package for my book. But I have been thinking there should be a canonical data package that contains all the FF data (it only updates once a year, and could be made programmatic).

Day One, Morning Talks

Kris Boudt gave a talk on a new kind of regularized covariance estimator. The idea is to combine the row subselection of MCD with a kind of shrinkage to deal with the 'fat data' problem (more columns than rows). The motto appears to be 'shrink when needed.' He presented the 'C-step' theorem which underpins their algorithm: suppose you subselect some rows of data, compute the sample mean and covariance, then define Mahalanobis distances using that sample mean and covariance. If you then pick another subsample of the data that has a smaller sum of Mahalanobis distances than your original sample, that new subset will have a smaller covariance determinant. The implication is to just compute the Mahalanobis distances, take the subset with the smallest distances, and iterate. This shows that the algorithm converges, since the determinant decreases at each step. (I confirmed with Kris that the objective is not convex, so this method falls into local minima; to find a global minimum, you have to employ some tricks.) He followed up with some examples showing how the algo works; toy data experiments confirm it helps to have clairvoyance on the population outlier rate. His example computing the minimum variance portfolio triggered me, as I am not convinced covariance shrinkage should be used for portfolio construction.
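For concreteness, here is a toy sketch of the C-step iteration as I understood it (in Python with numpy; the setup, names, and subset size are mine, not Kris's code):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X[:10] += 8.0                    # plant some gross outliers

h = 150                          # keep 150 of the 200 rows
idx = rng.choice(len(X), size=h, replace=False)   # arbitrary starting subset

for _ in range(50):
    mu = X[idx].mean(axis=0)
    S = np.cov(X[idx], rowvar=False)
    # Mahalanobis distances of *all* rows under the subset's (mu, S)
    d = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
    new_idx = np.argsort(d)[:h]  # C-step: keep the h smallest distances
    if set(new_idx) == set(idx):
        break                    # fixed point: the determinant can decrease no further
    idx = new_idx

det = np.linalg.det(np.cov(X[idx], rowvar=False))
```

The C-step theorem is what licenses the stopping rule: each pass can only shrink the covariance determinant, so iterating to a fixed point converges (to a local minimum).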

Majeed Simaan gave a talk on 'Rational Explanations for Rule-of-Thumb Practices in Asset Allocation.' He seemed to be looking for conditions under which one would prefer the Global Minimum Variance portfolio, the Mean Variance Portfolio, or the Naive Allocation Portfolio. These are based on the likely estimation error. My interpretation is that if, for example, the true Markowitz portfolio weights (err, MVP in Majeed's terminology) are widely dispersed, you are less likely to make a deleterious estimation error in constructing the sample Markowitz portfolio. These rules of thumb are translated into more easily digestible forms which can be tested. I look forward to the paper.


Norm Matloff gave a keynote on 'Parallel Computation for the Rest of Us'. He notes that there are a number of paradigms for parallelizing computation in R, with varying levels of abstraction and sophistication. However, it is (still) the case that using these packages requires some knowledge of hardware (e.g. caching) and how the computations are parallelized. There is no 'silver bullet' that automatically parallelizes computations. He described the two key design paradigms of the partools package: Leave It There (i.e. bring the computation to the data, and leave the data there), and Software Alchemy (try to automatically convert regular problems into approximately equivalent Embarrassingly Parallel problems, and solve those instead).

Day One, Afternoon Talks

Matthew Ginley gave a talk about forecasting rare events under monotonicity constraints. I don't think I can do his technique justice, but he seemed to start with a density estimator, then estimate the proportion of rare events at values of the independent variable (or 'feature') near where the rare events were observed, then he constructed a monotonic regression on those values. My notes say I should look up ROSE (random over sampling examples) and BART (Bayesian Additive Regression Trees), and that the 'usual' metrics you might use to score performance (like MSE, or 0/1 loss) may give counterintuitive results.
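For the monotonic regression step, the classic tool is the pool-adjacent-violators algorithm; here is a toy implementation of mine (Python), not Matthew's code:

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators: least-squares fit constrained to be non-decreasing."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # each block is [weighted mean, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

# noisy estimated rare-event proportions at increasing feature values
props = np.array([0.01, 0.03, 0.02, 0.08, 0.05, 0.20])
fit = pava(props)   # non-decreasing, pools the violating neighbors
```

With equal weights the fit preserves the overall mean while flattening out the local violations, which is roughly the shape constraint the talk imposed.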

Rainer Hirk discussed multivariate ordinal regression models using mvord. He presented some problems around credit rating data from the big three (Moody's, Fitch, S&P): how are the ratings related to each other, how do they change over time, can they be explained by independent variables (features of the rated companies), and so on. Ordinal regression apparently works via a latent variable with some thresholds to determine the classes. The mvord package can handle all the questions he threw at it.

Lightning Round

  1. Wilmer Pineda had some of the prettiest slides I saw all day, but I had a difficult time understanding his talk, I'm afraid.

  2. Neil Hwang talked about 'bipartite block models'. The idea seemed to be that you might have a bipartite graph with edges defining some commonality between nodes, and you want to detect communities among them. (This reminds me of some work I did in my short stint in the film industry on trying to detect similarities between actors and films based on the 'appeared-in' definition of edges.)

  3. Glenn Schultz gave an advertorial for bondlab, which appears to be a bond pricing package connected to data from their web site. I think if you work in fixed income, you'll want to take a look at this.

  4. Dirk Hugen gave a talk on using R in Postgres via the PL/R extension. This is a nice trick if you use Postgres: you can basically ship R UDFs into the database and run them there. I am always a fan of having the DB do the work that my desktop cannot.


Michael Gordy, from the Federal Reserve, discussed 'spectral backtests'. Apparently banks produce 1 day ahead forecasts of their profit and loss every night. The banks also track what their actual PnL is (weird, right?), which is then translated into quantiles under their forecasts. The question is whether the forecasts are any good, and whether that can be quantified without dichotomizing the data. (For example, by looking at the proportion of actuals at or above the 0.05 upper tail of loss forecasts.) I didn't follow the transform, but he took an integral with some weighting function, and out popped some hypothesis tests. Go check out the paper, and apparently there is a package coming as well.
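To make the idea concrete, here is a toy sketch of mine (Python, not the forthcoming package): when the weighting function is just an indicator of the 5% tail, the 'spectral' statistic collapses to the familiar VaR exception count.

```python
import numpy as np

rng = np.random.default_rng(0)

# PIT values: each night's realized PnL pushed through that night's forecast CDF.
# Under a well-calibrated forecast these are iid Uniform(0,1); here we fake a
# bank whose tail losses come more often than forecast (mass piled near 0).
u = rng.beta(0.7, 1.0, size=1000)

# the "spectral" ingredient is a weight function over quantiles; an indicator
# of the 5% tail recovers the classic exception-count test as a special case
nu = lambda x: (x < 0.05).astype(float)

# mean and variance of nu(U) for U ~ Uniform(0,1), i.e. under the null
mu0 = 0.05
var0 = 0.05 * (1 - 0.05)

z = (nu(u).mean() - mu0) / np.sqrt(var0 / len(u))   # |z| > 1.96 rejects calibration
```

Smoother weight functions spread credit over a band of quantiles instead of a single cutoff, which I gather is the point of the integral in the paper.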

Lightning Round

  1. Mario Annau gave a progress report on hdf5r, which provides HDF5 file support for R. HDF5 is still probably the best multi-language high performance data format, and this package was apparently rewritten for performance by cutting out some C++ middleman code. The roadmap for this package includes dplyr support, which would be a welcome feature.

  2. David Smith gave a talk promoting the new Azure backend for the foreach package.

  3. Stephen Bronder gave a progress report on porting Stan to GPUs. Matrix operations like inversion and Cholesky factorization are hard to parallelize, but they are coming to Stan.

  4. Xin Chen presented the glmGammaNet package to perform Elastic Net (L1 and L2 regularized) regression, but with Gamma distributed data, which is appropriate for non-negative errors.

  5. JJ Lay discussed multilevel Monte Carlo simulations for stochastic volatility and interest rate modeling. Apparently he achieved a ten-thousand-fold reduction in runtime (!) over a serial computation by parallelizing in this way.


Michael Kane gave a talk on an analysis of cryptocurrency pairs prices from the Bittrex market. (The first part was a hilarious tour through the shady contraband-for-bitcoin market.) He used SVD to approximate the returns of \(p=290\) currency pairs down to dimension \(d\). He used the Frobenius norm of the error of this approximation, plus a \(d/\sqrt{p}\) regularization, to optimize \(d\), finding that \(d\approx 2.5\) was consistent with his sample, fluctuating perhaps to 4 at times. My interpretation is that people think of cryptocurrencies as Bitcoin and also-rans, although perhaps there is a numeraire effect in there. As Michael put it, despite the variety of coins, their returns are not well differentiated.
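As I understood the recipe (a Python toy of mine with a single planted factor, so the optimizer should land at \(d=1\) rather than Michael's 2.5):

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 250, 290                     # observations by currency pairs
# fake returns with one dominant common factor ("Bitcoin and also-rans")
loadings = rng.normal(size=(1, p))
R = rng.normal(size=(T, 1)) @ loadings + 0.3 * rng.normal(size=(T, p))

s = np.linalg.svd(R, compute_uv=False)   # singular values, descending

def objective(d):
    # relative Frobenius error of the best rank-d approximation, plus penalty
    err = np.sqrt((s[d:] ** 2).sum() / (s ** 2).sum())
    return err + d / np.sqrt(p)

d_star = min(range(1, 20), key=objective)
```

The penalty term is what keeps you from 'explaining' the noise floor with extra dimensions; with one real factor the error barely improves past \(d=1\) while the penalty keeps growing.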

William Foote gave a presentation in the form of a shiny dashboard, instead of a slideshow, on the topic of, I think, shipping metals and metals prices. A fair amount of time was spent showing the source code for the shiny page, rather than demo'ing the page. If you are in the business of shipping Copper, Aluminum or Nickel around, you would definitely want a dashboard like this, but it is not clear how to interpret all the plots (contours of correlation in an animation?) or 'drive' actions from the dashboard.

Lightning Round

  1. Justin Shea discussed Hamilton's working paper, "Why you should never use the Hodrick-Prescott filter." (I love unambiguous paper titles!) He implemented Hamilton's suggested replacement for the HP filter for detrending time series, in the neverhpfilter package, which implements this regression, returning a glm object.

  2. Thomas Zakrzewski talked about using a 'Q-Gaussian' distribution (apparently it generalizes Gaussian, \(t\), and bounded Gaussian-like symmetric distributions) in Merton's model for probability of default.

  3. Paul Laux talked about inferring the cost of insuring against small and large market movements from the returns of VIX futures and delta-neutral SPX straddles. He looked at these inferred costs around news announcement dates (jobs reports, FOMC meetings) versus other times, and found that the costs were significantly non-zero around news dates.

  4. Hernando Cortina gave a talk for Just Capital, which is apparently a non-profit, established by Paul Tudor Jones, that analyzes companies based on ESG criteria (Environment, Social, Governance). He created quintile portfolios based on these rankings, and found that, over a 1 year out-of-sample period, the "most socially responsible" quintile outperformed the least responsible. (Years ago I worked at a fund that tried to build a SRI vehicle for a client, without much luck.) He then tried to decompose the 'alpha' in the responsible quintile in terms of the ingredients in their Just companies index.
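For the curious, Hamilton's replacement for the HP filter (from Justin Shea's first item above) is just a linear projection: regress \(y_{t+h}\) on the four most recent values and take the residual as the cycle. A Python sketch of mine, not the neverhpfilter implementation:

```python
import numpy as np

rng = np.random.default_rng(7)
y = np.cumsum(rng.normal(size=300))   # a random-walk "trend" series
h, lags = 8, 4                        # Hamilton suggests h=8 for quarterly data

# design matrix [1, y_t, y_{t-1}, y_{t-2}, y_{t-3}] predicting y_{t+h}
rows = range(lags - 1, len(y) - h)
X = np.array([[1.0] + [y[t - j] for j in range(lags)] for t in rows])
target = np.array([y[t + h] for t in rows])

beta, *_ = np.linalg.lstsq(X, target, rcond=None)
cycle = target - X @ beta             # the detrended ("cyclical") component
```

The appeal is that this is an honest regression (hence the glm object neverhpfilter returns) rather than a two-sided smoother with the HP filter's end-point and spurious-cycle problems.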


J. J. Allaire gave a talk on Machine Learning with TensorFlow. TensorFlow is apparently a numerical computing library, which is hardware independent and open source, running on CPUs, GPUs or TPUs (if you can find one). It defines a data flow process which is executed in C++, and reminds me somewhat of Spark. The models that are built are language-independent (again, like Spark's MLlib). He then talked about Deep Learning, which, if I understood correctly, is just neural nets with lots of layers ('deep'), like hundreds maybe. These are good for 'perceptual-like' tasks, but maybe not so much other areas (uhh, finance?). Apparently Deep Learning has become more popular now because we now have the computational resources and the massive amounts of data to train such huge neural nets, some of which have millions of coefficients. (I imagine if you could analyze how a human brain recognizes digits, say, it would involve thousands of neurons; encoding them all in ten thousand parameters seems about right.) Deep Learning still has problems: the models are not interpretable, can be fooled by adversarial examples, and require lots of data and computational power.

He introduced RStudio's tensorflow packages, including the keras package. The package gives you access to a plethora of layer types you might want to put in a Deep Learning model, some appropriate for, say, graphical or image learning, or time series or language processing etc. You do need a fair amount of domain knowledge to create a good collection of layers, and apparently lots of experimentation is required. (I'm predicting that Deep MetaLearning will be the big thing in ten years when we have more data and computational power.) He ran through example uses of TensorFlow from R: classification of images, weather forecasting, fraud detection, etc. The package ecosystem here seems ready for use. For more info, do check out Deep Learning with R, or the more theoretical book, Deep Learning.

Day Two

Lightning Round

  1. Ilya Kipnis started the day by describing some technical-based strategies on VIX ETNs. It's apparently hard to do worse than buy and hold XIV. I talked to Ilya after the conference, and he tells me he has "skin in the game," so this is not just another bunch of quant farts on a blog.

  2. Matt Dancho gave a talk on tibbletime, which provides a time-aware layer over tibble objects, with 'collapse by time' operators (which act like groupings, I think, but are applied in tandem with group_by), a 'rollify' operator which (naively) applies functions at each point in time over a fixed window, time subselection operations and more. He also mentioned flyingFox which uses reticulate to communicate with the Quantopian zipline package. zipline, while rather weak compared to what most quant shops will develop in-house, is the only open source backtesting engine that I know of. It is good to see this is coming to R. (I should note this package seems similar to the tsibble package.)

  3. Carson Sievert talked about dashR, a not-yet-released package for using dash, which is Python's latest attempt to replicate shiny (pyxley having suffered an early death, apparently). I suppose someone will find this useful, but I was not convinced by Carson's arguments in favor of this approach: easy switching between Python and R, and the ability to quickly import new React components. If the syntax of this framework were much easier to think about than shiny, it would certainly win some converts, but I believe reactive programming is just hard to reason about. At this point many users have learned to embrace the weirdness of shiny and will be unlikely to defect.

  4. Michael Kapler presented the rtsviz package for interactively exploring seasonality patterns in R. This seems to be a package with a shiny page that can quickly give you a view of the seasonality of your time series data.

  5. Bernhard Pfaff introduced the rbtc package. This wraps the bitcoin API for looking at the blockchain. This is complementary to the rbitcoin and coindeskr packages, which seem to provide pricing information. Expect more from this package in the coming year (perhaps the ability to mine coins, or define your own wallet).


Eran Raviv gave a talk about combining forecasts using the ForecastComb package. As an example he showed a few different forecast methods applied to a time series of UK electricity supply. The package supports many different methods of combining forecasts: simple averaging; OLS combination (which outperforms simple averaging, but might not be convex in the forecasts, sometimes extrapolating from them); trivial methods like median, trimming, etc.; accuracy-based methods, like inverse rank, inverse RMSE, and an eigenvector approach; regression-based methods: OLS, LAD, CLS, subset regressions. There are also summary and plotting functions. If you are combining forecasts, this is the package to use.
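As a flavor of the accuracy-based methods, here is a toy inverse-RMSE combination (a Python sketch of mine, not ForecastComb's implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
truth = rng.normal(size=200)
# three forecasters with different (independent) error variances
F = np.stack([truth + s * rng.normal(size=200) for s in (0.5, 1.0, 2.0)], axis=1)

# simple average of the three forecasts
avg = F.mean(axis=1)

# inverse-RMSE weights estimated on the first half, evaluated on the second
rmse = np.sqrt(((F[:100] - truth[:100, None]) ** 2).mean(axis=0))
w = (1 / rmse) / (1 / rmse).sum()
comb = F[100:] @ w

mse = lambda f, y: ((f - y) ** 2).mean()
```

With one forecaster much noisier than the others, the weighted combination should beat the naive average out of sample, which is the usual selling point of the accuracy-based schemes.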

Leopoldo Catania motivated the eDMA (efficient dynamic model averaging) package by looking at predicting cryptocurrency returns using some predictive features: technical features on the returns themselves and macroeconomic features. The model under consideration looks like the setup for a Kalman filter, with a linear model whose coefficients change under an AR(1) model, but with the dynamics instead summarized somehow by a 'forgetting factor'. A consequence is that, somehow, you have to perform linear regressions on all subsets, using multiple forgetting factors, and maybe evaluate them all on a rolling basis. The good news is that this package is fairly efficient, using Rcpp and RcppArmadillo, and is perhaps 50 times faster than the dma package, but it still takes around an hour to run a regression with 18 features and 500 rows. And the results were hard for me to interpret, and seemed to be worse than the benchmark method under the MSE metric. (And the claim that predictability 'increased over time' could possibly be attributed to the longer time series?)


Guanhao Feng gave a talk on "Deep Learning Alpha". As I understand it, the motivation was that there is a veritable "zoo" of factors and factor models (see Harvey & Liu (2016)), but factors are typically defined oddly. That is, most factor returns are defined to be relatively robust to how you would define the purported anomaly ('size', 'momentum', etc.), and are rebalanced annually. The speaker, I think, was looking to use Deep Learning to 'automatically' define factors which would be less subject to our lame human ideas of what factors should look like. (The speaker noted that you cannot use ML 'directly' to forecast cross sectional returns because of imbalanced data and missing values: not all features are defined at all times, not all stocks exist at all times, there are mergers and acquisitions, etc.) I think I missed the part where the model was compared to Fama French 3 or 5 factor models.

Xiao Qiao gave an interesting talk on correlated idiosyncratic volatility shocks. The idea is that idiosyncratic volatility has cross-sectional correlation (called "TVV", for Time Varying Vol), as well as autocorrelation ("VIN" (not that VIN), for Volatility INnovations). He built what he called a 'Dynamic Factor Correlation' model, which generalizes Bollerslev's CCC and Engle & Kelly's DECO models. He found that there is a significant cross-sectional correlation of GARCH residuals (TVV), then built portfolios based on sorts (two sorts, if I recall), and showed that the lowest quintile portfolios outperformed the highest quintile. The interpretation from the speaker was roughly that high VIN securities are a kind of 'insurance' against vol spikes, and high TVV securities pay out when vol is high in general. (There was also a "Lake Volbegone" effect, where all the portfolios had above-average excess returns, but Stephen Rush pointed out this was likely due to the difference between simple averaging and value averaging.) My notes tell me to look up Ang et al. (2006) and Herskovic et al. (2014).
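The quintile-sort mechanics, for those who haven't seen them, look something like this (a Python toy of mine with planted returns so the low bucket wins, not Xiao's data or model):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
vin = rng.normal(size=n)                      # stand-in for each stock's VIN exposure
# fake forward returns: high-exposure "insurance" stocks earn less, plus noise
fwd = -0.02 * vin + 0.05 * rng.normal(size=n)

# bucket stocks into quintiles by the signal, then average returns per bucket
q = np.digitize(vin, np.quantile(vin, [0.2, 0.4, 0.6, 0.8]))
means = np.array([fwd[q == k].mean() for k in range(5)])
```

The long-short spread between `means[0]` and `means[4]` is the usual headline number from a sort like this; whether you equal-weight or value-weight within buckets is exactly where the "Lake Volbegone" issue creeps in.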


Li Deng gave a talk on using AI in finance. Li drove the AI effort at Microsoft before joining Citadel. While he couldn't be terribly specific about what he is doing now, he gave a good overview of the history of AI, including its successes in perceptual tasks. He was also fairly honest about the challenges of using AI in finance: low (I would say, "very low") signal-noise ratio compared to perceptual tasks, nonstationarity and an adversarial landscape, and the heterogeneity of big data. My guess is that the first of those is the biggest problem, while the third is an engineering, or model design, challenge.

Lightning Round

  1. Keven Bluteau gave a talk on sentiment analysis. (I think I approached him after the conference, and after two drinks, and told him I enjoyed his talk about hdf5. Oooops! Sorry! You don't really look like Mario!) This was one of a slew of talks about sentiment, which also arrived around when my computer decided to remount its filesystem read-only. (ack!)

  2. Samuel Borms talked about the sentometrics package for computing and aggregating textual sentiment.

  3. Kyle Balkissoon gave a short talk on using weather data to create weather-based signals on companies (I feel like this idea time traveled from the 60's), as well as building text-based signals on companies. The latter is, as Kyle noted, fairly difficult (as is any signal construction) unless you can really represent what one company is over time. (In addition to the Ship of Theseus argument, splits and mergers complicate the picture, and they complicate our understanding of textual data about companies. I suspect that everyone at the conference who uses CRSP data just sweeps this under the rug, which would be worth the price of admission.)

  4. Petra Bakosova, from Hull Tactical, gave an impromptu talk on seasonal effects, which includes calendar-based effects (month boundaries, January effect, weekend effect, sell in May), as well as 'announcement' dates (FOMC, and maybe earnings announcements?). Building several seasonal strategies, she found that many had higher Sharpe than Buy and Hold around announcements (this seems odd if they are long only), but had lower overall return because the capital is not deployed at all times. (On the other hand, if the seasonal strategies could 'share' capital with other kinds of strategies, maybe it would all work out.)

  5. Che Guan gave a talk on using Machine Learning for 'digital' (or you might say, 'crypto') currency predictions, using technical factors on the coin returns as well as macroeconomic features. I would like to see his results compared and contrasted with those of Leopoldo Catania, who seemed to target the same application with different methods.

Afternoon Talks

David Ardia gave a talk about sparse forecasting using news-based sentiment. The motivating problem was forecasting economic growth. In Europe, this is apparently done by 'ESI', which is some kind of average of survey responses. Can this be automated, sped up, even improved by text-based forecasts? David pursued a penalized least squares approach. The recipe is: classify texts by topic (economic, labor, government, etc.), and choose a subset of topics; using multiple lexicons (lexica?), compute the sentiment of each text at time \(t\); aggregate across texts to obtain a bunch of topic-based sentiments; get some time-series aggregated values (a little hazy here); take a linear combination to get the best forecasts.

Using Germany as an example, he looked at news from LexisNexis from the mid 90's to 2016, filtered articles by geography, topic, and article size, applied a bag-of-words sentiment calculation (I think these are 'bivalent' indicators) using 3 lexicons, collapsed by lexicon, and then looked at sentiment by time and topic. The takeaway was that the sentiment indicator seemed to capture the same dynamics as ESI, but perhaps reacted more quickly to the Great Financial Collapse. He also found that combining the sentiment indicator and ESI improved forecasts.

Dries Cornilly gave a really nice talk on the rTrawl package for modeling High Frequency Financial Time Series. In the setup he presented some stylized facts of high frequency returns data. He plotted the autoregressive coefficient for returns in an AR(1) model at different observation frequencies. It exhibits an odd dip to around -0.2 or so at a period of around 1 second, but is otherwise around zero. Why the dip? He then also plots the variance of returns divided by T versus the observation frequency, which goes from around 0.05 down to zero. Again, why? He outlined some of the approaches to the problem, then described the integer valued Lévy processes and 'trawl' processes. From what I understood, you first generate some finite set of points in some space, one dimension representing time. Then you imagine sweeping across time and computing the sum of all points within the 'wake' of your sweep. In fact, you don't have to imagine the sweep, he showed animations of the sweep. The Lévy processes have like a constant wake, while the trawl processes are supposed to evoke a fisherman with a finite sized net from which the 'fish' escape. He also described a combination of the Lévy and trawl processes, which is like, uhh, a weird net, I guess. Anyway, the rTrawl package apparently supports computing these things, as well as estimating the parameters from an observed series. The parameters would be, I think, the generating process for the points (err, 'fish'), and maybe the size of the 'net' or something. (I don't really know how we transitioned from SPX trades to fish, but it worked.) The kicker at the end is that the combined trawl processes have closed form AR(1) coefficients and variance, so he showed the plots from the beginning of the talk along with the values from the trawl fit, and they match very well!

Luis Damiano gave a talk on Hierarchical Hidden Markov Models in High-Frequency Stock Markets. I think the idea was to create a Hidden Markov Model on stocks ("bullish", "bearish"), but then have another level of hidden Markov Models on top of that (thus "hierarchical"). He backtested this system on a couple of stocks over a short time period, but the story out of sample seemed inconclusive (in contrast to a 2009 article by Tayal he referenced). As a side note, apparently the github page for this project has some L1 and L2 tick data that you can play along with.


So, this happened:

This tweet went out on the first day of the conference, and there was a pile-on on twitter. (It looks like I picked the right year not to give a talk!) While I have to admit this tweet was very effective at drawing attention to the problem, drawing attention is only a step towards solving a problem. The organizers were able, on short notice, to get a talk from two members of "R Ladies", one of whom I believe was attending the conference anyway, to talk solutions. They suggested a Code of Conduct for the conference, which makes sense: the conference draws people from different backgrounds and cultures, and it is better to make explicit how they should be expected to behave. Moreover, if it increases attendance and improves audience diversity, I am all for it. The two ladies also made a call to action to the audience for us to proactively seek diversity. This is a much larger conversation than is appropriate for this review (and nobody reads this blog for a conversation), but I do hope to see positive changes in diversity and inclusion at the conference, and also in the industry as a whole.

Lightning Round

  1. Phillip Guerra talked about 'autotrading', which appears to be his terminology for taking a backtest to market. Phillip is an anesthesiologist who moonlights in asset management. One reason I love this conference is its big tent approach to speakers, who range from academics to industry to independents.

  2. Bryan Lewis gave a talk about Stat Arb Something Something, based on some offhand comments he made at the conference a few years back about how you could just quickly throw together a stat arb strategy. The idea is to find groups of stocks with cointegration relationships, and trade in expectation of a reversal when they diverge from the relationship. Bryan filled in some of the details to this general sketch. One of the problems is there is a huge number of combinations of assets to check for cointegration, and the classical tests do not scale well (in terms of coverage, I believe) to large numbers of time series. Bryan talked about using spectral clustering on the regularized covariance of returns to get candidate sets of assets, then use a Bayesian approach to cointegration. The reference I am to check is a 2002 paper by Alexander, Giblin and Weddington.
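A toy of the spread-reversal idea Bryan sketched (my own Python mock-up with fake cointegrated prices and an Engle-Granger-style OLS hedge ratio, not his spectral clustering or Bayesian machinery):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 500
# two cointegrated price series: both track a common random walk
common = np.cumsum(rng.normal(size=n))
p1 = common + rng.normal(scale=0.5, size=n)
p2 = 0.8 * common + rng.normal(scale=0.5, size=n) + 5.0

# hedge ratio from OLS of p1 on p2 (Engle-Granger, step one)
A = np.column_stack([np.ones(n), p2])
(alpha, beta), *_ = np.linalg.lstsq(A, p1, rcond=None)
spread = p1 - (alpha + beta * p2)   # should be stationary if cointegrated

# trade in expectation of reversal when the spread's z-score is stretched
z = (spread - spread.mean()) / spread.std()
signal = np.where(z > 1.5, -1, np.where(z < -1.5, 1, 0))
```

The hard part Bryan was actually addressing is upstream of this: finding candidate sets among thousands of assets without running a combinatorial explosion of classical tests.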


I was starting to think that cryptocurrency talks would outpace total meme count this year, then Stephen Rush killed it for team meme with his talk on Currency Risk and Information Diffusion. The motivating idea for the talk is that information moves from currency markets to equity markets at different speeds. Can we analyze that speed and figure out why it is faster or slower for some companies? The speaker computed VPIN from second-resolution NYSE TAQ data, then downsampled to daily frequency, used CRSP daily data for around 20 years on about 17K firms, then built a linear model for returns of each firm taking into account some future information. The normalized regression coefficients then give you some idea of the 'price adjustment', which is basically a measure of the inefficiency of each stock. The speaker found that VPIN, size, turnover and analyst coverage had negative effects on this price adjustment (i.e. are indicative of higher efficiency), while institutional ownership had a positive effect (lower efficiency). This latter factor is associated with a significant alpha, on the order of around 6% annualized for the top decile.

Jasen Mackie gave a talk on 'Round Turn Trade Simulation'. This seemed to be related to the idea of random portfolios, but focused on computing e.g. the expected maximum drawdown of a trading strategy by sampling from its 'round turn' trades (that is, positions which are opened then closed, presumably defined in a LIFO sense). Using the blotter object, the speaker extracted some stylized facts of the trading strategy: duration of these trades, ratio of long to short, maybe position sizes, etc. Then random realizations were drawn with similar properties. I guess you can think of this as a kind of bootstrap of the backtest. I suppose the autocorrelation of trades would be much trickier to establish (and I suspect would have a huge influence on maximum drawdown).
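My mental model of the procedure is a bootstrap like the following (a Python sketch of mine; the blotter-based version surely handles the bookkeeping more carefully):

```python
import numpy as np

rng = np.random.default_rng(5)
# per-trade P&L of each closed ("round turn") trade extracted from a backtest
trade_pnl = rng.normal(loc=0.1, scale=1.0, size=400)

def max_drawdown(pnl):
    equity = np.cumsum(pnl)
    return (np.maximum.accumulate(equity) - equity).max()

# resample round-turn trades iid; real trades may be autocorrelated, which
# would likely move the drawdown distribution a lot
sims = np.array([max_drawdown(rng.choice(trade_pnl, size=trade_pnl.size))
                 for _ in range(1000)])
mdd95 = np.percentile(sims, 95)   # e.g. a 95th-percentile max drawdown estimate
```

The iid resampling is exactly where my autocorrelation worry bites: shuffling away clustered losers will understate the drawdowns you would actually experience.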

Thomas Harte closed the conference with a talk on "Pricing Derivatives When Prices Are Not Observable". This is a bit different than incomplete markets. He built a linear model for private equity returns based on some factors, then used that linear model somehow as a proxy in a Rubinstein-type lattice pricing scheme. From these Thomas was able to price certain options on private equity firms (say, a leveraged buyout fund).

This was another great conference, and I hope to be back next year. If you have anything to add, feel free to comment.