# Broken Backtests

Steven E. Pav
(former quant)

## Who am I?

• Former applied mathematician.
• Quant Programmer & Quant Strategist 2007-2015 at two small ML-based hedge funds.
• Almost pure quant funds, ML-based, in U.S. ("single name") equities and volatility futures.
• Tried many approaches to finding alpha:
• ML-based, like SVMs, random forests, GP.
• traditional techniques: plain old linear regression.
• Terrifying feeling of "what am I doing here?"
• How do you write and validate and debug a backtest simulator?
• No open source code available at the time.
• Most academics probably write backtest code from scratch.
• Papers with implicit backtests are a priori suspect, more so if the strategy is complex.

## Why backtest?

• What makes a profitable strategy?
• Need prediction of future price movements.
• But also:
• Turn the predictions into trades.
• No, really. You need to turn the predictions into trades.
• Eliminate or reduce exposure to certain risks.
• Control trade costs (market impact, commissions, short financing).
• Hard to estimate the effects of the different moving parts separately.
• Backtesting basically implies systematic strategies.
• Backtest to decide how much (if any) to deploy in a strategy.

## What do backtests do?

• A backtest probably should:
• simulate environment in which you act (presents point-in-time data, accepts orders).
• simulate the reactions of the world (fills, commissions, corporate actions, etc.).
• translate in an obvious way to a real trading strategy.
• provide a guarantee of time safety.
• Creating a good backtesting environment requires:
• Software engineering: balance time safety, computational efficiency & developer sanity.
• Domain knowledge and data: How do corporate actions work? How should you simulate fills?
• Great statistical powers: How do you interpret the results? How do you avoid overfitting?
• Good intuition and sleuthing abilities: What new thing is broken?
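
The requirements above can be sketched as a very small event loop. This is only an illustration under simplifying assumptions, not a recommended design: the `Bar` type, `run_backtest`, the next-open fill rule, and the flat per-share cost are all hypothetical. The point is the shape of the time-safety guarantee: the strategy is handed only the bars up to the current day, and its order is filled at a price it has not yet seen.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Bar:
    day: int
    open: float
    close: float


def run_backtest(bars: Sequence[Bar],
                 strategy: Callable[[Sequence[Bar]], float],
                 cost_per_share: float = 0.0) -> float:
    """Return the final P&L of a strategy that picks a target position daily.

    At the close of day t the strategy sees only bars[0..t]; the resulting
    trade is filled at the open of day t+1 (an assumed, crude fill model).
    """
    position = 0.0
    cash = 0.0
    for t in range(len(bars) - 1):
        history = tuple(bars[: t + 1])      # point-in-time view, immutable
        target = strategy(history)          # desired position in shares
        fill_price = bars[t + 1].open       # simulated fill at the next open
        trade = target - position
        cash -= trade * fill_price + abs(trade) * cost_per_share
        position = target
    return cash + position * bars[-1].close  # mark to the final close


def momentum(history: Sequence[Bar]) -> float:
    """Toy strategy: hold one share after an up day, otherwise hold nothing."""
    if len(history) < 2:
        return 0.0
    return 1.0 if history[-1].close > history[-2].close else 0.0


bars = [Bar(0, 10.0, 10.0), Bar(1, 10.0, 11.0),
        Bar(2, 11.0, 12.0), Bar(3, 12.0, 13.0)]
pnl = run_backtest(bars, momentum)
```

Passing the history as a slice is wasteful, which is exactly the trade-off between time safety, computational efficiency, and developer sanity mentioned above.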

## Garbatrage

• Use Bayes' Rule:

• Devising a consistently profitable trading strategy is known to be hard.
(The EMH posits that it is essentially impossible.)
• Bugs are easy to make. A good programmer will make several a day.
• If your backtest looks profitable, what is the likelihood the strategy is really profitable? With $A$ = "the strategy is really profitable" and $B$ = "the backtest looks profitable":

$\mathcal{O}\left(A \mid B\right) \propto \mathcal{O}\left(A\right) \Lambda\left(B \mid A\right).$

• If you are exploring a new asset class, using a new fill simulator, using new code, testing a new strategy, or reading a paper, and the backtest looks great, it's probably a bug.

Examples:

• Paper from March 2012 that claimed Sharpe of 3.5 / sqrt(yr) and 500% annual returns using monthly trading with signal delayed a month.
• Three-day-old tweets give you a Sharpe of around 9 / sqrt(yr) trading on the DJIA index.
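
To make the Bayes' Rule argument concrete, here is a worked example in odds form, taking the likelihood ratio to be P(backtest looks great | profitable) / P(backtest looks great | not profitable). All three input numbers are invented for illustration; they are not estimates from the talk.

```python
def posterior_odds(prior_odds: float,
                   p_great_given_profitable: float,
                   p_great_given_unprofitable: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    likelihood_ratio = p_great_given_profitable / p_great_given_unprofitable
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


# Hypothetical inputs: 1 candidate strategy in 100 is truly profitable;
# a truly profitable strategy backtests well 90% of the time; bugs and
# overfitting make an unprofitable one backtest well 20% of the time.
prior = 1.0 / 99.0
post = posterior_odds(prior, 0.9, 0.2)
prob_really_profitable = odds_to_probability(post)
```

Under these made-up numbers, a great-looking backtest raises the probability that the strategy is real from 1% to only about 4%. The conclusion is driven almost entirely by how often broken backtests look great.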

## Time Travel

• The most common error in backtests is time travel: use of future information in simulations.
• Time travel is easy to simulate, but hard to implement!
• Time travel occurs for many reasons:
• Using crude tools.
• Backfill and survivorship bias.
• Representation of corporate actions: dividends, splits, spinoffs, mergers, warrants.
• Think-os and code boo boos.
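
A minimal sketch of how easy time travel is to simulate: an off-by-one in the alignment of signal and return turns pure noise into a strategy that never loses. The returns here are simulated Gaussian noise, and the `lag` parameter is a hypothetical stand-in for whatever alignment bug you happen to have.

```python
import random


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def strategy_pnl(returns, lag):
    """Daily P&L from holding sign(returns[t - lag]) over day t.

    lag=1 trades on yesterday's return (feasible);
    lag=0 trades on today's return before it has happened (time travel).
    """
    return [sign(returns[t - lag]) * returns[t]
            for t in range(lag, len(returns))]


rng = random.Random(0)
returns = [rng.gauss(0.0, 0.01) for _ in range(2500)]  # roughly 10 years of noise

leaky = strategy_pnl(returns, lag=0)
honest = strategy_pnl(returns, lag=1)

leaky_mean = sum(leaky) / len(leaky)
honest_mean = sum(honest) / len(honest)
```

The leaky version earns the absolute value of every return, so it has no losing days at all. The honest version, on noise, earns roughly nothing.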

## Survivorship and Backfill

• Inclusion/exclusion of a company in data may be a form of time travel.
• A classic survivorship bias: trading historically on today's S&P500 universe of stocks.
• Similarly, data vendors often backfill data for companies.
• You can test for this, or just ask them!
• Vendors (or you) do weird things to deal with mergers.
• Takeaway: be careful with universe construction.
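
One way to be careful with universe construction is to store index membership as dated intervals and always query it as of the simulation date. The sketch below assumes such a table exists; the `Membership` record, the tickers, and the dates are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    ticker: str
    added: date
    removed: Optional[date]  # None means still a member


def universe_as_of(memberships, as_of: date):
    """Tickers that were members on `as_of`, using only dated intervals."""
    return sorted(
        m.ticker for m in memberships
        if m.added <= as_of and (m.removed is None or as_of < m.removed)
    )


# Hypothetical membership history.
history = [
    Membership("AAA", date(2000, 1, 3), date(2009, 6, 1)),  # later dropped
    Membership("BBB", date(2005, 3, 1), None),
    Membership("CCC", date(2012, 9, 4), None),              # joined later
]

then = universe_as_of(history, date(2008, 1, 2))
now = universe_as_of(history, date(2020, 1, 2))
```

Backtesting 2008 on the `now` list would silently drop AAA, which did not survive, and include CCC, which presumably joined because it did well.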

## Corporate Actions

• Corporate actions are notoriously time-leaky.
• We represent asset returns as a single time series; in reality, they branch across time.
• Corporate actions are just hard to model.
• For example, (back) adjusted closes. A portfolio inversely proportional to adjusted close has time-travel 'arb'.
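
The leak in back-adjusted closes can be seen with splits alone. In the sketch below, `back_adjust` and its conventions are assumptions made for illustration (dividends and other actions are ignored): the adjusted close for an old date changes when a split happens later, so it is not a point-in-time quantity.

```python
def back_adjust(closes, split_ratios):
    """Back-adjust raw closes for splits.

    split_ratios[t] is the number of new shares per old share for a split
    effective on day t (1.0 if none). The adjusted close on day t divides the
    raw close by the product of all split ratios on days after t.
    """
    adjusted = [0.0] * len(closes)
    factor = 1.0
    for t in range(len(closes) - 1, -1, -1):
        adjusted[t] = closes[t] / factor
        factor *= split_ratios[t]
    return adjusted


# As seen at the end of day 2: no splits yet.
before = back_adjust([100.0, 110.0, 120.0], [1.0, 1.0, 1.0])

# As seen at the end of day 3, after a 2-for-1 split effective on day 3.
after = back_adjust([100.0, 110.0, 120.0, 61.0], [1.0, 1.0, 1.0, 2.0])
```

The day-0 adjusted close falls from 100 to 50 because of an event on day 3. A portfolio weighted by 1 / adjusted close therefore doubles its day-0 weight in a stock that goes on to split, and stocks tend to split after they have gone up.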

## The ML Hacker Trap

• Align returns to features for training ML models.
• Forget that the model is timestamped to the returns.
• A warning sign: "the more often I retrain, the better my model!"
(Often with an excuse for 'time freshness'.)
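
A sketch of the trap, assuming each training row pairs a feature observed on day t with the return from t to t + horizon. The row's label is only known on day t + horizon, so that is the row's real timestamp. The function names are hypothetical.

```python
def trainable_rows(feature_days, horizon, now):
    """Rows whose forward-return label is fully realized by day `now`."""
    return [t for t in feature_days if t + horizon <= now]


def leaky_rows(feature_days, now):
    """The trap: filtering on the feature timestamp alone."""
    return [t for t in feature_days if t <= now]


feature_days = list(range(10))
horizon = 5   # label is the return over the next 5 days
now = 7       # we are fitting a model to trade at the close of day 7

safe = trainable_rows(feature_days, horizon, now)
leaky = leaky_rows(feature_days, now)
contaminated = [t for t in leaky if t not in safe]
```

Here five of the eight leaky rows carry labels that reach past day 7. Every retrain refreshes that stock of future returns, which is one plausible reason that retraining more often appears to help.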

## Broken Fill Simulation

• In reality, orders might not get (fully) filled, or might get a bad price.
• Hard to simulate given coarse data, like daily bars.
• There is 'market impact' where your order affects your fill price.
• Bigger orders lead to bigger impact.
• Decent theoretical models but with uncertain parameters.
• Fitting the parameters is tricky: you only observe one history.
• Impact models often ignore other factors (like the Market).
• Fill simulation should introduce a large band of uncertainty around your simulations.
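
As an illustration of the uncertainty band, the sketch below uses the commonly cited square-root impact form, cost ≈ coeff × daily volatility × sqrt(order size / daily volume). This is an assumed model chosen for the example, not necessarily the one the talk has in mind, and the range of coefficients is invented.

```python
import math


def sqrt_impact(order_shares, daily_volume, daily_vol, coeff):
    """Impact cost as a fraction of price under an assumed square-root model."""
    return coeff * daily_vol * math.sqrt(order_shares / daily_volume)


def impact_band(order_shares, daily_volume, daily_vol, coeffs=(0.5, 1.0, 1.5)):
    """Lowest and highest cost over a set of plausible (hypothetical) coefficients."""
    costs = [sqrt_impact(order_shares, daily_volume, daily_vol, c)
             for c in coeffs]
    return min(costs), max(costs)


# An order for 1% of daily volume in a stock with 2% daily volatility.
low, high = impact_band(order_shares=10_000, daily_volume=1_000_000,
                        daily_vol=0.02)
```

With these inputs the cost runs from about 10 to about 30 basis points per trade. Carrying both ends through the backtest gives a range of outcomes rather than a single P&L curve.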

## Overfitting

• Two forms of overfitting:
• Having an overly optimistic estimate of out-of-sample performance.
• Choosing a suboptimal strategy by having too much freedom.
• Two forms or one form?

## Biased Estimates

• First kind of overfitting is like 'estimation after selection'.
• For example:
• generate 1000 'random' strategies,
• backtest them all,
• pick best one based on maximal in-sample Sharpe,
• estimate the out-of-sample Sharpe of that strategy?
• But not entirely a technical problem.
• Usually attacked by elaborate 'in-sample' vs. 'out-of-sample' schemes.
• In reality, there is 'in-sample' and 'trading-real-money.'
• You ignore data available to you now at your own risk.
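
The 1000-strategy experiment above is easy to run. In this sketch every "strategy" is pure Gaussian noise with zero true Sharpe; the 1% daily volatility, the one-year samples, and the seed are arbitrary choices.

```python
import math
import random


def annualized_sharpe(daily_returns, periods_per_year=252):
    n = len(daily_returns)
    mu = math.fsum(daily_returns) / n
    var = math.fsum((r - mu) ** 2 for r in daily_returns) / (n - 1)
    return mu / math.sqrt(var) * math.sqrt(periods_per_year)


def best_of_n(n_strategies, n_days, rng):
    """Pick the strategy with the best in-sample Sharpe.

    Returns (in-sample Sharpe, out-of-sample Sharpe) of that one strategy.
    Every strategy is zero-mean noise, so its true Sharpe is exactly 0.
    """
    best_in, best_out = -math.inf, 0.0
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        s = annualized_sharpe(in_sample)
        if s > best_in:
            best_in, best_out = s, annualized_sharpe(out_sample)
    return best_in, best_out


in_sharpe, out_sharpe = best_of_n(1000, 252, random.Random(42))
```

The winner's in-sample Sharpe looks impressive purely because it is the maximum of 1000 noisy estimates. Its out-of-sample Sharpe is just another draw centered on zero.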

## Suboptimal Models

• The classical overfit problem: too many parameters cause poor live performance.

• Applies to portfolio optimization in a subtle way.

• "A perfectly rational agent should not be harmed by addition of choices."

• There are no perfectly rational systematic strategies.

## Parting Words

• Sorry, but you probably have to write your own backtester.
• Backtests are often broken.
• You should be suspicious of all results based on backtests, including your own.
• The best way to find errors in your backtester is to use it a lot.
• Making systematic strategies is not just a software engineering challenge.
• Good luck!