In God we trust. All others must bring data.
Backtesting is probably the single best method we have to quickly evaluate new trading strategies. Used incorrectly, however, it can become our greatest weakness, guiding us down a false path to ruin.
For the uninitiated, backtesting is the process of simulating a trading strategy on historical data. Effectively, you’re seeing what would have happened if you had gone back in time and run your strategy over a certain period. Used primarily by data scientists and hedge funds, it simplifies the assessment of a trading strategy by letting you test and reject ideas quickly.
Imagine you had been given access to a phone payment app’s data. You might hypothesise that this information would let you predict the stock price of a retail firm more accurately, as you could see its revenue directly. Backtesting would allow you to evaluate that claim formally, by seeing how you would have fared using the strategy in the past.
As the title suggests, backtesting is not without its shortcomings. Most often, there is a distortion between what the simulation shows and what happens in live trading.
When we design a backtest, we try to take multiple things into consideration: mathematics, statistics, psychology, and more. This is harder than it may seem. Even for experienced data scientists, things can still go badly wrong when biases creep into the simulation.
In the rest of this article, we’ll look at four of the most common backtesting biases that creep in: what causes them, and how you can avoid them.
Optimisation Bias

This bias is also known as data snooping bias, a name which aptly describes its cause. If we believe that the world we live in is probabilistic, then it stands to reason that the set of events that has happened up to now is just one of a multitude of possible outcomes. When we look at historical data, we are seeing this single version of the many different things that could have happened. I’ll repeat that, because the implications are important to understand: historical data contains information about a single collapsed path of events out of an immense number of possibilities.
This means that when we train a model on a specific dataset, we must be extremely careful not to create a model that exactly describes the data we see. If we add enough variables and features, we will eventually find a model that is statistically significant, but it may not generalise. Instead, we should aim for a model that describes a system of which the data is one possible realisation.
In practice, data snooping bias occurs when we add too many parameters to our algorithms and fine-tune them too closely to the data we have. The result is that we capture not only the underlying system but also fit specific sets of random outcomes. When our model looks at the historical data it performs well, as it maps beautifully onto the events that led to that specific outcome. Faced with new data, however, its performance collapses, because it cannot cope with a different permutation of random events.
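A toy sketch in numpy makes this concrete (the series and polynomial degrees are invented for illustration): a many-parameter model fits the history it was trained on better than a simple one, then falls apart on data it has never seen.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented series: a weak trend buried in noise, split chronologically
# into a training window and a later, unseen window.
x = np.linspace(0.0, 3.0, 60)
y = 0.5 * x + rng.normal(0.0, 1.0, size=x.size)
x_train, x_test = x[:40], x[40:]
y_train, y_test = y[:40], y[40:]

def rmse_in_out(degree):
    """Fit a polynomial of the given degree on the training window and
    return (in-sample RMSE, out-of-sample RMSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)

    def rmse(xs, ys):
        return np.sqrt(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

    return rmse(x_train, y_train), rmse(x_test, y_test)

simple_in, simple_out = rmse_in_out(1)   # two parameters
complex_in, complex_out = rmse_in_out(9) # ten parameters
```

The complex model wins on the data it snooped and loses everywhere else, which is exactly the collapse described above.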
There are three steps to avoiding optimisation bias:
- Keep your simulation system as simple as possible.
- Use fewer parameters, and simulate your algorithm across diverse markets and time periods.
- Once you’re done with backtesting, run the algorithm through new, unfamiliar data to confirm that the system is genuinely robust rather than fitted to one history.
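The last step can be sketched as a simple chronological split (the 30% hold-out fraction here is an arbitrary illustrative choice): the most recent data is set aside, and the optimisation step never sees it.

```python
import numpy as np

def chronological_split(series, holdout_frac=0.3):
    """Split a time series into in-sample and out-of-sample parts.

    Never shuffle: the hold-out block is the most recent data, which
    the optimisation step must never touch.
    """
    cut = int(len(series) * (1 - holdout_frac))
    return series[:cut], series[cut:]

days = np.arange(100)  # stand-in for 100 days of strategy data
in_sample, out_of_sample = chronological_split(days)
```

Only once a strategy is frozen do you evaluate it on the hold-out block, and you do so once; re-tuning against it would reintroduce the snooping.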
Look-Ahead Bias

This bias is a plague to data scientists with little experience of time series forecasting. It occurs when a model is trained with access to future data. This can happen in lots of different ways, from obviously feeding future values into the dataset to subtle technical bugs or normalisation procedures.
Look-ahead bias can be difficult to spot at the time, as our minds are prone to overlooking the fact that a piece of data relies on future information, especially when we ourselves have access to that data. As a simple example, imagine you wanted to create normalised values for a feature X. In many scenarios you would do this by dividing every value by the mean. With time series data, this creates a look-ahead bias: the mean is formed by summing all values between t(0) and t(n) and dividing by the number of observations, so at every time point except t(n) we have used information we wouldn’t have had at that time, because the mean contains information about all values in the series. Instead, we should use some sort of trailing or rolling mean.
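This normalisation example can be shown in a few lines of pandas (the feature values are invented): dividing by the full-sample mean leaks future information, while an expanding mean at each point uses only what was known at that time.

```python
import pandas as pd

x = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])  # invented feature values

# Biased: the full-sample mean at time t already "knows" values after t.
biased = x / x.mean()

# Safe: an expanding (trailing) mean at time t uses only values
# up to and including t.
safe = x / x.expanding().mean()
```

A rolling window (`x.rolling(n).mean()`) works the same way when you want to bound how far back the statistic looks.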
When you backtest on the same data set repeatedly, you are more likely to introduce a look-ahead bias involuntarily. This bias directly distorts live trading results, so it’s important to avoid it. Two methods for doing so:
- Use the same algorithm or code for both backtesting and live trading, so that if the code attempts to look ahead, the live program crashes rather than silently cheating.
- Compute features from point-in-time data only, e.g. trailing or rolling statistics, so that nothing calculated at time t incorporates values after t.
Survivorship Bias

This is another bias that coders and data scientists overlook. It was first described during the Second World War, when American scientists looked to improve armour placement on their planes. To see which areas most needed reinforcement, they recorded where returning planes had been shot. Can you work out what went wrong?
The insight is that by measuring the bullet holes in this manner, they were actually recording where planes had been shot and survived. In effect the key information was the information that wasn’t in their dataset!
The bias is clearest if you imagine a short-only strategy (where we try to make money on companies falling in value). Suppose you used a stock database that exists today: you would consider only the stocks that are listed, or alive, right now. What you’re missing are the stocks that are no longer listed (e.g. because their value fell too far). It’s the same survivorship bias the American scientists saw in the war: the most important data points are the ones excluded from the dataset!
Consider a strategy that looks at stocks in the S&P 500 and aims to beat the returns of the index. You’re smart and want to capture different market regimes, so you get data going back to 1998. Because you’re looking to trade on the model, the only stocks you include are those tradeable in the market right now. At this point, we’ve introduced survivorship bias into any model we build.
A recent McKinsey report showed that the average time a company spends in the S&P 500 is about 18 years, so we could be missing around 250 companies from our data. Each one is a missed revenue opportunity, but also a missed opportunity to identify risk. Two examples might be Lehman Brothers and Pets.com: either could have been incredibly profitable or incredibly costly to trade, and our model would have no experience of them.
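A toy illustration of the effect, with invented tickers and returns: averaging over only the names that survive to today paints a far rosier picture than the full universe, delistings included.

```python
import numpy as np

# Invented universe: annual returns for five names, two of which were
# later delisted after collapsing (think Lehman Brothers, Pets.com).
returns = {
    "SURV-A": 0.12,
    "SURV-B": 0.08,
    "SURV-C": 0.15,
    "GONE-A": -0.95,  # delisted -- absent from a present-day database
    "GONE-B": -0.99,  # delisted
}
listed_today = {"SURV-A", "SURV-B", "SURV-C"}

# A present-day database only lets us average over the survivors.
survivor_mean = np.mean([r for name, r in returns.items()
                         if name in listed_today])
true_mean = np.mean(list(returns.values()))
```

The survivors-only average is comfortably positive while the true universe average is deeply negative; a backtest built on the first number never learns what the second one would have taught it.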
Survivorship bias can have serious consequences in live trading; however, it can be minimised by:
- Purchasing databases that include delisted stocks as well.
- Adding more recent data to your backtests, which shortens the window over which survivorship effects accumulate.
Neglecting Market Impacts
Finally, this last consideration applies more to large institutional investors than to people with small retail books, but it is important to be aware of. Put succinctly, the price history you backtest over does not include your trades. When you trade, the market moves and adjusts to the changes you make, and depending on the specifics of the trade, this can quickly erode your profits.
Because the backtest’s prices never had to absorb your orders, the simulation does not reflect the price you’re actually likely to get when you trade for real. Since trading and pricing go hand in hand, neglecting this market impact creates a bias that will flatter your backtesting results. A simple fix for this bias:
- Always anticipate that when you trade, prices will move against you. This presumption, while conservative, eliminates the bias and yields more realistic results.
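One way to encode that conservative presumption is a small fill-price helper; the function name and the 5-basis-point figure are illustrative assumptions, not market constants.

```python
def fill_price(quoted, side, slippage_bps=5.0):
    """Assumed fill price, always worse than the quoted backtest price.

    Buys are assumed to fill higher and sells lower by `slippage_bps`
    basis points -- an invented, deliberately pessimistic figure.
    """
    adj = quoted * slippage_bps / 10_000.0
    return quoted + adj if side == "buy" else quoted - adj
```

Routing every simulated trade through a helper like this keeps the backtest’s P&L from assuming frictionless fills the live market will never give you.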
Most importantly, you can avoid these fallacies by changing the way you think about backtesting. If you treat it as a way to confirm the accuracy or efficiency of your strategy, you are likely to become overconfident and will have, at best, disappointing results in real trading. Instead, treat backtesting as a filtration process for eliminating strategies. Be strict about this, and the strategies that survive will be more accurate and negligibly biased. Good luck!