Backtest Overfitting

Backtest overfitting is the selection of strategy parameters or rules that fit historical noise rather than a persistent market edge. It is one of the most common reasons that retail and even institutional backtests fail to replicate in live trading.

How it happens

Overfitting compounds through three vectors:

Parameter search. Trying many parameter combinations and keeping the best one. With enough trials, some combination will look great in-sample by chance alone.
Rule iteration. Adding entry/exit conditions until the equity curve smooths out. Each added rule is another implicit parameter.
Data dredging. Running the same backtest on many instruments, time ranges, or market regimes and reporting only what worked.

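The probability of finding a high-Sharpe rule by chance grows with the number of trials. Bailey & López de Prado (2014) correct for this with the "deflated Sharpe ratio," which discounts an observed Sharpe ratio for the number of trials run and for non-normal returns.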
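
The effect of trial count can be illustrated with a small simulation. The sketch below is not the deflated Sharpe ratio itself. It only shows the selection bias that the ratio corrects for. The names `sharpe` and `best_sharpe_by_chance` are hypothetical, and the setup (one year of daily returns drawn from a normal distribution with zero mean and 1% volatility, risk-free rate of zero) is an assumption chosen for illustration.

```python
import random
import statistics


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed 0)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * periods_per_year ** 0.5


def best_sharpe_by_chance(n_trials, n_days=252, seed=0):
    """Best in-sample Sharpe among n_trials strategies that have no true edge."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_trials):
        # Each "strategy" is pure noise: daily returns ~ N(0, 1%).
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, sharpe(returns))
    return best


if __name__ == "__main__":
    # The best Sharpe found rises with the number of trials,
    # even though every strategy has an expected return of zero.
    for n in (1, 10, 100):
        print(n, round(best_sharpe_by_chance(n), 2))
```

Because every simulated strategy has zero expected return, any Sharpe ratio reported by the search is a product of selection alone.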

Detection

Walk-forward performance (e.g. total return or Sharpe) more than 50% below in-sample → likely overfit.
Sensitivity test: perturb each parameter ±20%. Robust strategies degrade smoothly. Overfit strategies fall off a cliff.
Out-of-sample Sharpe << in-sample Sharpe. Standard tell.
Strategy uses more conditions than the claimed economic rationale can justify.
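
The sensitivity test can be sketched as follows. This is a minimal illustration, assuming a hypothetical `backtest` callable that maps a dict of numeric parameters to a single performance score (such as a Sharpe ratio) with a positive baseline. `sensitivity_report`, `worst_retention`, and the two toy score functions are invented for this example and do not come from any library.

```python
def sensitivity_report(backtest, params, rel_shift=0.20):
    """Re-run `backtest` with each parameter shifted by +/- rel_shift, one at a time.

    Returns {name: (score_low, score_base, score_high)}.
    Integer-valued parameters (e.g. lookbacks) become floats here, so the
    backtest is assumed to round them as needed.
    """
    base = backtest(params)
    report = {}
    for name, value in params.items():
        low = dict(params, **{name: value * (1 - rel_shift)})
        high = dict(params, **{name: value * (1 + rel_shift)})
        report[name] = (backtest(low), base, backtest(high))
    return report


def worst_retention(report):
    """Smallest fraction of the baseline score kept under any single perturbation.

    Assumes the baseline score is positive.
    """
    return min(min(low, high) / base for low, base, high in report.values())


# Toy score surfaces standing in for real backtests (illustrative only).
def smooth_score(p):
    # Broad plateau around lookback=50: performance degrades gradually.
    return 1.5 - ((p["lookback"] - 50) / 50) ** 2


def spiky_score(p):
    # Performance exists only at one exact setting: the "cliff" pattern.
    return 1.5 if abs(p["lookback"] - 50) < 1 else 0.1


if __name__ == "__main__":
    for fn in (smooth_score, spiky_score):
        report = sensitivity_report(fn, {"lookback": 50})
        print(fn.__name__, round(worst_retention(report), 2))
```

A retention close to 1 suggests a plateau. A retention close to 0 suggests the baseline parameters sit on an isolated peak. The threshold that separates the two is a judgment call and is not specified here.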

Mitigation

Walk-forward optimization. Necessary but not sufficient: re-running it after every tweak turns the out-of-sample windows into in-sample data.
Hold out a final test set never seen during research. Test once. If it fails, the strategy fails — do not re-optimize.
Pre-register hypotheses. State the rule and parameters before looking at data.
Keep the search space small. Fewer parameters; clear economic rationale for each.
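
The split between research data and a final holdout, with walk-forward windows confined to the research portion, might look like the sketch below. The function names, the 20% holdout fraction, and the window sizes are assumptions for illustration. Observations are addressed by integer position, and the walk-forward scheme shown is a rolling fixed-size window, which is one common variant among several.

```python
def split_holdout(n_obs, holdout_frac=0.2):
    """Return (research_range, holdout_range); the holdout is the end of the sample."""
    cut = n_obs - int(n_obs * holdout_frac)
    return range(0, cut), range(cut, n_obs)


def walk_forward_windows(research, train_size, test_size):
    """Yield (train_range, test_range) pairs rolling forward through `research`.

    Each test window starts where its training window ends, and successive
    test windows do not overlap. Nothing is yielded past the end of `research`,
    so the holdout is never touched.
    """
    start = research.start
    while start + train_size + test_size <= research.stop:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


if __name__ == "__main__":
    research, holdout = split_holdout(1000)
    for train, test in walk_forward_windows(research, train_size=300, test_size=100):
        print("train", train.start, train.stop, "test", test.start, test.stop)
    # Evaluate on `holdout` exactly once, after all research is finished.
    print("holdout", holdout.start, holdout.stop)
```

Keeping the holdout indices out of every research function call is a mechanical way to enforce the "test once" rule.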
