From Backtest to Live: The Institutional Deployment Checklist

The gap between "backtest looks good" and "strategy survives 90 days live" is where most retail systematic strategies die. These failures are predictable, which is why institutional research teams work from a checklist.

This is ours.

Phase 1: Validation gate

A strategy must pass all of these before it gets to operations:

Walk-forward optimization (WFO) Sharpe ≥ 1.0, with both the anchored and the rolling configurations passing.
Held-out test set Sharpe within one standard error of WFO Sharpe. Strategy rejected if not.
Maximum drawdown depth and duration within budget — typically MDD ≤ 15%, recovery ≤ 12 months.
Sensitivity test passes. Parameters perturbed ±20%; strategy degrades smoothly, not catastrophically.
Statistical significance: a minimum of 250 independent return observations or 100 independent trades, whichever gives the tighter standard error (SE) on the Sharpe estimate. See statistical significance in trading.
Edge survives 2× realistic slippage. If not, no deployment.
Defensible economic mechanism. A clear, articulated reason the strategy works that does not depend on backtested specifics.
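
The held-out Sharpe gate above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: it assumes IID returns, a zero risk-free rate, and daily data (252 periods per year), and it uses the common IID approximation SE(SR) ≈ sqrt((1 + SR²/2) / n) for the per-period Sharpe. The function names are hypothetical, and comparing against the held-out estimate's own standard error is one reading of the gate, not the only one.

```python
import math
from statistics import mean, stdev


def sharpe_and_se(returns, periods_per_year=252):
    """Annualized Sharpe ratio and its approximate standard error.

    Assumes IID returns and a zero risk-free rate (both are assumptions).
    """
    n = len(returns)
    sr = mean(returns) / stdev(returns)        # per-period Sharpe
    se = math.sqrt((1 + 0.5 * sr ** 2) / n)    # IID approximation (Lo, 2002)
    scale = math.sqrt(periods_per_year)        # annualize both quantities
    return sr * scale, se * scale


def passes_holdout_gate(wfo_returns, holdout_returns, periods_per_year=252):
    """True if the held-out Sharpe is within one SE of the WFO Sharpe.

    Uses the held-out estimate's SE and a two-sided comparison (assumptions).
    """
    wfo_sr, _ = sharpe_and_se(wfo_returns, periods_per_year)
    ho_sr, ho_se = sharpe_and_se(holdout_returns, periods_per_year)
    return abs(ho_sr - wfo_sr) <= ho_se
```

Serially correlated returns make this standard error too optimistic, which is one reason the gate counts independent observations rather than raw bars.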

Phase 2: Operational readiness

Before live capital:

Production-grade execution code. Not the research notebook. Clean separation of signal generation, risk overlay, and execution.
Monitoring dashboards. Realized vs expected: Sharpe, drawdown, hit rate, average winner / loser. Drift alarms on each.
Kill switches at three levels: strategy (own metrics), portfolio (aggregate drawdown), platform (exchange / broker connectivity).
Reconnect logic. What happens if the broker connection drops mid-position? If the data feed lags? If the order doesn't ack? Each scripted, tested, and rehearsed.
Position reconciliation. Daily reconciliation between strategy state and broker state. Mismatches trigger automatic flatten.
Risk constraints. Per-position size cap, per-instrument cap, total gross/net exposure caps. Hard-coded in execution layer, not strategy layer.
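
As an illustration of the reconciliation item above, here is a minimal sketch. It assumes positions are available as plain symbol-to-quantity dictionaries; the function names and the dictionary shape are hypothetical and do not correspond to any particular broker API.

```python
def reconcile(strategy_positions, broker_positions, tolerance=0.0):
    """Return {symbol: (strategy_qty, broker_qty)} for every mismatch.

    A symbol missing from one side is treated as a zero position there.
    """
    mismatches = {}
    for symbol in sorted(set(strategy_positions) | set(broker_positions)):
        s = strategy_positions.get(symbol, 0.0)
        b = broker_positions.get(symbol, 0.0)
        if abs(s - b) > tolerance:
            mismatches[symbol] = (s, b)
    return mismatches


def flatten_orders(broker_positions):
    """Signed order quantities that would close every open broker position.

    Negative means sell, positive means buy. The broker's view is used
    because, on a mismatch, it is the side that holds real exposure.
    """
    return {sym: -qty for sym, qty in broker_positions.items() if qty != 0}
```

In use, a non-empty result from reconcile would halt the strategy and submit the output of flatten_orders. Flattening from the broker's state rather than the strategy's is a deliberate choice in this sketch: the strategy's own record is the one under suspicion.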

Phase 3: Forward paper-trade (1–3 months)

Deploy at zero risk. Compare paper P&L to expected P&L:

Backtest replay reconciliation. Re-run the backtest over the paper-trade period using only data available at decision time. Compare to actual paper-trade signals. They must match.
Slippage tracking. Realized fills vs assumed fills. Document the gap.
Latency profile. Time from signal to order, and from order to fill. Compare both to backtest assumptions.

If anything materially diverges from expectation, return to Phase 1.
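
Slippage tracking in this phase can be as simple as the sketch below. It assumes each paper fill is logged with both the price the backtest assumed and the price actually realized; the Fill record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    side: str              # "buy" or "sell"
    assumed_price: float   # fill price the backtest assumed
    realized_price: float  # fill price actually obtained
    quantity: float


def slippage_bps(fill):
    """Signed slippage in basis points; positive means worse than assumed."""
    sign = 1.0 if fill.side == "buy" else -1.0
    diff = fill.realized_price - fill.assumed_price
    return sign * diff / fill.assumed_price * 1e4


def average_slippage_bps(fills):
    """Notional-weighted average slippage across a list of fills."""
    weights = [f.assumed_price * f.quantity for f in fills]
    weighted = sum(slippage_bps(f) * w for f, w in zip(fills, weights))
    return weighted / sum(weights)
```

Comparing the resulting average with the slippage assumed in the backtest, and with the 2× stress level from Phase 1, shows directly whether the documented gap is inside the margin the strategy was validated for.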

Phase 4: Conservative live deployment

Live capital at a fraction of intended size. Common starting points:

10–25% of the intended capital allocation for at least the first month.
Daily review of realized vs expected metrics.
No parameter changes during the conservative period. If the strategy doesn't work as configured, return to research; don't tune in production.

Phase 5: Scale-up

After 30–90 days of conservative deployment with metrics within expectation:

Scale linearly to target allocation over 30–60 days.
Continue daily monitoring with automated drift alarms.
Re-validate quarterly. Re-run WFO on now-extended data; compare to live performance.
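
A linear scale-up is easy to pre-commit as a schedule. The sketch below assumes the ramp starts from the conservative fraction used in Phase 4 and ends at 100% of target; the function name and parameters are hypothetical.

```python
def allocation_on_day(day, start_fraction, ramp_days):
    """Fraction of target capital deployed on a given day of a linear ramp.

    day 0 returns start_fraction; day >= ramp_days returns 1.0.
    """
    if ramp_days <= 0:
        raise ValueError("ramp_days must be positive")
    progress = min(max(day / ramp_days, 0.0), 1.0)  # clamp to [0, 1]
    return start_fraction + (1.0 - start_fraction) * progress
```

For example, starting at 25% with a 60-day ramp gives 62.5% of target on day 30 and the full allocation from day 60 onward. Writing the schedule down in advance keeps the pace of scale-up from being renegotiated after a good or bad week.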

What gets a strategy retired

Drawdown beyond planned worst case. Pre-committed kill threshold; not negotiable in the moment.
Realized Sharpe materially below paper-trade and backtest Sharpe for 3+ months.
Strategy mechanism appears broken in current regime. Better to retire than to keep paying for a broken edge.
Capacity reached. Strategy works at $X but degrades meaningfully at $2X. Cap allocation; don't push.
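
The pre-committed drawdown rule is the easiest of these to automate. A minimal sketch, assuming a positive equity curve sampled at a fixed interval and a kill threshold expressed as a fraction of peak equity (both the function names and the threshold convention are assumptions):

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # drawdown from that peak
    return worst


def should_retire(equity_curve, kill_threshold):
    """True once drawdown has reached the pre-committed kill threshold."""
    return max_drawdown(equity_curve) >= kill_threshold
```

The threshold is passed in rather than computed from recent data so that it can be fixed before deployment and not adjusted in the moment.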

What this checklist isn't

It's not optional. Skipping any phase is the path to the failure modes the checklist exists to prevent.
It's not magic. An edge that isn't real will not make it through the checklist. The checklist is a filter, not a source of edge.
It's not closed. Add new gates as your operational experience reveals new failure modes.

Practical takeaways

A backtest is the start of validation, not the end.
Operational readiness is half the job. Many "strategy failures" are operations failures.
Pre-commit to deployment thresholds before deploying. In-the-moment decisions about whether to deploy a marginal strategy are how marginal strategies become deployed.