Slippage Modeling: The Difference Between Paper and Live P&L

"My backtest shows a Sharpe of 1.8. Live performance has been Sharpe 0.3. What changed?"

Most of the time, nothing changed. The backtest didn't model slippage realistically. The "1.8 Sharpe" was a paper number; the live "0.3" is closer to the actual edge minus actual costs.

Modeling slippage well doesn't make a bad strategy good. It does prevent you from deploying a strategy that looks good only because you underestimated costs.

Where slippage comes from

Spread. The bid-ask gap. You cross half (or all) of it on every trade. For liquid equity-index futures: 1 tick. For thinly traded micro-caps: dozens of ticks.
Market impact. Your order moves the price. The larger your order relative to average daily volume (ADV), the more it moves.
Latency. The time between signal generation and order arrival. In high-frequency strategies, the price has often moved away from the signal level by the time the order is in the book.
Adverse selection. Your resting quotes tend to get filled exactly when the price is moving against you, because counterparties choose when to trade with you.
Liquidity gaps. News prints, opens, closes — moments when liquidity vanishes and remaining quotes are wide.

Three levels of realism

Level 1: Constant slippage

A fixed cost per trade, quoted in basis points or as a per-share or per-contract amount. Acceptable for low-frequency, large-cap strategies where the spread dominates and market impact is negligible.

slippage_bps = 5  # for liquid US large-cap equity, 5bp per round trip
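
As a rough sketch of how that constant could be applied, the helper below subtracts a fixed round-trip cost from a gross return. The function name and its parameters are hypothetical, and it assumes each round trip turns over the full position.

```python
def net_return_constant_slippage(gross_return: float,
                                 round_trips: float,
                                 slippage_bps: float = 5.0) -> float:
    """Subtract a fixed per-round-trip cost, quoted in basis points.

    Assumes every round trip turns over the whole position, so the cost
    applies to the full return base (an illustrative simplification).
    """
    if round_trips < 0 or slippage_bps < 0:
        raise ValueError("round_trips and slippage_bps must be non-negative")
    cost = round_trips * slippage_bps / 10_000.0  # 1 bp = 0.01%
    return gross_return - cost


# A trade that made 1% gross keeps roughly 0.95% after one 5 bp round trip.
net = net_return_constant_slippage(0.01, round_trips=1)
```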

Level 2: Volume-relative slippage

Slippage scales with order size relative to typical volume.

slippage_bps = base_spread + k × (order_size / ADV)^α

α is typically 0.5 (square-root market impact, well-documented in the empirical literature). k is a calibration constant per instrument or per liquidity tier.

This is the right level for daily-to-weekly horizon strategies trading reasonable size.
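
The formula above translates almost directly into code. This is a minimal sketch, assuming base_spread is already expressed in basis points and that order_size and ADV use the same units (shares or contracts). The value of k in the usage line is made up for illustration, not calibrated to any instrument.

```python
def volume_relative_slippage_bps(order_size: float,
                                 adv: float,
                                 base_spread_bps: float,
                                 k: float,
                                 alpha: float = 0.5) -> float:
    """Slippage in bps: base_spread + k * (order_size / ADV) ** alpha."""
    if adv <= 0:
        raise ValueError("adv must be positive")
    if order_size < 0:
        raise ValueError("order_size must be non-negative")
    participation = order_size / adv  # fraction of a typical day's volume
    return base_spread_bps + k * participation ** alpha


# Hypothetical inputs: an order of 1% of ADV, a 2 bp spread cost, k = 50.
estimate = volume_relative_slippage_bps(10_000, 1_000_000,
                                        base_spread_bps=2.0, k=50.0)
```

With α = 0.5, quadrupling the order size doubles the impact term rather than quadrupling it, which is the practical meaning of square-root impact.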

Level 3: Microstructure-aware

Models the order book directly: queue position, cancel/refill dynamics, hidden liquidity, exchange-specific behaviors. Required for higher-frequency or market-making strategies. Almost never appropriate for retail systematic strategies, which generally operate at daily-or-slower horizons.

The 2× rule

Whatever slippage estimate you use (calibrated, naive, or somewhere in between), evaluate the backtest at 2× that estimate before you deploy. If a strategy's edge survives 2× realistic slippage, it has margin for execution surprises. If it doesn't, it doesn't deserve live capital.

This rule is the closest thing to free protection in systematic research. The strategies it filters out are the strategies that would have failed live anyway.
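
One way to apply the rule is to recompute the backtest Sharpe with the cost multiplied by two. The sketch below assumes you have per-period gross returns and per-period turnover (round trips as a fraction of capital). The function names and the min_sharpe threshold are hypothetical, and the risk-free rate is ignored for simplicity.

```python
import math
import statistics


def net_sharpe(gross_returns, turnover, slippage_bps,
               multiplier=1.0, periods_per_year=252):
    """Annualized Sharpe after charging multiplier * slippage_bps per unit of turnover."""
    if len(gross_returns) != len(turnover):
        raise ValueError("gross_returns and turnover must have the same length")
    cost_per_unit = multiplier * slippage_bps / 10_000.0
    net = [r - t * cost_per_unit for r, t in zip(gross_returns, turnover)]
    sd = statistics.stdev(net)  # sample standard deviation, needs >= 2 points
    if sd == 0:
        raise ValueError("net returns have zero variance")
    return statistics.mean(net) / sd * math.sqrt(periods_per_year)


def survives_2x(gross_returns, turnover, slippage_bps, min_sharpe=0.0):
    """True if the strategy still clears min_sharpe at twice the slippage estimate."""
    return net_sharpe(gross_returns, turnover, slippage_bps,
                      multiplier=2.0) > min_sharpe
```

In practice min_sharpe would be whatever hurdle you already require of a live strategy, not zero.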

Where backtests systematically lie

Mid-price fills. The most common backtest error. Filling at the mid ignores the half-spread you pay on every trade, so costs are understated on both entry and exit. Use bid for sells, ask for buys, or model the spread explicitly.
Stop-loss assumptions. Backtests fill stops at the trigger price. Live fills land where they land — often meaningfully worse, especially in fast moves and gaps.
News-event fills. Liquidity vanishes during major prints. Backtests that fill at mid prices around FOMC announcements are wildly optimistic.
Limit-order fills. Did your limit actually fill, or were you further back in the queue than you thought? Backtests usually assume a fill whenever the price touches the limit; reality is harsher, because a touch may fill only the orders ahead of yours.
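
The fill assumptions above can be made more conservative with a few small helpers. This is a sketch under stated assumptions, not a full execution simulator: the function names are hypothetical, the stop helper assumes the stop has already been triggered during the bar, and the limit helper uses a trade-through of one tick as a stand-in for queue position.

```python
def market_fill_price(side: str, bid: float, ask: float) -> float:
    """Buys pay the ask and sells receive the bid, never the mid."""
    if side == "buy":
        return ask
    if side == "sell":
        return bid
    raise ValueError("side must be 'buy' or 'sell'")


def stop_fill_price(side: str, stop_price: float, bar_open: float,
                    extra_slippage: float = 0.0) -> float:
    """Fill a triggered stop at the trigger or at a worse gap open.

    A sell stop fills at min(stop, open); a buy stop at max(stop, open).
    extra_slippage (in price units) pushes the fill further against you.
    """
    if side == "sell":
        return min(stop_price, bar_open) - extra_slippage
    if side == "buy":
        return max(stop_price, bar_open) + extra_slippage
    raise ValueError("side must be 'buy' or 'sell'")


def limit_filled(side: str, limit_price: float, bar_low: float,
                 bar_high: float, tick_size: float) -> bool:
    """Count a limit as filled only if price traded through it by one tick."""
    if side == "buy":
        return bar_low <= limit_price - tick_size
    if side == "sell":
        return bar_high >= limit_price + tick_size
    raise ValueError("side must be 'buy' or 'sell'")
```

Requiring a trade-through will miss some real fills, but that error runs in the pessimistic direction, which is the safer side for a backtest.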

Practical takeaways

Always model spread explicitly. Mid-price fills are the largest single source of paper-to-live divergence.
Use 2× your realistic slippage estimate as the deployment threshold.
Stop fills slip in the direction of the move that triggered them: sell stops fill at or below the trigger, buy stops at or above it. Model that asymmetry directly.
The shorter the holding period, the more slippage matters. A daily strategy can absorb the costs; a 30-minute strategy probably can't.
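
The holding-period point is mostly arithmetic. The sketch below uses purely illustrative assumptions: 5 bp per round trip, one full-capital round trip per holding period, 252 trading days, and 13 thirty-minute periods in a 6.5-hour session. None of these numbers come from real data.

```python
TRADING_DAYS_PER_YEAR = 252   # assumed
HALF_HOURS_PER_SESSION = 13   # assumed 6.5-hour session


def annual_cost_drag(round_trips_per_year: float, slippage_bps: float) -> float:
    """Yearly slippage cost as a fraction of capital, ignoring compounding."""
    return round_trips_per_year * slippage_bps / 10_000.0


daily_drag = annual_cost_drag(TRADING_DAYS_PER_YEAR, 5.0)
intraday_drag = annual_cost_drag(
    TRADING_DAYS_PER_YEAR * HALF_HOURS_PER_SESSION, 5.0)
```

Under these assumptions the daily strategy gives up about 12.6% of capital per year to slippage, while the 30-minute strategy gives up about 164%, so its gross edge per trade has to be far larger just to break even.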