Meta-Labeling: Filtering Primary Signals With a Secondary Model

Meta-labeling is a two-stage modeling pattern: a primary signal generator emits trade ideas, and a secondary model filters which ideas to act on. The technique improves precision at the cost of recall.

Sentivue Capital · 7 min read

Meta-labeling, formalized in López de Prado's Advances in Financial Machine Learning, is a two-stage modeling pattern. A primary model generates candidate trade signals. A secondary model, trained on the primary's past signals labeled by their outcomes, predicts which of those candidates will be profitable and acts as a filter.

The result is a system that takes fewer trades but with higher precision per trade.

The mechanic

  1. Primary model generates buy/sell signals. Could be rule-based (channel breakout, moving-average crossover) or model-based.
  2. Label past primary signals with their actual outcomes (1 = profitable, 0 = not).
  3. Train a binary classifier to predict, given the features at signal time, whether a primary signal would have been profitable.
  4. At deploy time, primary fires → meta-model evaluates → trade only if meta-model assigns sufficient probability of profit.
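The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a production recipe: the breakout rule, the fixed five-bar labeling horizon, the two features, the 0.6 threshold, and the simulated price series are all hypothetical choices made for the example, and the primary here is long-only for brevity.

```python
import math
import random

def breakout_signals(prices, lookback=5):
    # Step 1: toy long-only primary. Fire when price exceeds the prior lookback-bar high.
    return [t for t in range(lookback, len(prices))
            if prices[t] > max(prices[t - lookback:t])]

def label_signals(prices, signals, horizon=5):
    # Step 2: label 1 if price is higher `horizon` bars after the signal, else 0.
    # Signals too close to the end of the series have no outcome yet and are dropped.
    return [(t, 1 if prices[t + horizon] > prices[t] else 0)
            for t in signals if t + horizon < len(prices)]

def features_at(prices, t, lookback=5):
    # Features use only data up to and including bar t (no look-ahead).
    window = prices[t - lookback:t + 1]
    rets = [window[i + 1] / window[i] - 1 for i in range(lookback)]
    mean = sum(rets) / lookback
    vol = math.sqrt(sum((r - mean) ** 2 for r in rets) / lookback)
    return [mean * 100, vol * 100]  # scaled to percent

def predict_proba(w, x):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    # Step 3: logistic regression fitted by batch gradient descent (bias + weights).
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def should_trade(w, x, threshold=0.6):
    # Step 4: act only when the meta-model's probability clears the threshold.
    return predict_proba(w, x) >= threshold

# Simulated prices, purely for illustration.
random.seed(0)
prices = [100.0]
for _ in range(600):
    prices.append(prices[-1] * (1 + random.gauss(0.0002, 0.01)))

signals = breakout_signals(prices)
labeled = label_signals(prices, signals)
X = [features_at(prices, t) for t, _ in labeled]
y = [label for _, label in labeled]

split = int(len(X) * 0.7)  # chronological split: train on the past only
w = fit_logistic(X[:split], y[:split])
taken = [should_trade(w, x) for x in X[split:]]
print(f"{sum(taken)} of {len(taken)} held-out primary signals pass the filter")
```

On a simulated random walk there is no real edge for the meta-model to find, so the point of the sketch is the data flow: the meta-model only ever sees bars where the primary fired, and it is evaluated on signals that come later in time than the ones it was trained on.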

Why the structure earns its keep

  • Splits the modeling problem in two. The primary model answers "is this a signal?" (a directional question). The meta-model answers "is this signal worth taking?" (a probabilistic question). Two simpler models often beat one complicated one.
  • Improves precision. The meta-model's job is filtering: it doesn't need to find new signals, only to throw away the bad ones from a pre-existing set.
  • Plays well with rule-based primaries. A robust rule-based primary (e.g., a trend-following breakout system) can be wrapped in a meta filter without abandoning the rule-based logic. This is closer to how systematic shops actually deploy ML — augmenting rules rather than replacing them.
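To make the precision and recall trade-off concrete, here is a small worked example. The ten outcomes and the filter's decisions are invented numbers chosen only to show the arithmetic.

```python
def precision_recall(outcomes, taken):
    """outcomes[i] is 1 if primary signal i was profitable; taken[i] is True if it was traded."""
    tp = sum(1 for o, t in zip(outcomes, taken) if t and o == 1)      # winners traded
    fp = sum(1 for o, t in zip(outcomes, taken) if t and o == 0)      # losers traded
    fn = sum(1 for o, t in zip(outcomes, taken) if not t and o == 1)  # winners skipped
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Ten hypothetical primary signals, five of which were profitable.
outcomes = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]

# Without a filter, every primary signal is traded.
unfiltered = [True] * 10

# Hypothetical meta-model decisions: four signals kept, three of them winners.
filtered = [True, False, True, False, False, True, False, True, False, False]

print(precision_recall(outcomes, unfiltered))  # (0.5, 1.0)
print(precision_recall(outcomes, filtered))    # (0.75, 0.6)
```

In this toy case the filter lifts precision from 50% to 75% but skips two of the five winning trades, which is the recall cost the approach accepts.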

Common features for meta-labeling models

  • Volatility state at signal time. Was vol expanding or contracting?
  • Trend regime at signal time. Was the broader trend aligned with the signal direction?
  • Liquidity / spread conditions. Wide spreads predict adverse selection.
  • Macro calendar proximity. Proximity to FOMC, NFP, earnings degrades many strategies.
  • Cross-asset confirmation. Equity-vol confirming equity-direction, etc.
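Two of the features above, volatility state and trend alignment, can be sketched as follows. The window lengths (10 and 50 bars), the use of a simple moving average as the trend proxy, and the function names are assumptions made for illustration; the important property is that every feature is computed from data available at signal time.

```python
import math

def realized_vol(returns):
    # Population standard deviation of simple returns.
    mean = sum(returns) / len(returns)
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / len(returns))

def meta_features(prices, t, direction, short=10, long=50):
    """Features for a signal at bar t, using only prices up to and including bar t.
    direction is +1 for a long signal and -1 for a short signal."""
    if t < long:
        raise ValueError("not enough history before bar t")
    hist = prices[t - long:t + 1]  # long + 1 prices ending at bar t
    rets = [hist[i + 1] / hist[i] - 1 for i in range(long)]
    # Ratio above 1 means recent vol exceeds longer-run vol (expanding).
    vol_ratio = realized_vol(rets[-short:]) / max(realized_vol(rets), 1e-12)
    # Trend proxy: is price on the side of its long moving average that the signal favors?
    long_ma = sum(hist) / len(hist)
    trend_aligned = 1 if direction * (prices[t] - long_ma) > 0 else 0
    return {"vol_ratio": vol_ratio, "trend_aligned": trend_aligned}

# Illustrative series: 50 calm bars followed by 10 volatile bars.
calm_then_wild = [100.0]
for i in range(60):
    step = 0.001 if i < 50 else 0.02
    sign = 1 if i % 2 == 0 else -1
    calm_then_wild.append(calm_then_wild[-1] * (1 + sign * step))

print(meta_features(calm_then_wild, t=60, direction=1))
```

Liquidity, calendar, and cross-asset features would follow the same pattern, each keyed to the signal timestamp, but they need data sources this sketch does not assume.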

Failure modes

1. Overfitting the meta-model

The meta-model inherits the primary's overfitting risk and adds its own. If the primary is overfit, the outcome labels it produces are noisy, and the meta-model ends up fitting that noise. Walk-forward discipline applies just as strongly here.
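A minimal sketch of walk-forward splitting for the meta-model, assuming the labeled signals are already in chronological order. The fold count, minimum training size, and the `embargo` gap are illustrative parameters. The gap is a simplified stand-in for purging labels whose outcome windows overlap the test block; it is counted in signals rather than bars here.

```python
def walk_forward_splits(n_samples, n_folds=4, min_train=100, embargo=5):
    """Yield (train_indices, test_indices) over chronologically ordered signals.
    Each fold trains on everything before its test block, minus an embargo gap."""
    test_size = (n_samples - min_train) // n_folds
    if test_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = min_train + k * test_size
        test_end = test_start + test_size
        train_end = max(test_start - embargo, 0)
        yield list(range(train_end)), list(range(test_start, test_end))

for train_idx, test_idx in walk_forward_splits(300):
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Every test block lies strictly after its training data, so a meta-model that only looks good on shuffled or in-sample splits will show it here.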

2. Loss of signal generalization

The meta-model is trained on historical signal outcomes. If the next regime produces signals with different characteristics, the meta-model may reject them, even when they are the signals that would have worked. Meta-labeling sometimes filters out the most valuable trades because they look unusual relative to history.

3. Concept drift

The meta-model's calibration can degrade faster than the primary's, because it is modeling a more nuanced relationship. Plan for frequent re-training.
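One way to watch for this is to track a rolling Brier score (the mean squared error between predicted probability and realized outcome) on live trades and compare it with the score measured at validation time. The class name, window size, and tolerance below are hypothetical choices for the sketch.

```python
from collections import deque

class RollingBrier:
    """Rolling mean of (predicted_prob - outcome)^2 over the most recent signals."""

    def __init__(self, window=100):
        self.errors = deque(maxlen=window)  # oldest entries fall off automatically

    def update(self, predicted_prob, outcome):
        self.errors.append((predicted_prob - outcome) ** 2)
        return self.value()

    def value(self):
        return sum(self.errors) / len(self.errors) if self.errors else None

def needs_retrain(monitor, baseline, tolerance=0.05):
    # Flag when the live score is worse than the validation baseline by more than tolerance.
    current = monitor.value()
    return current is not None and current > baseline + tolerance

monitor = RollingBrier(window=3)
for prob, outcome in [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 0)]:
    monitor.update(prob, outcome)

print(monitor.value(), needs_retrain(monitor, baseline=0.20))
```

A lower Brier score is better, and a model that always predicts 0.5 scores 0.25, which is a useful sanity reference when choosing the baseline and tolerance.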

When meta-labeling earns the complexity

  • High-precision deployments where the cost of a bad trade is high and the cost of missing a good trade is low. Most institutional risk-bounded books fit this profile.
  • Primary signals with clear interpretation that can be verified independently. Meta-labeling on top of a black-box primary stacks two black boxes.
  • Sufficient sample size for the meta-model. A few hundred primary signals at minimum; thousands are ideal.

When it doesn't

  • Strategies with infrequent signals. A trend-following program firing 30 trades a year cannot train a meta-model on those 30 trades.
  • Already-clean primaries. If primary signals are already 80%+ profitable, meta-labeling adds friction without much gain.
  • Strategies where missing trades is more painful than taking marginal trades. Some carry strategies fit this — the missed-trade cost dominates.
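The cost asymmetry running through both lists can be tied to the meta-model's probability threshold. As a simplified sketch, assume each trade either wins a fixed average amount or loses a fixed average amount: a trade with win probability p has positive expected value when p × avg_win exceeds (1 − p) × avg_loss, which rearranges to p > avg_loss / (avg_win + avg_loss). The function name and the payoff figures are illustrative.

```python
def breakeven_probability(avg_win, avg_loss):
    """Minimum win probability at which a trade has non-negative expected value,
    assuming a fixed average win and a fixed average loss (both positive)."""
    if avg_win <= 0 or avg_loss <= 0:
        raise ValueError("avg_win and avg_loss must be positive")
    return avg_loss / (avg_win + avg_loss)

# Losses three times the size of wins: only high-confidence signals are worth taking.
print(breakeven_probability(avg_win=1.0, avg_loss=3.0))  # 0.75

# Wins twice the size of losses: a strict filter mostly throws away good trades.
print(breakeven_probability(avg_win=2.0, avg_loss=1.0))  # 0.3333333333333333
```

When the break-even probability is high, a filter that raises precision pays for itself; when it is low, most primary signals already clear the bar and the recall lost to filtering is the larger cost.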

Practical takeaways

  • Meta-labeling improves precision at the cost of recall. Know which side you need.
  • Validate the meta-model with the same OOS discipline as the primary. Both can overfit.
  • Engineering complexity matters. Two-stage models are harder to debug, harder to monitor, and harder to explain. Earn the complexity.
