From Walk-Forward Training to Production: V2's Journey to 59% WR on 498 Real BUY Trades

There is no shortage of crypto ML content promising 80% win rates from elaborate backtests. The gap between backtest and live production is where most of these claims die. This is a post about what actually happened when we took our V2 ensemble from walk-forward training to real orders with real money.

V2 went live on February 18, 2026. As of today (April 4, 2026) it has executed 973 real trades on Binance Futures, with up to 10 simultaneous open positions, refreshed every 10 minutes. Here are the numbers that matter, and the decisions we made along the way.

The Production Snapshot

| Metric | BUY | SELL | Overall |
|--------|-----|------|---------|
| Trades | 498 | 475 | 973 |
| Win rate | 59.0% | 50.3% | 54.6% |
| Avg PnL per trade | +0.319% | -0.041% | +0.145% |

No survivorship bias. No cherry-picked date ranges. Every signal that met our gate (raw sigmoid >= 0.80, plus regressor EV > 0 for SELL once that filter went live on February 27) executed, subject to a cap of 10 simultaneous open positions.

The story here is a two-sided model with a strong BUY side and a stuck SELL side. We have kept SELL live because cutting it would halve our signal volume, and the EV filter keeps the worst losers out. Below is how we got here.

Phase 1: Walk-Forward Training

Our V2 training runs four walk-forward folds across 60 days of data. Each fold trains on an expanding window and tests on the next 5-day block. We run three gradient boosters (LightGBM, XGBoost, CatBoost) on each fold.
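The fold layout described above can be sketched in a few lines. This is an illustrative sketch, not our training pipeline: the `Fold` type, the `expanding_folds` helper, and the assumption that the four 5-day test blocks tile the last 20 days of the 60-day window (so the first fold trains on 40 days) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fold:
    train_start: int  # inclusive day index
    train_end: int    # exclusive day index
    test_start: int   # inclusive day index
    test_end: int     # exclusive day index


def expanding_folds(total_days: int = 60, n_folds: int = 4, test_days: int = 5) -> list[Fold]:
    """Expanding-window folds whose test blocks tile the end of the data.

    Assumption: training always starts at day 0 and ends where the test block begins.
    """
    first_test_start = total_days - n_folds * test_days
    if first_test_start <= 0:
        raise ValueError("not enough data for the requested folds")
    folds = []
    for i in range(n_folds):
        test_start = first_test_start + i * test_days
        folds.append(Fold(0, test_start, test_start, test_start + test_days))
    return folds


folds = expanding_folds()
for k, f in enumerate(folds, start=1):
    print(f"Fold {k}: train days [{f.train_start}, {f.train_end}) -> test days [{f.test_start}, {f.test_end})")
```

Because each training window ends exactly where its test block begins, no test-period rows can leak into training for that fold.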

A typical training output looks like this:

Fold 1 (Feb 10-15):
  BUY  lgbm 0.583 | xgb 0.586 | catboost 0.591
  SELL lgbm 0.556 | xgb 0.557 | catboost 0.555

Fold 2 (Feb 15-20): [weak regime]
  BUY  lgbm 0.457 | xgb 0.476 | catboost 0.465
  SELL lgbm 0.435 | xgb 0.454 | catboost 0.441

Fold 3 (Feb 20-25):
  BUY  lgbm 0.614 | xgb 0.622 | catboost 0.611
  SELL lgbm 0.578 | xgb 0.582 | catboost 0.571

Fold 4 (Feb 25-Mar 2):
  BUY  lgbm 0.635 | xgb 0.641 | catboost 0.627
  SELL lgbm 0.572 | xgb 0.576 | catboost 0.568

Combined OOS BUY:  0.572 | 0.581 | 0.574
Combined OOS SELL: 0.535 | 0.542 | 0.534

Fold 2 is terrible. This is a feature, not a bug. Crypto goes through regimes the model cannot predict (news shocks, Fed announcements, liquidation cascades), and a walk-forward framework shows you that explicitly. If every fold were 0.65+, we would be suspicious of data leakage.

The deploy rule we landed on: if fold 4 (the most recent) BUY AUC is above 0.60 and combined BUY OOS is above 0.55, we deploy. Weak middle folds do not block a deploy.
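As a rough sketch, the rule can be written as a small check over the per-fold BUY AUCs from the output above. The rule as stated does not say which of the three boosters is compared against each bar, so this sketch assumes the mean across boosters; `should_deploy` and its parameter names are hypothetical.

```python
from statistics import mean

# BUY AUCs per fold, copied from the training output above (lgbm, xgb, catboost).
BUY_AUC_BY_FOLD = [
    (0.583, 0.586, 0.591),  # fold 1
    (0.457, 0.476, 0.465),  # fold 2 (weak regime)
    (0.614, 0.622, 0.611),  # fold 3
    (0.635, 0.641, 0.627),  # fold 4 (most recent)
]


def should_deploy(buy_auc_by_fold, latest_min=0.60, combined_min=0.55):
    """Return True when the most recent fold and the combined OOS both clear their bars.

    Assumption: each fold is summarized by the mean AUC of its boosters, and
    "combined" is the unweighted mean of those fold summaries.
    """
    fold_means = [mean(fold) for fold in buy_auc_by_fold]
    latest = fold_means[-1]
    combined = mean(fold_means)
    return latest > latest_min and combined > combined_min


print(should_deploy(BUY_AUC_BY_FOLD))
```

Note that the weak fold 2 lowers the combined mean but does not veto the deploy on its own, which matches the intent of the rule.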

Phase 2: The Feb 18 Go-Live

V2 replaced V1 on February 18. V1 was a single LightGBM classifier whose signal had completely decayed by mid-February (paper trading showed no edge at any threshold). V2 shipped with raw sigmoid >= 0.80 for both sides, no EV filter yet.

First 9 days (Feb 18-Feb 27):

BUY: ~62% WR on 140 trades — strong start
SELL: ~48% WR on 130 trades — bleeding

The SELL losses were concentrated in narrow ranges. A SELL signal would fire, price would chop sideways, then stop out. Meanwhile BUY signals fired mostly in established uptrends and hit their +2% targets.

Phase 3: The EV > 0 Filter (Feb 27)

We trained a companion regressor alongside each classifier. The regressor predicts the expected return (as a percentage) for each candidate signal. We stored its output with every prediction from day one but did not use it for gating until Feb 27.

Analysis on ~260 trades showed:

All SELL signals:           48.4% WR
SELL with regressor EV > 0: 54.1% WR  (+5.7pp lift)
SELL with regressor EV < 0: 42.9% WR

All BUY signals:            62.1% WR
BUY with regressor EV > 0:  63.6% WR  (+1.5pp lift)
BUY with regressor EV < 0:  59.8% WR

The filter was clearly helping SELL and only marginally helping BUY. Looking at absolute EV values, we also saw that the BUY regressor's mean predicted EV was -1.27%, which means it systematically underestimates BUY outcomes. Filtering BUYs by EV > 0 would have cut our BUY trade volume dramatically for only marginal lift.
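The bucketed comparison above amounts to grouping closed trades by side and by the sign of the stored EV. A minimal sketch, assuming each trade is a dict with hypothetical `side`, `ev`, and `pnl` keys, that a win means realized PnL above zero, and that EV of exactly zero falls in the non-positive bucket. The sample trades are made up for illustration and are not production data.

```python
from collections import defaultdict


def win_rate_by_ev_sign(trades):
    """Group closed trades by side and sign of predicted EV; return win rates.

    Each trade is a dict with hypothetical keys:
      side: "BUY" or "SELL"
      ev:   regressor's predicted return, in percent
      pnl:  realized return, in percent
    A trade counts as a win when pnl > 0 (an assumption).
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for t in trades:
        bucket = "EV > 0" if t["ev"] > 0 else "EV <= 0"
        # Count the trade once in the side-wide total and once in its EV bucket.
        for key in ((t["side"], "all"), (t["side"], bucket)):
            counts[key] += 1
            wins[key] += int(t["pnl"] > 0)
    return {key: wins[key] / counts[key] for key in counts}


# Made-up trades, for illustration only.
sample_trades = [
    {"side": "SELL", "ev": 0.4, "pnl": 1.2},
    {"side": "SELL", "ev": 0.1, "pnl": -0.8},
    {"side": "SELL", "ev": -0.3, "pnl": -1.0},
    {"side": "SELL", "ev": -0.6, "pnl": -0.7},
    {"side": "BUY", "ev": -1.1, "pnl": 2.0},
    {"side": "BUY", "ev": 0.2, "pnl": 2.0},
]

rates = win_rate_by_ev_sign(sample_trades)
for key in sorted(rates):
    print(key, f"{rates[key]:.1%}")
```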

Decision: apply EV > 0 gate only to SELL. Leave BUY alone. This went live February 27, 2026.

Phase 4: Five Weeks of Live Production

With both gates in place:

BUY: raw sigmoid >= 0.80
SELL: raw sigmoid >= 0.80 AND regressor EV > 0
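Expressed as code, the two gates look roughly like this. The function name and signature are hypothetical; only the 0.80 threshold and the SELL-only EV condition come from the rules above. The cap of 10 simultaneous positions is a separate check and is not modeled here.

```python
SIGMOID_MIN = 0.80  # raw sigmoid threshold used for both sides


def passes_gate(side: str, sigmoid: float, ev: float) -> bool:
    """Return True if a candidate signal clears the asymmetric gate.

    BUY:  raw sigmoid >= 0.80
    SELL: raw sigmoid >= 0.80 AND regressor EV > 0
    """
    if side not in ("BUY", "SELL"):
        raise ValueError(f"unknown side: {side!r}")
    if sigmoid < SIGMOID_MIN:
        return False
    # The EV filter is applied to SELL only; BUY ignores the regressor output.
    return ev > 0 if side == "SELL" else True


print(passes_gate("BUY", 0.85, -1.27))   # BUY ignores the negative EV
print(passes_gate("SELL", 0.85, -0.10))  # SELL is blocked by EV <= 0
```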

Post-filter performance (Feb 27 → April 4, ~36 days, ~700 trades):

BUY: 58.3% WR on 378 trades
SELL: 50.9% WR on 320 trades

The BUY side stayed strong. The SELL side flattened out to breakeven, which is an improvement over the pre-filter bleed but not a profit center.

Why SELL Stays Flat

We have a working hypothesis: crypto has an asymmetric volatility structure. Uptrends trend calmly. Downtrends come in sharp, fast impulses that are hard to predict in advance. By the time the model sees a strong SELL signal, the move has often already happened.

This is consistent with:

Our walk-forward SELL AUCs averaging 0.53-0.55 vs BUY averaging 0.57-0.59
Paper trading SELL WR consistently 5-8 percentage points below BUY across multiple retrains
Raw SELL sigmoid occasionally inverting at high thresholds (the model anti-predicts in some regimes)
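One way to spot the inversion mentioned in the last point is to sweep the threshold and watch whether win rate falls as the threshold rises. This is a sketch under assumptions: the helper names, the threshold grid, and the sample signals are hypothetical and are not taken from our signal-check report.

```python
def win_rate_by_threshold(signals, thresholds=(0.70, 0.75, 0.80, 0.85, 0.90)):
    """For each threshold, count and win rate of signals with score >= threshold.

    signals: iterable of (score, won) pairs, where won is a bool.
    Returns {threshold: (n, win_rate)}; win_rate is None when n == 0.
    """
    signals = list(signals)
    sweep = {}
    for th in thresholds:
        picked = [won for score, won in signals if score >= th]
        sweep[th] = (len(picked), sum(picked) / len(picked) if picked else None)
    return sweep


def is_inverted(sweep):
    """True when win rate at the highest populated threshold is below the lowest."""
    rates = [wr for _, (n, wr) in sorted(sweep.items()) if n > 0]
    return len(rates) >= 2 and rates[-1] < rates[0]


# Made-up SELL signals where higher confidence does worse.
sample_signals = [(0.72, True), (0.76, True), (0.81, True), (0.86, False), (0.91, False)]
sweep = win_rate_by_threshold(sample_signals)
for th, (n, wr) in sorted(sweep.items()):
    print(f">= {th:.2f}: n={n}, WR={wr:.1%}")
print("inverted:", is_inverted(sweep))
```

With few trades at the top thresholds the comparison is noisy, so a check like this is a prompt for a closer look rather than a verdict.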

We have tried feature engineering targeting SELL specifically (order book imbalance, trade flow skew, volatility skew). V3, currently in paper trading, adds 8 such features and does achieve better BUY OOS AUC (0.533 vs V2's 0.497 on a recent retrain) but shows zero SELL signal — worse than V2. The features help BUY, not SELL.

Our conclusion: SELL is structurally harder, and we accept it. A flat SELL side that provides position diversity and keeps the ensemble engaged during drawdowns is worth keeping, even if its direct contribution is zero.

Retrain Cadence and Decisions

We retrain roughly every 5-7 days. A retrain that does not clear our signal-check gate gets held back. Recent retrain decisions:

Mar 23: Deployed. Fold 4 AUCs strong. 1/2 GO on signal check.
Mar 24: Not deployed. Marginal AUC drop, no urgency.
Mar 25: Deployed. Release regressor Spearman jumped (eval 0.221 → release 0.381).
Mar 30, 31: Deployed. BUY 3/3 GO at >=0.70-0.75 thresholds.
Apr 2: Deployed. Best BUY OOS we have seen (0.591 catboost, 0.586 xgb).

Each of these gets a full signal-check report inspected by a human before release. No automated deploys to real orders.
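For readers unfamiliar with the Spearman figure in the Mar 25 entry: it is the rank correlation between the regressor's predicted returns and the realized returns. A dependency-free sketch follows, with hypothetical function names and made-up sample values; in practice a statistics library would normally be used instead.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up predicted vs realized returns (percent), for illustration only.
predicted = [0.5, -0.2, 1.1, 0.0]
realized = [1.0, -0.5, 2.0, -1.0]
rho = spearman(predicted, realized)
print(round(rho, 3))
```

Because it compares ranks rather than raw values, this metric can stay informative even when the regressor's level is biased, as long as it still orders signals sensibly.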

What This Case Study Says

Walk-forward training is honest. It shows you weak regimes instead of hiding them under one big split.
Live metrics diverge from paper. Track them separately, always.
Do not deploy filters until data justifies them. We did not add the EV filter on day one. We waited for 9 days of live data.
Asymmetric filters are fine. Our EV gate applies to SELL only because the BUY regressor is systematically biased.
A flat side is not a dead side. SELL at 50% WR contributes diversity even with no direct edge.

Every prediction in APIndicators comes from this pipeline. You can hit the V2 ensemble live via the API, or subscribe to webhook delivery of signals as they fire. See /pricing for plans — or check /docs for the full prediction endpoint schema.