There is no shortage of crypto ML content promising 80% win rates from elaborate backtests. The gap between backtest and live production is where most of these claims die. This is a post about what actually happened when we took our V2 ensemble from walk-forward training to real orders with real money.
V2 went live on February 18, 2026. As of today (April 4, 2026) it has executed 973 real trades on Binance Futures, with up to 10 positions open at a time, refreshed every 10 minutes. Here are the numbers that matter, and the decisions we made along the way.
The Production Snapshot
| Metric | BUY | SELL | Overall |
|--------|-----|------|---------|
| Trades | 498 | 475 | 973 |
| Win rate | 59.0% | 50.3% | 54.6% |
| Avg PnL per trade | +0.319% | -0.041% | +0.145% |
No survivorship bias. No cherry-picked date ranges. Every signal that passed our gate (raw sigmoid >= 0.80, plus regressor EV > 0 for SELL) was executed, subject to a cap of 10 simultaneous open orders.
The story here is a two-sided model with a strong BUY side and a stuck SELL side. We have kept SELL live because cutting it would halve our signal volume, and the EV filter keeps the worst losers out. Below is how we got here.
Phase 1: Walk-Forward Training
Our V2 training runs four walk-forward folds across 60 days of data. Each fold trains on an expanding window and tests on the next 5-day block. We run three gradient boosters (LightGBM, XGBoost, CatBoost) on each fold.
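To make the fold layout concrete, here is a minimal sketch of expanding-window splits over day indices. It assumes the four 5-day test blocks cover the last 20 of the 60 days, that training always starts at day 0, and that there is no purge or embargo gap between train and test. The names `Fold` and `expanding_folds` are hypothetical, not part of our pipeline.

```python
from typing import List, NamedTuple


class Fold(NamedTuple):
    train_start: int  # inclusive day index
    train_end: int    # exclusive day index
    test_start: int   # inclusive day index
    test_end: int     # exclusive day index


def expanding_folds(total_days: int = 60, n_folds: int = 4, test_days: int = 5) -> List[Fold]:
    """Expanding-window walk-forward splits over day indices 0..total_days-1."""
    # Assumption: the test blocks occupy the final n_folds * test_days days.
    first_test_start = total_days - n_folds * test_days
    if first_test_start <= 0:
        raise ValueError("not enough data to leave a training window before fold 1")
    folds = []
    for k in range(n_folds):
        test_start = first_test_start + k * test_days
        # Training always starts at day 0 and grows by one test block per fold.
        folds.append(Fold(0, test_start, test_start, test_start + test_days))
    return folds


folds = expanding_folds()
for i, fold in enumerate(folds, start=1):
    print(f"Fold {i}: train days [{fold.train_start}, {fold.train_end}), "
          f"test days [{fold.test_start}, {fold.test_end})")
```

Each test block starts exactly where its training window ends, so no fold is ever scored on data it trained on.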
A typical training output looks like this:
Fold 1 (Feb 10-15):
BUY lgbm 0.583 | xgb 0.586 | catboost 0.591
SELL lgbm 0.556 | xgb 0.557 | catboost 0.555
Fold 2 (Feb 15-20): [weak regime]
BUY lgbm 0.457 | xgb 0.476 | catboost 0.465
SELL lgbm 0.435 | xgb 0.454 | catboost 0.441
Fold 3 (Feb 20-25):
BUY lgbm 0.614 | xgb 0.622 | catboost 0.611
SELL lgbm 0.578 | xgb 0.582 | catboost 0.571
Fold 4 (Feb 25-Mar 2):
BUY lgbm 0.635 | xgb 0.641 | catboost 0.627
SELL lgbm 0.572 | xgb 0.576 | catboost 0.568
Combined OOS BUY: 0.572 | 0.581 | 0.574
Combined OOS SELL: 0.535 | 0.542 | 0.534
Fold 2 is terrible. This is a feature, not a bug. Crypto goes through regimes the model cannot predict (news shocks, Fed announcements, liquidation cascades), and a walk-forward framework shows you that explicitly. If every fold were 0.65+, we would be suspicious of data leakage.
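The combined rows in the output above are consistent with an unweighted mean of the four per-fold AUCs. The check below is a sketch under that assumption (an AUC computed on pooled out-of-sample predictions would be the other common choice). The dictionary names are hypothetical; the numbers are copied from the output.

```python
from statistics import mean

# Per-fold AUCs copied from the training output above (folds 1 to 4).
FOLD_AUC = {
    ("BUY", "lgbm"): [0.583, 0.457, 0.614, 0.635],
    ("BUY", "xgb"): [0.586, 0.476, 0.622, 0.641],
    ("BUY", "catboost"): [0.591, 0.465, 0.611, 0.627],
    ("SELL", "lgbm"): [0.556, 0.435, 0.578, 0.572],
    ("SELL", "xgb"): [0.557, 0.454, 0.582, 0.576],
    ("SELL", "catboost"): [0.555, 0.441, 0.571, 0.568],
}

# "Combined OOS" values as printed in the training output.
REPORTED = {
    ("BUY", "lgbm"): 0.572, ("BUY", "xgb"): 0.581, ("BUY", "catboost"): 0.574,
    ("SELL", "lgbm"): 0.535, ("SELL", "xgb"): 0.542, ("SELL", "catboost"): 0.534,
}

# Assumption: combined OOS = unweighted mean of the per-fold AUCs.
combined = {key: mean(aucs) for key, aucs in FOLD_AUC.items()}

# Largest gap between the recomputed mean and the printed value.
max_gap = max(abs(combined[key] - REPORTED[key]) for key in FOLD_AUC)
print(f"largest gap vs reported: {max_gap:.4f}")
```

Under this reading, fold 2 drags every combined figure down by roughly 0.03 to 0.04, which is why the deploy rule looks at the most recent fold separately.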
The deploy rule we landed on: if the most recent fold (fold 4) has a BUY AUC above 0.60 and the combined BUY OOS AUC is above 0.55, we deploy. Weak middle folds do not block a release.
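As a sketch, the rule reduces to two threshold checks. How the three boosters are collapsed into a single AUC per check is not specified here, so the function takes plain scalars; the example feeds it the xgb values from the output above purely for illustration. `should_deploy` is a hypothetical name.

```python
LATEST_FOLD_MIN_AUC = 0.60  # most recent fold, BUY side
COMBINED_MIN_AUC = 0.55     # combined out-of-sample, BUY side


def should_deploy(latest_fold_buy_auc: float, combined_buy_oos_auc: float) -> bool:
    """Return True when both BUY-side thresholds are strictly exceeded.

    Middle folds are deliberately not inputs: a weak regime in fold 2
    cannot block a release on its own.
    """
    return (latest_fold_buy_auc > LATEST_FOLD_MIN_AUC
            and combined_buy_oos_auc > COMBINED_MIN_AUC)


# Illustration with the xgb numbers from the training output above.
decision = should_deploy(latest_fold_buy_auc=0.641, combined_buy_oos_auc=0.581)
print("deploy" if decision else "hold")
```

The rule produces a candidate only. Every release still goes through a human-reviewed signal check before it touches real orders.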
Phase 2: The Feb 18 Go-Live
V2 replaced V1 on February 18. V1 was a single LightGBM classifier whose signal had completely decayed by mid-February (paper trading showed no edge at any threshold). V2 shipped with raw sigmoid >= 0.80 for both sides, no EV filter yet.
First 9 days (Feb 18-Feb 27):
- BUY: ~62% WR on 140 trades — strong start
- SELL: ~48% WR on 130 trades — bleeding
The SELL losses were concentrated in narrow ranges. A SELL signal would fire, price would chop sideways, then stop out. Meanwhile BUY signals fired mostly in established uptrends and hit their +2% targets.
Phase 3: The EV > 0 Filter (Feb 27)
We trained a companion regressor alongside each classifier. The regressor predicts expected return (as a percentage) for each candidate signal. We began storing it with every prediction from day one but did not use it for gating until Feb 27.
Analysis on ~260 trades showed:
All SELL signals: 48.4% WR
SELL with regressor EV > 0: 54.1% WR (+5.7pp lift)
SELL with regressor EV < 0: 42.9% WR
All BUY signals: 62.1% WR
BUY with regressor EV > 0: 63.6% WR (+1.5pp lift)
BUY with regressor EV < 0: 59.8% WR
The filter was clearly helping SELL, marginally helping BUY. But looking at absolute EV values for BUY, we saw the BUY regressor had a mean EV of -1.27% — it systematically underestimates BUY outcomes. Filtering BUYs by EV > 0 would have cut our BUY trade volume dramatically with only marginal lift.
Decision: apply EV > 0 gate only to SELL. Leave BUY alone. This went live February 27, 2026.
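The win-rate split used in the analysis above can be reproduced from a trade log in a few lines. This is a minimal sketch: it assumes each trade record carries a side, the regressor's predicted EV, and the realized PnL, and that a win means positive realized PnL. The `Trade` fields and `win_rate_by_ev_sign` are hypothetical names, and the sample trades are made up for illustration, not taken from our data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple, TypedDict


class Trade(TypedDict):
    side: str   # "BUY" or "SELL"
    ev: float   # regressor's predicted return, in percent
    pnl: float  # realized return, in percent


def win_rate_by_ev_sign(trades: Iterable[Trade]) -> Dict[Tuple[str, str], float]:
    """Win rate in percent per (side, bucket); bucket is 'all', 'EV>0' or 'EV<=0'."""
    wins: Dict[Tuple[str, str], int] = defaultdict(int)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for t in trades:
        bucket = "EV>0" if t["ev"] > 0 else "EV<=0"
        for key in ((t["side"], "all"), (t["side"], bucket)):
            counts[key] += 1
            wins[key] += int(t["pnl"] > 0)  # assumption: win = positive realized PnL
    return {key: 100.0 * wins[key] / counts[key] for key in counts}


# Made-up sample trades, for illustration only.
sample: list[Trade] = [
    {"side": "SELL", "ev": 0.4, "pnl": 1.0},
    {"side": "SELL", "ev": 0.2, "pnl": -0.5},
    {"side": "SELL", "ev": -0.3, "pnl": -1.0},
    {"side": "SELL", "ev": -0.1, "pnl": -0.8},
    {"side": "BUY", "ev": -1.2, "pnl": 2.0},
    {"side": "BUY", "ev": -0.9, "pnl": -1.0},
]

rates = win_rate_by_ev_sign(sample)
sell_lift_pp = rates[("SELL", "EV>0")] - rates[("SELL", "all")]
print(f"SELL lift from EV > 0 filter: {sell_lift_pp:+.1f}pp")
```

Reporting the lift next to the number of trades that survive the filter matters as much as the lift itself: a filter that raises win rate by removing most of the volume is a different trade-off from one that removes only the losers.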
Phase 4: Five Weeks of Live Production
With both gates in place:
- BUY: raw sigmoid >= 0.80
- SELL: raw sigmoid >= 0.80 AND regressor EV > 0
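Put together, the two gates and the position cap fit in one small function. This is a sketch under stated assumptions, not our production code: `passes_gate` and its parameters are hypothetical names, a SELL signal with no stored EV is rejected, and the cap is modeled as a simple count check on open positions.

```python
from typing import Optional

SIGMOID_MIN = 0.80       # raw sigmoid threshold, both sides
MAX_OPEN_POSITIONS = 10  # cap on simultaneous open orders


def passes_gate(side: str, raw_sigmoid: float,
                regressor_ev: Optional[float], open_positions: int) -> bool:
    """Decide whether a candidate signal may become an order."""
    if open_positions >= MAX_OPEN_POSITIONS:
        return False  # assumption: at the cap, new signals are simply skipped
    if raw_sigmoid < SIGMOID_MIN:
        return False
    if side == "BUY":
        return True  # BUY ignores EV: its regressor is biased low
    if side == "SELL":
        # Assumption: a missing EV counts as a failed check.
        return regressor_ev is not None and regressor_ev > 0
    raise ValueError(f"unknown side: {side!r}")


# A BUY with negative predicted EV still executes; the same SELL does not.
buy_ok = passes_gate("BUY", 0.85, -1.27, open_positions=3)
sell_ok = passes_gate("SELL", 0.85, -0.20, open_positions=3)
print(buy_ok, sell_ok)
```

Keeping the asymmetry explicit in one place makes it easy to revisit if a later retrain fixes the BUY regressor's bias.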
Post-filter performance (Feb 27 → April 4, ~36 days, ~700 trades):
- BUY: 58.3% WR on 378 trades
- SELL: 50.9% WR on 320 trades
The BUY side stayed strong. The SELL side flattened out to breakeven, which is an improvement over the pre-filter bleed but not a profit center.
Why SELL Stays Flat
We have a working hypothesis: crypto has an asymmetric volatility structure. Uptrends trend calmly. Downtrends come in sharp, fast impulses that are hard to predict in advance. By the time the model sees a strong SELL signal, the move has often already happened.
This is consistent with:
- Our walk-forward SELL AUCs averaging 0.53-0.55 vs BUY averaging 0.57-0.59
- Paper trading SELL WR consistently 5-8 percentage points below BUY across multiple retrains
- Raw SELL sigmoid occasionally inverting at high thresholds (the model anti-predicts in some regimes)
We have tried feature engineering targeting SELL specifically (order book imbalance, trade flow skew, volatility skew). V3, currently in paper trading, adds 8 such features and does achieve better BUY OOS AUC (0.533 vs V2's 0.497 on a recent retrain) but shows zero SELL signal — worse than V2. The features help BUY, not SELL.
Our conclusion: SELL is structurally harder, and we accept it. A flat SELL side that provides position diversity and keeps the ensemble engaged during drawdowns is worth keeping, even if its direct contribution is zero.
Retrain Cadence and Decisions
We retrain roughly every 5-7 days. A retrain that does not clear our signal-check gate gets held back. Recent retrain decisions:
- Mar 23: Deployed. Fold 4 AUCs strong. 1/2 GO on signal check.
- Mar 24: Not deployed. Marginal AUC drop, no urgency.
- Mar 25: Deployed. Release regressor Spearman jumped (eval 0.221 → release 0.381).
- Mar 30, 31: Deployed. BUY 3/3 GO at >=0.70-0.75 thresholds.
- Apr 2: Deployed. Best BUY OOS we have seen (0.591 catboost, 0.586 xgb).
Each of these gets a full signal-check report inspected by a human before release. No automated deploys to real orders.
What This Case Study Says
- Walk-forward training is honest. It shows you weak regimes instead of hiding them under one big split.
- Live metrics diverge from paper. Track them separately, always.
- Do not deploy filters until data justifies them. We did not add the EV filter on day one. We waited for 9 days of live data.
- Asymmetric filters are fine. Our EV gate applies to SELL only because the BUY regressor is systematically biased.
- A flat side is not a dead side. SELL at 50% WR contributes diversity even with no direct edge.
Every prediction in APIndicators comes from this pipeline. You can hit the V2 ensemble live via the API, or subscribe to webhook delivery of signals as they fire. See /pricing for plans — or check /docs for the full prediction endpoint schema.