Backtesting Trading Strategies with Python: A Complete Guide

Every profitable trading strategy started as a hypothesis that was tested against historical data. Backtesting is how you separate ideas that sound good from ideas that actually make money. But backtesting is also where most traders deceive themselves. A poorly constructed backtest will tell you exactly what you want to hear, then lose money the moment you go live.

The difference between a useful backtest and a misleading one comes down to methodology. This guide shows you how to build a backtesting framework in Python that produces trustworthy results, handles the specific challenges of crypto markets, and helps you make informed decisions about which strategies to deploy with real capital.

We will build a complete backtesting system from scratch, avoiding the black-box approach of most backtesting libraries. When you understand every line of your backtester, you can trust its output.

Why Most Backtests Lie

Before writing any code, you need to understand the three ways backtests deceive traders:

Look-ahead bias: Using information that would not have been available at the time of the trade. This includes computing indicators with future data, acting on a candle's close price before that candle has actually closed, or training an ML model on data that overlaps with the test period.

Survivorship bias: Only testing on assets that still exist today. The altcoins that went to zero are not in your dataset, which skews results upward.

Overfitting: Tuning your strategy parameters to maximize backtest performance on a specific dataset. The strategy performs brilliantly on historical data and poorly on new data because it has memorized past patterns rather than learned generalizable rules.

A good backtesting framework addresses all three of these systematically.
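To make look-ahead bias concrete, here is a minimal sketch on made-up prices. The helper name next_bar_returns and the toy signal are illustrative assumptions, not part of the framework built below. The signal is deliberately one that can only be known once a bar has closed; applying it to that same bar's return looks profitable, while delaying it by one bar shows what would actually have been tradable.

```python
import pandas as pd


def next_bar_returns(close: pd.Series, signal: pd.Series) -> pd.Series:
    """Per-bar strategy returns where a signal seen on bar t is held during bar t+1."""
    bar_returns = close.pct_change()
    # shift(1) delays the signal by one bar, so bar t's return only uses data up to bar t-1
    return (signal.shift(1) * bar_returns).fillna(0.0)


# Hypothetical closes: up 10%, down 10%, up 10%
close = pd.Series([100.0, 110.0, 99.0, 108.9])

# Toy signal: 1.0 on bars that closed higher than the previous bar
signal = (close.diff() > 0).astype(float)

# Biased: multiplies bar t's return by a signal that already contains bar t's close
biased = (signal * close.pct_change()).fillna(0.0)

# Honest: the same signal, but only acted on from the following bar
honest = next_bar_returns(close, signal)

print(f"biased total: {biased.sum():.2f}, honest total: {honest.sum():.2f}")
```

On these numbers the biased per-bar returns sum to about +0.20 and the honest ones to about -0.10. The rule and the data are identical; only the timing assumption differs.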

The Backtesting Framework

Our framework processes one candle at a time in chronological order, maintaining a portfolio state that tracks positions, cash, and trade history:

from dataclasses import dataclass, field
from enum import Enum
import pandas as pd
import numpy as np

class Side(Enum):
    BUY = "buy"
    SELL = "sell"

@dataclass
class Trade:
    entry_time: pd.Timestamp
    exit_time: pd.Timestamp | None
    side: Side
    entry_price: float
    exit_price: float | None
    size: float
    pnl: float | None = None

@dataclass
class Portfolio:
    initial_capital: float
    cash: float = 0.0
    position_size: float = 0.0
    position_side: Side | None = None
    position_entry_price: float = 0.0
    trades: list = field(default_factory=list)
    equity_curve: list = field(default_factory=list)

    def __post_init__(self):
        self.cash = self.initial_capital

The core backtest engine iterates through each candle, calls the strategy for a signal, and executes trades with realistic assumptions:

class BacktestEngine:
    def __init__(self, initial_capital: float = 10000.0, commission: float = 0.0006, slippage: float = 0.0002):
        self.initial_capital = initial_capital
        self.commission = commission
        self.slippage = slippage

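    def run(self, df: pd.DataFrame, strategy) -> Portfolio:
        portfolio = Portfolio(initial_capital=self.initial_capital)

        # Indicators are computed once up front; this is safe only if every indicator is causal (uses past data only)
        indicators_df = strategy.compute_indicators(df)

        for i in range(strategy.warmup_period, len(indicators_df)):
            current_bar = indicators_df.iloc[i]
            # The strategy sees bars 0..i and nothing later
            history = indicators_df.iloc[:i+1]

            # Exits run before entries, so a position opened on this bar is never closed on the same bar
            self._check_exits(portfolio, current_bar)

            signal = strategy.generate_signal(history)

            # One position at a time: signals are ignored while a position is open
            if signal and portfolio.position_size == 0:
                self._open_position(portfolio, current_bar, signal, strategy)

            # Mark the portfolio to market at this bar's close
            equity = self._calculate_equity(portfolio, current_bar["close"])
            portfolio.equity_curve.append({
                "timestamp": current_bar.name,
                "equity": equity,
            })

        return portfolio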

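    def _open_position(self, portfolio: Portfolio, bar, signal: Side, strategy):
        price = bar["close"]
        # Slippage moves the fill against us: buys fill higher, sells fill lower
        slipped_price = price * (1 + self.slippage) if signal == Side.BUY else price * (1 - self.slippage)

        # Size the position so that a stop-out loses about risk_per_trade of cash.
        # Size is not capped by cash, so a tight stop implies leverage.
        stop_price = strategy.calculate_stop_loss(bar, signal)
        stop_distance = abs(slipped_price - stop_price)
        if not stop_distance > 0:  # zero or NaN distance: skip the trade
            return
        size = portfolio.cash * strategy.risk_per_trade / stop_distance

        # Keep the exit levels on the engine so _check_exits uses the same stop as the sizing.
        # A target at twice the stop distance gives a 1:2 risk-to-reward ratio.
        self._stop_price = stop_price
        if signal == Side.BUY:
            self._target_price = slipped_price + 2 * stop_distance
        else:
            self._target_price = slipped_price - 2 * stop_distance

        # Only the entry commission leaves cash here; the trade's PnL is settled on exit
        cost = size * slipped_price * self.commission
        portfolio.cash -= cost

        portfolio.position_size = size
        portfolio.position_side = signal
        portfolio.position_entry_price = slipped_price

        portfolio.trades.append(Trade(
            entry_time=bar.name,
            exit_time=None,
            side=signal,
            entry_price=slipped_price,
            exit_price=None,
            size=size,
        ))

    def _check_exits(self, portfolio: Portfolio, bar):
        if portfolio.position_size == 0:
            return

        exit_price = None

        # The stop is checked first: if one bar touches both levels, assume the worse outcome.
        # A bar that opens beyond a level fills at the open, not at the level it gapped through.
        if portfolio.position_side == Side.BUY:
            if bar["low"] <= self._stop_price:
                exit_price = min(self._stop_price, bar["open"])
            elif bar["high"] >= self._target_price:
                exit_price = max(self._target_price, bar["open"])
        else:
            if bar["high"] >= self._stop_price:
                exit_price = max(self._stop_price, bar["open"])
            elif bar["low"] <= self._target_price:
                exit_price = min(self._target_price, bar["open"])

        if exit_price is not None:
            self._close_position(portfolio, bar, exit_price)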

    def _close_position(self, portfolio: Portfolio, bar, exit_price: float):
        trade = portfolio.trades[-1]
        slipped_exit = exit_price * (1 - self.slippage) if trade.side == Side.BUY else exit_price * (1 + self.slippage)

        if trade.side == Side.BUY:
            pnl = (slipped_exit - trade.entry_price) * portfolio.position_size
        else:
            pnl = (trade.entry_price - slipped_exit) * portfolio.position_size

        cost = portfolio.position_size * slipped_exit * self.commission
        portfolio.cash += pnl - cost

        trade.exit_time = bar.name
        trade.exit_price = slipped_exit
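        # Cash already paid the entry commission at open; subtract it here too so trade-level metrics are net of both fees
        trade.pnl = pnl - cost - trade.size * trade.entry_price * self.commission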

        portfolio.position_size = 0
        portfolio.position_side = None

    def _calculate_equity(self, portfolio: Portfolio, current_price: float) -> float:
        unrealized = 0.0
        if portfolio.position_size > 0:
            if portfolio.position_side == Side.BUY:
                unrealized = (current_price - portfolio.position_entry_price) * portfolio.position_size
            else:
                unrealized = (portfolio.position_entry_price - current_price) * portfolio.position_size
        return portfolio.cash + unrealized

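Notice several important details: we apply slippage to both entries and exits, and we charge commission on both sides of the trade. Pay attention to execution timing as well: the signal is generated from data up to and including bar i, and the order fills at bar i's close. This assumes you can compute the signal and trade at the closing price at the same instant, which is a mildly optimistic approximation of entering at the next bar's open. In a 24/7 market the two prices are usually close, and the slippage parameter absorbs part of the gap. A stricter variant would fill at the open of bar i+1.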
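The engine above fills entries at the close of the bar that produced the signal. A stricter alternative is to fill at the open of the following bar. The sketch below shows that rule in isolation; next_open_fill is a hypothetical helper, not a method of BacktestEngine, and it assumes the DataFrame has an "open" column in chronological order.

```python
from typing import Optional

import pandas as pd


def next_open_fill(df: pd.DataFrame, signal_index: int, is_buy: bool,
                   slippage: float = 0.0002) -> Optional[float]:
    """Fill price at the open of the bar after the signal bar, or None if no such bar exists."""
    fill_index = signal_index + 1
    if fill_index >= len(df):
        return None  # signal on the last bar: nothing left to fill on
    open_price = float(df["open"].iloc[fill_index])
    # Slippage moves the fill against the trader: buys pay more, sells receive less
    return open_price * (1 + slippage) if is_buy else open_price * (1 - slippage)


# Hypothetical candles: the signal bar closes at 101, but the next bar opens at 103
bars = pd.DataFrame({"open": [100.0, 103.0], "close": [101.0, 104.0]})

buy_fill = next_open_fill(bars, signal_index=0, is_buy=True)
sell_fill = next_open_fill(bars, signal_index=0, is_buy=False)
no_fill = next_open_fill(bars, signal_index=1, is_buy=True)

print(buy_fill, sell_fill, no_fill)
```

A signal on the final bar returns None, which a backtest loop would treat as "no trade" rather than inventing a price.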

Implementing a Strategy

Strategies implement a simple interface: compute indicators, generate signals, and define risk parameters:

class SMACrossoverStrategy:
    def __init__(self, fast_period: int = 20, slow_period: int = 50):
        self.fast_period = fast_period
        self.slow_period = slow_period
        self.warmup_period = slow_period + 10
        self.risk_per_trade = 0.02

    def compute_indicators(self, df: pd.DataFrame) -> pd.DataFrame:
        result = df.copy()
        result["sma_fast"] = df["close"].rolling(self.fast_period).mean()
        result["sma_slow"] = df["close"].rolling(self.slow_period).mean()
        result["atr"] = self._calculate_atr(df, 14)
        result.dropna(inplace=True)
        return result

    def generate_signal(self, history: pd.DataFrame) -> Side | None:
        if len(history) < 2:
            return None

        current = history.iloc[-1]
        previous = history.iloc[-2]

        if previous["sma_fast"] <= previous["sma_slow"] and current["sma_fast"] > current["sma_slow"]:
            return Side.BUY
        elif previous["sma_fast"] >= previous["sma_slow"] and current["sma_fast"] < current["sma_slow"]:
            return Side.SELL

        return None

    def calculate_stop_loss(self, bar, side: Side) -> float:
        atr = bar["atr"]
        if side == Side.BUY:
            return bar["close"] - (1.5 * atr)
        return bar["close"] + (1.5 * atr)

    def _calculate_atr(self, df: pd.DataFrame, period: int) -> pd.Series:
        high_low = df["high"] - df["low"]
        high_close = (df["high"] - df["close"].shift()).abs()
        low_close = (df["low"] - df["close"].shift()).abs()
        tr = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
        return tr.rolling(window=period).mean()
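The engine sizes each position as the cash at risk divided by the distance to the stop returned by calculate_stop_loss. A small worked example makes the consequence visible; risk_based_size is a hypothetical standalone helper, and the account size, price, and ATR are illustrative numbers only.

```python
def risk_based_size(cash: float, risk_per_trade: float, entry_price: float, stop_price: float) -> float:
    """Units to trade so that a stop-out loses roughly cash * risk_per_trade (before costs)."""
    stop_distance = abs(entry_price - stop_price)
    if stop_distance <= 0:
        return 0.0  # no valid stop, no trade
    return cash * risk_per_trade / stop_distance


# Illustrative numbers: $10,000 account, 2% risk, entry at 50,000, ATR of 400
cash, entry, atr = 10_000.0, 50_000.0, 400.0
stop = entry - 1.5 * atr  # long stop at 1.5 * ATR below entry, i.e. 49,400

size = risk_based_size(cash, 0.02, entry, stop)
notional = size * entry
leverage = notional / cash

print(f"size={size:.4f} units, notional=${notional:,.0f}, leverage={leverage:.2f}x")
```

Here the $200 at risk divided by a $600 stop distance gives about 0.3333 units, a notional of roughly $16,667, or about 1.67x the account. Risk-based sizing with a tight stop therefore implies leverage; a spot-only backtest would need to cap size at cash / entry_price.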

Analyzing Results

Raw trade data is useless without proper performance metrics. These are the metrics that matter for evaluating a crypto trading strategy:

def calculate_metrics(portfolio: Portfolio) -> dict:
    closed_trades = [t for t in portfolio.trades if t.pnl is not None]
    if not closed_trades:
        return {"error": "No closed trades"}

    pnls = [t.pnl for t in closed_trades]
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p <= 0]

    equity = pd.DataFrame(portfolio.equity_curve)
    equity.set_index("timestamp", inplace=True)

    peak = equity["equity"].expanding().max()
    drawdown = (equity["equity"] - peak) / peak

    total_return = (equity["equity"].iloc[-1] / portfolio.initial_capital - 1) * 100

    return {
        "total_trades": len(closed_trades),
        "win_rate": len(wins) / len(closed_trades) * 100,
        "total_return_pct": round(total_return, 2),
        "max_drawdown_pct": round(drawdown.min() * 100, 2),
        "avg_win": round(np.mean(wins), 2) if wins else 0,
        "avg_loss": round(np.mean(losses), 2) if losses else 0,
        # Guard on the sum, not the list: break-even trades count as losses and can sum to zero
        "profit_factor": round(sum(wins) / abs(sum(losses)), 2) if sum(losses) != 0 else float("inf"),
        "sharpe_ratio": round(calculate_sharpe(equity["equity"]), 2),
        "max_consecutive_losses": max_consecutive(pnls, lambda x: x <= 0),
    }

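def calculate_sharpe(equity_series: pd.Series, risk_free_rate: float = 0.0, periods_per_year: float | None = None) -> float:
    returns = equity_series.pct_change().dropna()
    if len(returns) < 2 or returns.std() == 0:
        return 0.0
    if periods_per_year is None:
        # Infer bars per year from the median bar spacing (requires a DatetimeIndex).
        # Crypto trades 365 days a year, so daily bars give 365 and 4h bars give 2190.
        bar_seconds = equity_series.index.to_series().diff().median().total_seconds()
        periods_per_year = 365 * 24 * 3600 / bar_seconds
    excess_returns = returns.mean() - risk_free_rate / periods_per_year
    return excess_returns / returns.std() * np.sqrt(periods_per_year)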

def max_consecutive(values: list, condition) -> int:
    max_count = 0
    current = 0
    for v in values:
        if condition(v):
            current += 1
            max_count = max(max_count, current)
        else:
            current = 0
    return max_count
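The drawdown calculation in calculate_metrics is easy to verify by hand on a tiny equity curve. The sketch below isolates it in a hypothetical helper named max_drawdown and uses made-up equity values.

```python
import pandas as pd


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline as a negative fraction (for example -0.25 for -25%)."""
    running_peak = equity.cummax()  # highest equity seen so far at each bar
    drawdown = (equity - running_peak) / running_peak
    return float(drawdown.min())


# Made-up equity curve: rises to 12,000, falls to 9,000, recovers to 13,000
equity = pd.Series([10_000.0, 12_000.0, 9_000.0, 13_000.0])
worst = max_drawdown(equity)
total_return = equity.iloc[-1] / equity.iloc[0] - 1

print(f"total return: {total_return:.0%}, max drawdown: {worst:.0%}")
```

The curve ends up 30% but passed through a 25% drawdown (12,000 to 9,000) on the way, which is why total return alone says little about how a strategy would feel to trade.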

Key metrics to evaluate:

| Metric | Good Result | Concerning Result |
|--------|-------------|-------------------|
| Win Rate | 55%+ (with 1:1 R:R) | Below 50% (unless R:R compensates) |
| Profit Factor | Above 1.5 | Below 1.2 |
| Max Drawdown | Below 15% | Above 25% |
| Sharpe Ratio | Above 1.5 | Below 0.5 |
| Max Consecutive Losses | Below 8 | Above 12 |
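The win-rate thresholds only make sense relative to the reward-to-risk ratio (R:R). The following sketch shows the arithmetic behind "unless R:R compensates". Both function names are illustrative, and costs are ignored.

```python
def breakeven_win_rate(reward_to_risk: float) -> float:
    """Win rate at which average profit per trade is zero, ignoring costs."""
    return 1.0 / (1.0 + reward_to_risk)


def expectancy_r(win_rate: float, reward_to_risk: float) -> float:
    """Average result per trade in units of risk (R), ignoring costs."""
    return win_rate * reward_to_risk - (1.0 - win_rate)


# At 1:1 the strategy must win more than half the time; at 2:1 one win in three breaks even
be_1to1 = breakeven_win_rate(1.0)
be_2to1 = breakeven_win_rate(2.0)

# A 55% win rate at 1:1 and a 40% win rate at 2:1 are both profitable before costs
edge_a = expectancy_r(0.55, 1.0)
edge_b = expectancy_r(0.40, 2.0)

print(f"break-even: {be_1to1:.1%} at 1:1, {be_2to1:.1%} at 2:1")
print(f"expectancy: {edge_a:+.2f}R vs {edge_b:+.2f}R")
```

In this example the 40% win-rate strategy earns about 0.20R per trade against 0.10R for the 55% one, so a sub-50% win rate is not a problem by itself.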

Walk-Forward Validation

The gold standard for backtest validation is walk-forward analysis. Instead of training on one period and testing on another, you repeatedly train on a rolling window and test on the next out-of-sample period:

def walk_forward_backtest(df: pd.DataFrame, strategy_class, train_months: int = 6, test_months: int = 1) -> list:
    results = []
    total_months = len(df.resample("ME").last())

    for start in range(0, total_months - train_months - test_months + 1, test_months):
        train_start = df.index[0] + pd.DateOffset(months=start)
        train_end = train_start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)

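        # Half-open windows [start, end): label slicing is inclusive and would put the boundary bar in both sets
        train_data = df[(df.index >= train_start) & (df.index < train_end)]
        test_data = df[(df.index >= train_end) & (df.index < test_end)]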

        if len(test_data) == 0:
            continue

        strategy = strategy_class()
        strategy.optimize(train_data)

        engine = BacktestEngine()
        portfolio = engine.run(test_data, strategy)
        metrics = calculate_metrics(portfolio)
        metrics["period"] = f"{train_end.strftime('%Y-%m')} to {test_end.strftime('%Y-%m')}"
        results.append(metrics)

    return results

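Walk-forward validation answers the question that simple backtesting cannot: does this strategy work on data it has never seen? If a strategy performs well across multiple walk-forward windows, it is more likely to hold up in live trading than one that was optimized on a single historical period. Two practical notes on the function above: it expects the strategy class to provide an optimize(train_data) method that selects parameters from the training window, which SMACrossoverStrategy as written does not define; and each test window loses its first bars to indicator warmup (about 110 bars with the default 50-period slow SMA and 60-bar warmup), so use test windows long enough to leave a meaningful number of tradable bars.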
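The window arithmetic is the part of walk-forward analysis that most easily leaks data, so it helps to test it separately from any strategy. The sketch below is one way to generate the windows; walk_forward_windows is a hypothetical helper, and the daily 2023 index is synthetic.

```python
import pandas as pd


def walk_forward_windows(index: pd.DatetimeIndex, train_months: int = 6, test_months: int = 1):
    """Yield (train_start, train_end, test_end); use them as half-open ranges [start, end)."""
    first, last = index[0], index[-1]
    step = 0
    while True:
        train_start = first + pd.DateOffset(months=step)
        train_end = train_start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)
        if train_end > last:
            break  # no data left for a test window
        yield train_start, train_end, test_end
        step += test_months


# Synthetic daily index covering one calendar year
index = pd.date_range("2023-01-01", "2023-12-31", freq="D")
windows = list(walk_forward_windows(index))

for train_start, train_end, test_end in windows:
    train = index[(index >= train_start) & (index < train_end)]
    test = index[(index >= train_end) & (index < test_end)]
    # The last training bar must come strictly before the first test bar
    assert train[-1] < test[0]

print(f"{len(windows)} windows, first test starts {windows[0][1].date()}")
```

With six training months and one test month, a single year of data yields only six out-of-sample windows, which is a reminder of how much history walk-forward analysis consumes.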

Handling Crypto-Specific Challenges

Crypto backtesting has unique challenges that stock market backtesters do not face:

24/7 markets: There are no market closes, so "daily" candles are arbitrary (UTC midnight convention). Make sure your data source and strategy use the same timezone convention.

Exchange-specific data: Different exchanges have different prices, different volumes, and different liquidity. A strategy that works on Binance data may not work on Bybit data. Always backtest on data from the exchange you plan to trade on.

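Funding rates: In perpetual futures, funding is exchanged between longs and shorts at a fixed interval, typically every 8 hours (the interval varies by exchange and contract). When the rate is positive, longs pay shorts; when it is negative, shorts pay longs. A strategy that holds positions for days can lose a significant amount to funding fees. Include funding rate costs in your backtest: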

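def apply_funding_costs(portfolio: Portfolio, avg_funding_rate: float = 0.0001, funding_interval_hours: int = 8):
    # Approximation: a constant average rate applied pro rata to the holding time.
    # Only trade.pnl is adjusted; cash and the equity curve stay gross of funding.
    for trade in portfolio.trades:
        if trade.exit_time is None or trade.pnl is None:
            continue  # skip positions that are still open
        holding_hours = (trade.exit_time - trade.entry_time).total_seconds() / 3600
        funding_periods = holding_hours / funding_interval_hours
        # With a positive rate longs pay and shorts receive; a negative rate reverses this
        direction = 1 if trade.side == Side.BUY else -1
        funding_cost = direction * trade.size * trade.entry_price * avg_funding_rate * funding_periods
        trade.pnl -= funding_cost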

Liquidation risk: Leveraged positions can be liquidated if price moves against you beyond your margin. Your backtester should simulate liquidations to produce realistic results for leveraged strategies.
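As a starting point for simulating liquidations, the sketch below uses a common simplified formula for isolated-margin linear contracts. Both function names and the 0.5% maintenance margin rate are assumptions for illustration; real exchanges use tiered margin rates and include fees, so check the formula of the exchange you trade on.

```python
def approx_liquidation_price(entry_price: float, leverage: float, is_long: bool,
                             maintenance_margin_rate: float = 0.005) -> float:
    """Simplified isolated-margin liquidation price (ignores fees and margin tiers)."""
    # Adverse move, as a fraction of entry, that eats the margin down to the maintenance level
    adverse_move = 1.0 / leverage - maintenance_margin_rate
    if is_long:
        return entry_price * (1.0 - adverse_move)
    return entry_price * (1.0 + adverse_move)


def is_liquidated(bar_low: float, bar_high: float, liq_price: float, is_long: bool) -> bool:
    """True if the bar's range reached the liquidation price."""
    return bar_low <= liq_price if is_long else bar_high >= liq_price


# Illustrative 10x positions entered at 100
long_liq = approx_liquidation_price(100.0, 10.0, is_long=True)
short_liq = approx_liquidation_price(100.0, 10.0, is_long=False)

# A bar with a low of 90.0 wipes out the long even if it closes back at 99
hit = is_liquidated(bar_low=90.0, bar_high=101.0, liq_price=long_liq, is_long=True)
safe = is_liquidated(bar_low=95.0, bar_high=101.0, liq_price=long_liq, is_long=True)

print(f"long liquidation near {long_liq:.1f}, short near {short_liq:.1f}")
```

In a backtest loop this check would run on every bar before the stop and target checks, because a liquidation inside the bar ends the trade regardless of where the bar closes.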

Running Your First Backtest

Putting it all together:

import ccxt

exchange = ccxt.binance()
ohlcv = exchange.fetch_ohlcv("BTC/USDT", timeframe="4h", limit=1000)

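df = pd.DataFrame(ohlcv, columns=["timestamp", "open", "high", "low", "close", "volume"])
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")  # exchange timestamps are UTC milliseconds
df.set_index("timestamp", inplace=True)
df = df.iloc[:-1]  # drop the most recent candle, which is usually still forming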

strategy = SMACrossoverStrategy(fast_period=20, slow_period=50)
engine = BacktestEngine(initial_capital=10000, commission=0.0006, slippage=0.0002)
portfolio = engine.run(df, strategy)
metrics = calculate_metrics(portfolio)

for key, value in metrics.items():
    print(f"{key}: {value}")

Conclusion

A backtest is a hypothesis test, not a crystal ball. It tells you whether a strategy had an edge in the past under specific conditions. Whether that edge persists in the future depends on the robustness of your methodology: proper out-of-sample validation, realistic cost assumptions, and resistance to overfitting.

Build your backtester to be skeptical by default. Include all costs. Use walk-forward validation. And remember the most important rule of backtesting: if the results look too good to be true, they almost certainly are. A strategy that returns 200% annually in a backtest is very unlikely to return 200% annually in live trading. But a strategy that returns 15-30% with low drawdowns and consistent walk-forward results may be worth deploying with limited capital. The goal is not to find the perfect backtest. It is to find strategies with a reliable, modest edge that compounds over time.