Backtesting Basics: What the Numbers Actually Mean

The Number That Doesn't Mean What You Think

You've seen the claims. A backtested strategy with 40% annualized returns, a Sharpe ratio of 2.3, and a maximum drawdown of just 8%. Sounds perfect. Then someone tries to trade it live and barely breaks even.

This happens constantly in the systematic trading world. Backtesting is essential — you can't develop a strategy without it — but most people misread what backtests actually tell you. Understanding the gap between backtest and live performance is one of the most important skills in systematic trading.

This is backtesting basics: what the numbers mean, what they don't, and how to use them correctly.

What a Backtest Is (and Isn't)

A backtest simulates how a trading strategy would have performed on historical data. You feed the algorithm historical price data, it generates buy and sell signals, and you calculate what the returns would have been.

The fundamental problem: historical data is fixed. The strategy is optimized after seeing that data. You're fitting a model to a dataset and then measuring performance on that same dataset. This is like studying the answers to a test, taking the test, and then concluding you're smart.

The technical term for this is overfitting: the strategy is tuned to the in-sample data it was developed on. It can perform extremely well on that history while having zero predictive power going forward. The more parameters you optimize, the more likely you are to be fitting to noise rather than signal.

This doesn't mean backtesting is useless. It means you need to understand what it can and can't tell you.

The Key Metrics and What They Actually Measure

Let's go through the main backtesting metrics and what they're really capturing.

Annual Return

The headline number. Looks great in marketing, often misleading in practice.

The issues: It's usually calculated before real-world friction (slippage, commissions, market impact). It ignores the path — two strategies can have the same annual return with very different risk profiles. And it's highly sensitive to the specific time period chosen.

A strategy that returned 40% annually during the 2023-2025 bull run might have very different characteristics in a different market regime.
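To make the "ignores the path" point concrete, here is a minimal sketch in Python. The function name annualized_return and both return series are made up for illustration; they are not taken from any real strategy.

```python
def annualized_return(period_returns, periods_per_year):
    """Compounded (geometric) annual return from a list of per-period returns."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    years = len(period_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0

# Two hypothetical strategies, each described by two yearly returns.
steady = [0.10, 0.10]    # +10%, +10%
bumpy = [0.375, -0.12]   # +37.5%, then -12%

for name, rets in (("steady", steady), ("bumpy", bumpy)):
    print(name, round(annualized_return(rets, 1), 4), "worst year:", min(rets))
```

Both series compound to the same annual figure, but only one of them asks you to sit through a losing year. The headline number alone cannot tell them apart.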

Sharpe Ratio

Excess return over the risk-free rate, divided by the volatility of those returns, annualized. This is a much better metric than raw return because it normalizes for risk.

Sharpe above 1.0 is decent. Above 1.5 is good. Above 2.0 is excellent and should be treated skeptically — either the strategy is genuinely exceptional, or it's overfitted.

The Sharpe ratio treats upside and downside volatility equally, which is actually a problem. If your strategy has high upside volatility (big wins) and low downside volatility (small losses), the Sharpe ratio will look worse than the actual risk profile warrants. This is why the Sortino ratio — which only penalizes downside volatility — can be more informative.
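A rough sketch of the difference between the two ratios, using only the Python standard library. The 252-periods-per-year default, the zero risk-free rate and target, and the sample returns are all assumptions chosen for illustration, and the code does not guard against a series with zero volatility.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Mean excess return over its standard deviation, annualized."""
    excess = [r - risk_free_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

def sortino_ratio(returns, target_per_period=0.0, periods_per_year=252):
    """Like Sharpe, but only returns below the target count as risk."""
    excess = [r - target_per_period for r in returns]
    downside_sq = [min(0.0, e) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(excess))
    return statistics.mean(excess) / downside_dev * math.sqrt(periods_per_year)

# Hypothetical daily returns: big wins, small losses.
returns = [0.04, -0.005, 0.03, -0.004, 0.05, -0.006]
print("Sharpe: ", round(sharpe_ratio(returns), 2))
print("Sortino:", round(sortino_ratio(returns), 2))
```

On a series like this one, where most of the volatility comes from the wins, the Sortino figure comes out well above the Sharpe figure. Six data points are far too few for either number to mean anything; the point is only how the two formulas treat the same returns.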

Maximum Drawdown

The deepest peak-to-trough decline during the backtest period. This is probably the most practically important number because it's what you'll actually have to live through.

A 30% maximum drawdown in a backtest means you need to be psychologically prepared to watch your portfolio fall 30% before the strategy recovers. Most traders are not prepared for this, regardless of what they tell themselves.

When you see a backtest with a suspiciously low maximum drawdown — especially paired with high returns — be skeptical. Either the strategy was carefully fitted to avoid historical drawdown periods, or the backtest has errors.
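Maximum drawdown is simple to compute from an equity curve, which makes it easy to check on any backtest you are handed. A minimal sketch, with a made-up equity curve:

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)              # highest value seen so far
        drawdown = (peak - value) / peak     # how far below that peak we are
        worst = max(worst, drawdown)
    return worst

# Hypothetical account values over time.
equity = [100, 120, 90, 110, 130, 104]
print(max_drawdown(equity))  # 0.25: the fall from 120 to 90
```

Note that the account ends higher than it started and still went through a 25% decline along the way. The later fall from 130 to 104 is only 20%, so it does not set the maximum.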

Win Rate vs. Profit Factor

Win rate (percentage of trades that are profitable) is surprisingly uninformative on its own. A strategy with a 30% win rate can be highly profitable if the wins are much larger than the losses. A strategy with a 70% win rate can lose money if its wins are small and its losses are large.

Profit factor — total gross profit divided by total gross loss — is more useful. A profit factor above 1.5 is generally respectable. Above 2.0 suggests either genuine edge or overfitting.
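Here is a small sketch of both metrics side by side. The two trade lists are invented to match the 30% and 70% win rates discussed above.

```python
def win_rate(trade_pnls):
    """Fraction of trades with a positive P&L."""
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)

def profit_factor(trade_pnls):
    """Total gross profit divided by total gross loss."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # no losing trades at all
    return gross_profit / gross_loss

# Hypothetical per-trade P&L in dollars.
few_big_wins = [300, -100, -100, 400, -100, -100, -100, 350, -100, -100]
many_small_wins = [50] * 7 + [-200] * 3

for name, trades in (("few big wins", few_big_wins), ("many small wins", many_small_wins)):
    print(name, "win rate:", win_rate(trades), "profit factor:", round(profit_factor(trades), 2))
```

The first list wins 30% of the time with a profit factor of 1.5. The second wins 70% of the time with a profit factor below 1, which means it loses money.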

The Backtest Traps That Catch Everyone

Lookahead Bias

This is using information in the simulation that wouldn't have been available at the time the trade was made. A simple example: using end-of-day closing prices to generate signals that are executed at that same day's close.

In reality, you don't know the closing price until the market has closed. If your signal requires the closing price, the earliest realistic fill is usually the next day's open, not that day's close, and the overnight gap between the two can be a meaningful difference.

Lookahead bias can produce dramatically inflated backtest results that are impossible to replicate live. Every serious backtesting framework has to be designed explicitly to prevent it.
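The closing-price example can be sketched in a few lines. Everything here is hypothetical: the toy momentum rule (go long for one day after an up close), the function name next_day_returns, and the five price bars, which were picked so that prices tend to gap overnight in the direction of the signal.

```python
def next_day_returns(bars, fill):
    """Returns of a toy rule: if today's close beats yesterday's, hold through tomorrow's close.

    bars: list of (open, close) tuples, one per day.
    fill: "same_close" enters at the close that produced the signal (lookahead);
          "next_open" enters at the following day's open (realistic).
    """
    out = []
    for t in range(1, len(bars) - 1):
        signal = bars[t][1] > bars[t - 1][1]   # only knowable after day t closes
        if not signal:
            out.append(0.0)
            continue
        entry = bars[t][1] if fill == "same_close" else bars[t + 1][0]
        exit_price = bars[t + 1][1]
        out.append(exit_price / entry - 1.0)
    return out

# Hypothetical (open, close) bars.
bars = [(100.0, 100.0), (100.0, 102.0), (103.0, 103.5), (104.5, 104.0), (104.0, 103.0)]
biased = next_day_returns(bars, "same_close")
honest = next_day_returns(bars, "next_open")
print("same-close fills:", round(sum(biased), 4))
print("next-open fills: ", round(sum(honest), 4))
```

Same signals, same prices, and the only change is the fill assumption. On this toy data the version that fills at the signal's own close shows a gain, while the version that waits for the next open shows a loss.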

Survivorship Bias

If you're backtesting on a universe of stocks, you need to include stocks that were delisted, went bankrupt, or were otherwise removed from the index during your test period.

Many freely available stock datasets include only companies that are still listed today. Backtesting on this data systematically overstates returns because you're only testing on the survivors.

For SPY and major ETF strategies, this is less of an issue since you're trading a single instrument rather than selecting from a universe. But for any strategy involving stock selection, survivorship bias is a serious concern.
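A toy illustration of how much the missing names can matter. The tickers, returns, and delisting flags are all invented, and the calculation assumes an equal-weight buy-and-hold of every stock in the universe.

```python
# Hypothetical total returns over the test period.
# The second field is True for stocks delisted before the period ended.
universe = {
    "AAA": (0.30, False),
    "BBB": (0.15, False),
    "CCC": (0.05, False),
    "DDD": (-0.80, True),
    "EEE": (-1.00, True),
}

def average_return(universe, include_delisted):
    """Equal-weight average return, with or without the delisted names."""
    picked = [ret for ret, delisted in universe.values()
              if include_delisted or not delisted]
    return sum(picked) / len(picked)

survivors_only = average_return(universe, include_delisted=False)
full_universe = average_return(universe, include_delisted=True)
print("survivors only:", round(survivors_only, 4))
print("full universe: ", round(full_universe, 4))
```

The survivors-only average is positive and the full-universe average is negative. Nothing about the strategy changed; only the dataset did.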

Transaction Cost Underestimation

Every backtest makes assumptions about transaction costs — commissions, bid-ask spreads, slippage. Most backtests use optimistic assumptions: zero commissions, no slippage, fills at the midpoint.

For liquid instruments like SPY, spreads are very small, so this matters less than it does for illiquid stocks. But costs scale with the number of trades: a high-turnover strategy that trades dozens of times per day pays them on every trade, and that friction can erode backtest performance significantly.
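Some back-of-the-envelope arithmetic shows why trade count matters so much. This sketch uses a simple additive approximation that ignores compounding, and every number in it (the 40% gross return, the half-basis-point cost per trade, 252 trading days) is an assumption for illustration.

```python
def net_annual_return(gross_annual_return, trades_per_day, cost_per_trade_bps,
                      trading_days=252):
    """Gross return minus total yearly trading costs (additive approximation)."""
    annual_cost = trades_per_day * trading_days * cost_per_trade_bps / 10_000
    return gross_annual_return - annual_cost

# Same hypothetical gross return, same per-trade cost, different turnover.
for trades_per_day in (1, 24):
    net = net_annual_return(0.40, trades_per_day, cost_per_trade_bps=0.5)
    print(trades_per_day, "trades/day ->", round(net, 4))
```

At one trade per day the cost is barely visible: 40% becomes roughly 38.7%. At 24 trades per day the same tiny per-trade cost takes the result down to under 10%.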

Overfitting Through Parameter Selection

If you test 1,000 parameter combinations and report the best-performing one, you haven't discovered an edge — you've discovered random luck. With enough combinations, some will outperform by chance.

Proper out-of-sample testing is the solution: test your strategy on a period of data that was never used during development. If performance holds up out-of-sample, you have more reason to believe it reflects genuine edge rather than fitting.
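You can watch this happen with strategies that have no edge by construction. In the sketch below each "strategy" is just Gaussian noise with a mean of zero; the 1% daily volatility, the 250-day windows, and the function name are assumptions made for the demonstration.

```python
import random

def best_of_random_strategies(n_strategies=1000, n_days=250, seed=0):
    """Pick the in-sample winner among zero-edge strategies, then re-check it.

    Returns (in_sample_mean, out_of_sample_mean) of daily returns for the
    strategy with the highest in-sample mean.
    """
    rng = random.Random(seed)
    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        in_mean = sum(in_sample) / n_days
        if in_mean > best_in:
            best_in, best_out = in_mean, sum(out_sample) / n_days
    return best_in, best_out

best_in, best_out = best_of_random_strategies()
print(f"winner, in-sample daily mean:     {best_in:+.5f}")
print(f"winner, out-of-sample daily mean: {best_out:+.5f}")
```

The in-sample winner will look like it has an edge, because it was chosen for exactly that. Its out-of-sample mean is drawn from the same zero-mean noise as everything else, so there is no reason for the apparent edge to carry over.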

What Walk-Forward Analysis Tells You

Walk-forward analysis is the gold standard for stress-testing a systematic strategy. The idea: train on a rolling window of historical data, test on the next period, advance the window, repeat.

This simulates how the strategy would have performed if you'd been trading it in real time, recalibrating parameters as new data became available. It's much more realistic than a single in-sample backtest.

If a strategy degrades significantly in walk-forward testing compared to in-sample performance, it's a red flag. If it holds up reasonably well, you have more confidence in its robustness.
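The rolling-window mechanics can be sketched as a small generator. This is one common way to lay out the windows (fixed-length training window, advanced by one test period each step), not the only one; the function name and the sizes in the example are hypothetical.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) for a rolling walk-forward test."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size   # advance so the test periods never overlap

# Example: 10 observations, train on 4, test on the next 2.
for train, test in walk_forward_splits(10, 4, 2):
    print("train", list(train), "-> test", list(test))
```

In each step you would fit or select parameters using only the training indices, then record performance on the test indices. Stitching the test periods together gives a track record in which no parameter was chosen using data from the period it was evaluated on.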

The Right Way to Use Backtest Results

Backtests are necessary but not sufficient. Use them to:

Eliminate strategies that clearly don't work
Understand the risk characteristics of a strategy under historical conditions
Generate hypotheses about where edge might exist
Estimate reasonable expected ranges for live performance

Do not use backtests to:

Set precise return expectations for live trading
Convince yourself a strategy is "proven"
Skip live testing with real money (even small positions)

The gap between backtest and live performance is real. The right question isn't "why does live performance differ from backtest?" — it always will. The right question is "is the live performance within a reasonable range of what the backtest suggested?"
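One rough way to put a number on "a reasonable range" is to ask whether the live average return sits within a couple of standard errors of the backtest average. This is a heuristic sketch, not a rigorous test: it assumes returns are roughly independent and identically distributed, and the function name, the z=2.0 threshold, and the sample data are all made up.

```python
import math
import statistics

def live_within_range(backtest_returns, live_returns, z=2.0):
    """True if the live mean is within z standard errors of the backtest mean."""
    mu = statistics.mean(backtest_returns)
    sigma = statistics.stdev(backtest_returns)
    stderr = sigma / math.sqrt(len(live_returns))   # uncertainty of the live mean
    return abs(statistics.mean(live_returns) - mu) <= z * stderr

# Hypothetical per-period returns.
backtest = [0.01, -0.01, 0.02, -0.02, 0.01, 0.00]
live_ok = [0.00, 0.01, -0.01, 0.005]
live_bad = [-0.05, -0.04, -0.06, -0.05]

print(live_within_range(backtest, live_ok))    # True
print(live_within_range(backtest, live_bad))   # False
```

With a short live record the standard error is large, so almost anything counts as "within range". That is a feature of the arithmetic: a few weeks of live trading tells you very little either way.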

What This Means for Evaluating Any System

When you see a trading system advertised with impressive backtest numbers, now you know the right questions to ask:

Was it tested in-sample or out-of-sample?
How many parameters were optimized?
Does the backtest account for real transaction costs?
Was it walk-forward tested?
What does live performance look like?

The systems that hold up under these questions are rare. They're also the ones worth your attention.

Past performance is not indicative of future results. All trading involves risk of loss. This content is for educational purposes only.