The Vector Ridge Backtesting Simulator tests trading strategies against 20+ years of historical market data across all 6 asset classes — producing key performance metrics including Sharpe ratio, maximum drawdown, win rate, profit factor, and equity curve visualisation. Backtesting is the bridge between theoretical strategy and real-capital deployment: it reveals whether a strategy has a genuine statistical edge or whether recent success was luck that will not persist.
The most important output is not total return but the Sharpe ratio — which measures return per unit of risk. A strategy that returned 50% with a Sharpe of 0.8 is less robust than one returning 30% with a Sharpe of 1.5, because the first required more risk to achieve its returns. The simulator calculates this automatically for any strategy configuration you input, allowing direct comparison between the Grade A-only approach, mixed-grade approaches, and traditional technical strategies.
Why Backtesting Is Non-Negotiable
Every trader believes their strategy works. Most are wrong — and the ones who are wrong without backtesting discover it with their capital rather than with historical data. Backtesting separates beliefs from evidence.
The value of backtesting is not prediction — no historical test guarantees future results. The value is falsification: backtesting can definitively prove that a strategy does NOT work, saving you months of losses and the psychological damage that comes with them. A strategy that fails to produce positive risk-adjusted returns across 20 years of data — which includes bull markets, bear markets, financial crises, pandemics, and regime transitions — has no edge. Period.
Conversely, a strategy that produces a Sharpe ratio above 1.0 across two decades of diverse market conditions, with consistent performance across different regimes and no single year accounting for the majority of returns, has strong evidence of a genuine statistical edge. This does not guarantee future performance, but it shifts the probability meaningfully in your favour.
The Backtesting Simulator is designed specifically for this purpose. It tests strategies across the same six markets covered by Vector Ridge signals (Forex, Futures, Indices, Equities, Crypto, Polymarket) using the Grade A-E conviction framework. You can test Grade A-only strategies against mixed-grade approaches, compare different position sizing methods, and evaluate the impact of macro regime filtering on risk-adjusted returns.
Chapter 17 of the free trading book covers backtesting methodology including the critical distinction between in-sample and out-of-sample testing.
How to Set Up a Backtest
A proper backtest requires five configuration inputs. Each input significantly affects the result, so understanding what each does is essential for interpreting the output.
1. Strategy rules (entry and exit). Define exactly when you enter and exit trades. For a Grade A moving average crossover strategy: enter long when the 50-day MA crosses above the 200-day MA and the asset is in a favourable macro regime. Exit when the 50-day crosses below the 200-day or the regime shifts. The rules must be specific enough that two different people would make the same decisions given the same data.
2. Asset class and instruments. Select which markets to test. Testing across multiple asset classes is critical — a strategy that works on equities but fails on forex may be capturing equity-specific patterns rather than a universal edge. The simulator supports all six Vector Ridge markets.
3. Position sizing method. Fixed percentage (e.g., 15% per trade), volatility-normalised (1 ATR = 1% of account), or Grade-based (A=20%, B=12%, C=6%). The sizing method often matters more than the entry/exit rules — a mediocre entry with excellent sizing outperforms a brilliant entry with reckless sizing.
4. Date range. Minimum 10 years, ideally 20+. The date range must include at least one full market cycle (bull market, correction, bear market, recovery). A backtest over 2020-2024 only includes a pandemic crash and recovery — it does not test the strategy during a prolonged bear market like 2000-2002 or a financial crisis like 2008.
5. Transaction costs and slippage. Include realistic commission costs ($1-5 per trade for stocks/ETFs, 1-2 pips for forex) and slippage assumptions (1-2% of the bid-ask spread for liquid instruments, more for illiquid ones). A strategy that is profitable before costs may be breakeven or negative after costs — especially if it trades frequently.
| Configuration | Example Setting | Impact on Results | Common Mistake |
|---|---|---|---|
| Entry Rules | 50/200 MA cross + regime filter | Determines trade frequency and timing | Rules too vague to reproduce |
| Asset Class | S&P 500 (20yr data) | Determines which patterns are tested | Testing on one market only |
| Position Sizing | Grade A: 20%, B: 12%, C: 6% | Often matters MORE than entry rules | Fixed 100% sizing (unrealistic) |
| Date Range | 2004-2026 (22 years) | Must include full market cycle | Testing only recent bull market |
| Costs/Slippage | $3/trade + 0.05% slippage | Reduces returns by 2-5% annually | Ignoring costs entirely |
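The five configuration inputs can be sketched as a simple validated config object. This is an illustrative sketch only, not the simulator's actual API; every field name here (`BacktestConfig`, `entry_rule`, `sizing`, and so on) is a hypothetical stand-in for whatever the tool exposes, and the example values are the ones used in the table above.

```python
from dataclasses import dataclass, field

# Hypothetical configuration object mirroring the five inputs above.
# Field names are illustrative, not the simulator's actual API.
@dataclass
class BacktestConfig:
    entry_rule: str = "50/200 MA cross + favourable regime"
    exit_rule: str = "50/200 MA cross down OR regime shift"
    markets: tuple = ("S&P 500", "EUR/USD", "Gold futures")
    sizing: dict = field(default_factory=lambda: {"A": 0.20, "B": 0.12, "C": 0.06})
    start_year: int = 2004
    end_year: int = 2026
    commission_per_trade: float = 3.0   # dollars per trade
    slippage_pct: float = 0.0005        # 0.05% per fill

    def validate(self) -> list[str]:
        """Flag the common mistakes listed in the table above."""
        warnings = []
        if self.end_year - self.start_year < 10:
            warnings.append("Date range under 10 years: may miss a full cycle")
        if len(self.markets) < 2:
            warnings.append("Single market: cannot cross-validate the edge")
        if self.commission_per_trade == 0 and self.slippage_pct == 0:
            warnings.append("Zero costs: results will overstate profitability")
        return warnings

# A config that commits all three common mistakes at once:
cfg = BacktestConfig(start_year=2020, end_year=2024, markets=("S&P 500",),
                     commission_per_trade=0.0, slippage_pct=0.0)
print(cfg.validate())  # all three warnings fire
```

The point of the `validate` step is that a backtest with a short date range, one market, and zero costs will almost always look better than it should, so those mistakes are worth catching mechanically before trusting any result.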
Interpreting Backtest Results: The Five Key Metrics
The simulator produces multiple performance metrics. Five are essential for evaluating whether a strategy has a genuine edge.
1. Sharpe Ratio. The single most important metric. It measures return per unit of risk (volatility). Above 1.0 is good, above 1.5 is excellent, above 2.0 is elite. If a strategy has a Sharpe below 0.5, it has no meaningful edge regardless of the total return. The Sharpe ratio guide covers benchmarks and calculation methodology in detail.
2. Maximum Drawdown. The largest peak-to-trough decline during the backtest period. This tells you the worst-case scenario you should expect — and you should expect it because if it happened once in 20 years, it will happen again. A strategy with 100% total return but 50% max drawdown has a 2:1 return-to-drawdown ratio — marginal. A strategy with 80% return and 15% max drawdown has a 5.3:1 ratio — excellent.
3. Win Rate. The percentage of trades that were profitable. For trend following strategies, 35-45% is normal and expected. For mean-reversion strategies, 55-65% is typical. The win rate alone tells you nothing — a 90% win rate with a 10:1 loss-to-win ratio is a losing strategy. Always evaluate win rate alongside average win size and average loss size.
4. Profit Factor. Gross profits divided by gross losses. A profit factor above 1.5 indicates a robust edge. Above 2.0 is excellent. Below 1.2 suggests the edge is fragile and may disappear with slightly different market conditions.
5. Equity Curve Shape. Visual inspection of the equity curve matters. A smooth, steadily rising curve indicates consistent performance across market conditions. A jagged curve with one massive spike (e.g., one trade accounting for 40% of total profits) indicates fragile performance dependent on a single outlier. The ideal curve rises in all macro regimes, accelerating during Goldilocks and decelerating (but not declining) during Stagflation.
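Four of these five metrics are mechanical calculations (the fifth, equity curve shape, is a visual judgement). As a rough sketch, assuming a daily equity curve and a list of per-trade P&L values, they can be computed like this; the annualisation convention (252 trading days) and the zero risk-free rate are assumptions, and the simulator may use slightly different formulas.

```python
import math

def backtest_metrics(equity, trade_pnls, periods_per_year=252):
    """Compute four of the five key metrics from a daily equity curve
    and a list of per-trade P&L values. Illustrative formulas only;
    assumes a zero risk-free rate."""
    # Daily returns derived from the equity curve
    rets = [equity[i] / equity[i - 1] - 1 for i in range(1, len(equity))]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    vol = math.sqrt(var)
    # 1. Annualised Sharpe ratio: mean return per unit of volatility
    sharpe = mean / vol * math.sqrt(periods_per_year)
    # 2. Maximum drawdown: worst peak-to-trough decline
    peak, max_dd = equity[0], 0.0
    for x in equity:
        peak = max(peak, x)
        max_dd = max(max_dd, 1 - x / peak)
    # 3. Win rate: fraction of profitable trades
    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p < 0]
    win_rate = len(wins) / len(trade_pnls)
    # 4. Profit factor: gross profits divided by gross losses
    profit_factor = sum(wins) / abs(sum(losses)) if losses else float("inf")
    return {"sharpe": sharpe, "max_drawdown": max_dd,
            "win_rate": win_rate, "profit_factor": profit_factor}

metrics = backtest_metrics([100, 110, 105, 120], [10, -5, 15])
```

In this toy example the worst drawdown is the dip from 110 to 105 (about 4.5%), two of three trades win, and the profit factor is 25/5 = 5.0.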
Use the Drawdown Recovery Calculator alongside your backtest results to understand how long the maximum drawdown would take to recover — a critical input for deciding whether you can psychologically and financially tolerate the strategy's risk profile.
Avoiding Overfitting: The Biggest Backtesting Trap
Overfitting means tuning a strategy so tightly to historical data that it captures noise rather than signal, producing beautiful backtest results that fail completely in live trading. It is the single biggest reason profitable backtests do not translate into profitable live trading.
Overfitting occurs when a strategy has too many parameters relative to the number of trades in the backtest. A strategy with 8 adjustable parameters (specific MA periods, RSI thresholds, volatility filters, time-of-day filters, etc.) tested over 100 trades can be 'optimised' to fit almost any historical data — but the optimised parameters are fitting randomness, not genuine market patterns.
Five rules prevent overfitting.
Rule 1: Fewer parameters is better. A strategy with 2-3 parameters (e.g., MA period, position size, regime filter) is far more likely to generalise to live trading than one with 8-10 parameters. Every additional parameter increases the risk of fitting noise.
Rule 2: Out-of-sample testing. Split your data into two periods: in-sample (used for development) and out-of-sample (held back for validation). Develop the strategy on the in-sample data. Then test it on the out-of-sample data WITHOUT making any further adjustments. If it performs similarly in both periods, the strategy likely has a genuine edge. If it performs well in-sample but poorly out-of-sample, it is overfit.
Rule 3: Cross-market validation. A strategy that works on S&P 500 data should produce similar (not identical) results on European or Asian equity indices. If it only works on one specific market, it is likely fitting market-specific noise rather than a universal pattern.
Rule 4: Economic rationale. Every strategy must have a plausible reason for why it works — not just 'the numbers look good.' The Grade A-E system works because macro regimes drive asset class returns (documented in academic research across 100+ years). A strategy that buys on the third Tuesday of months with a full moon has no economic rationale and is almost certainly overfit.
Rule 5: Minimum trade count. A backtest with fewer than 100 trades is statistically unreliable. Ideally, you want 200+ trades to establish statistical significance. If your strategy only produces 30 trades over 20 years, the results are not distinguishable from luck.
Overfitting test: after completing your backtest, deliberately change one parameter by 10-20% (e.g., use a 45-day MA instead of 50-day). If the results collapse, the strategy is overfit to the specific parameter value. A robust strategy produces similar results across a range of reasonable parameter values.
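The parameter-perturbation test above can be automated. This sketch assumes you already have a `run_backtest(ma_period)` function that returns a Sharpe ratio; that function, the 0.5-Sharpe tolerance, and the specific bump sizes are all illustrative choices, not part of the simulator.

```python
# Sketch of the overfitting test: re-run the backtest with the key
# parameter shifted by +/-10% and +/-20% and check whether the Sharpe
# ratio stays close to the base value.
def sensitivity_check(run_backtest, base_period=50, bump=0.2, tolerance=0.5):
    base = run_backtest(base_period)
    results = {}
    for factor in (1 - bump, 1 - bump / 2, 1 + bump / 2, 1 + bump):
        period = round(base_period * factor)       # e.g. 40, 45, 55, 60
        results[period] = run_backtest(period)
    # Robust if no perturbed Sharpe falls outside the tolerance band
    robust = all(abs(s - base) <= tolerance for s in results.values())
    return base, results, robust

# Toy stand-ins for a real backtest function:
flat_edge = lambda period: 1.2                       # stable across periods
knife_edge = lambda period: 2.0 if period == 50 else 0.1  # only works at 50

print(sensitivity_check(flat_edge)[2])   # True  (robust)
print(sensitivity_check(knife_edge)[2])  # False (overfit to period=50)
```

A strategy behaving like `knife_edge`, with all its performance concentrated at one exact parameter value, is exactly the failure mode the 10-20% perturbation test is designed to expose.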
Backtesting the Grade A-E System
The Backtesting Simulator is pre-configured to test the Vector Ridge Grade A-E conviction system — the most common use case. Here is how to structure the test.
Test 1: Grade A-only vs all-grades. Configure two backtests. The first takes only Grade A setups (both macro regime and technical signal aligned) with 15-20% position sizing and no stops. The second takes all Grades A-E with standard sizing (A=20%, B=12%, C=6%, D=2%, E=0%). Compare Sharpe ratios, maximum drawdowns, and equity curves. In virtually every asset class tested, Grade A-only produces a higher Sharpe ratio despite lower total trade count — confirming that selectivity improves risk-adjusted returns.
Test 2: With vs without macro regime filter. Test a pure technical strategy (50/200 MA crossover, breakout entries) with and without the macro regime filter. The version with the regime filter eliminates trades during hostile regimes (e.g., long equity signals during Stagflation), which significantly reduces losing trades even though it also misses some winning trades. The Sharpe improvement from adding the regime filter is typically 0.3-0.5 — a meaningful edge.
Test 3: Position sizing comparison. Test fixed sizing (15% per trade regardless of Grade) vs Grade-based sizing (A=20%, B=12%, C=6%) vs volatility-normalised sizing (1 ATR = 1%). Grade-based sizing typically produces the best risk-adjusted returns because it concentrates capital in the highest-conviction trades.
Test 4: Multi-asset diversification. Test the strategy on individual markets (equities only, forex only) and then on a combined multi-asset portfolio. The multi-asset version consistently shows lower maximum drawdown (40-60% reduction) with comparable or better total returns — validating the multi-asset portfolio approach.
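The three sizing methods compared in Test 3 can be sketched as small functions. The Grade weights follow the A=20%, B=12%, C=6% scheme from the text; the account size, the 15% fixed fraction, and the ATR figure in the example are made-up numbers for illustration.

```python
# Illustrative position-sizing functions for Test 3.
GRADE_WEIGHTS = {"A": 0.20, "B": 0.12, "C": 0.06, "D": 0.02, "E": 0.0}

def fixed_size(account):
    return account * 0.15                 # 15% per trade regardless of Grade

def grade_size(account, grade):
    return account * GRADE_WEIGHTS[grade] # concentrate in high conviction

def vol_normalised_size(account, atr_pct):
    # Size the position so a 1-ATR adverse move costs ~1% of the account
    return account * 0.01 / atr_pct

account = 10_000
print(fixed_size(account))                      # 1500.0
print(grade_size(account, "A"))                 # 2000.0
print(vol_normalised_size(account, 0.02))       # 5000.0 (ATR = 2% of price)
```

Note how volatility-normalised sizing scales inversely with the instrument's volatility: a calmer instrument (smaller ATR) gets a larger position for the same 1% account risk, which is the mechanism behind "1 ATR = 1% of account".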
These four tests together take approximately 30 minutes in the simulator and provide comprehensive validation of the Grade A-E methodology. Chapter 5 of the free trading book discusses the position sizing principles that underpin these tests.
From Backtest to Live Trading: The Transition Protocol
A successful backtest is necessary but not sufficient for live trading. The transition from simulated to real capital follows a specific protocol that manages the psychological and practical differences between backtesting and live execution.
Step 1: Paper trade for 30-60 days. Execute the strategy in real time with simulated capital. This tests two things backtesting cannot: your ability to follow the rules in real time (when emotions are active) and whether the strategy's signals can be executed at the prices assumed in the backtest (slippage, fills, timing).
Step 2: Small-capital live trading (50% of normal sizing). Trade the strategy with real money but at half the intended position size. This introduces real emotional stakes — the fear and greed that do not exist in paper trading — while limiting financial impact. Run this for 20-30 trades.
Step 3: Full-size live trading. After 20-30 small-size trades confirm that: (a) the strategy's real-time performance is within 20% of the backtest metrics, (b) your execution is consistent with the rules, and (c) you can manage the psychological pressure — scale to full position sizing.
Step 4: Continuous monitoring. Compare live performance to backtest expectations monthly. If the Sharpe ratio, win rate, or drawdown deviate by more than 30% from the backtest for three consecutive months, pause and investigate. Possible causes: market regime has changed in ways not captured by the backtest, execution is deviating from rules, or the strategy's edge has degraded.
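The Step 4 monitoring rule (pause if a metric deviates by more than 30% from the backtest for three consecutive months) is simple enough to mechanise. This is a minimal sketch; the function name and interface are hypothetical, and in practice you would run it once per metric (Sharpe, win rate, drawdown).

```python
# Flag a review when a live metric deviates from its backtest benchmark
# by more than `max_deviation` for `consecutive_months` months in a row.
def needs_review(backtest_value, monthly_live_values,
                 max_deviation=0.30, consecutive_months=3):
    streak = 0
    for live in monthly_live_values:
        deviation = abs(live - backtest_value) / abs(backtest_value)
        streak = streak + 1 if deviation > max_deviation else 0
        if streak >= consecutive_months:
            return True
    return False

# Backtest Sharpe of 1.5; live Sharpe sits far below it for 3 straight months
print(needs_review(1.5, [1.4, 0.9, 0.8, 0.7]))  # True  -> pause and investigate
print(needs_review(1.5, [1.4, 1.2, 1.6, 1.3]))  # False -> normal variation
```

Requiring three consecutive breaches, rather than reacting to a single bad month, keeps the rule from triggering on ordinary month-to-month noise.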
The Trade Journal tracks your live performance metrics alongside the backtest benchmarks — making it immediately visible when live results are deviating from expectations.
1. Backtesting is a falsification tool — it cannot prove a strategy will work, but it can definitively prove one does not work. A strategy that fails to produce a Sharpe ratio above 1.0 across 20+ years of data including multiple market cycles has no edge. Test before risking real capital.
2. The five essential backtest metrics are: Sharpe ratio (above 1.0 = good, 1.5+ = excellent), maximum drawdown (check your return-to-drawdown ratio), win rate (context-dependent), profit factor (above 1.5 = robust), and equity curve shape (smooth and consistent across regimes, not dependent on outlier trades).
3. Overfitting is the biggest backtesting trap — strategies with too many parameters produce beautiful historical results that fail live. Prevent it with: fewer parameters (2-3), out-of-sample testing, cross-market validation, economic rationale for every rule, and a minimum of 100+ trades.
What is backtesting?
Backtesting is the process of testing a trading strategy against historical market data to evaluate whether it would have been profitable. You define the strategy rules (when to enter, exit, and how to size positions), feed in historical prices, and the simulator calculates performance metrics including total return, Sharpe ratio, maximum drawdown, win rate, and profit factor. It is the primary method for validating whether a strategy has a genuine statistical edge before risking real capital.
How much historical data does a reliable backtest need?
A minimum of 10 years is required, and 20+ years is ideal. The data must include at least one full market cycle — a bull market, bear market or correction, and recovery. Testing only over 2020-2024 captures a pandemic crash and recovery but misses prolonged bear markets. The strategy should also produce at least 100 trades (ideally 200+) for statistical significance. Fewer than 100 trades means the results are not reliably distinguishable from random chance.
What counts as a good Sharpe ratio in a backtest?
A Sharpe ratio above 1.0 indicates a strategy with a meaningful risk-adjusted edge. Above 1.5 is excellent — placing you in the top 10% of strategies. Above 2.0 is elite — the top 1%. Below 0.5 means the strategy has no meaningful edge regardless of total return. When comparing strategies, always use Sharpe ratio rather than total return — a 30% return with Sharpe 1.5 is more robust than a 50% return with Sharpe 0.8.
How do I prevent overfitting?
Five rules prevent overfitting: (1) Keep parameters minimal — 2-3 inputs, not 8-10. (2) Split data into in-sample (development) and out-of-sample (validation) periods — the strategy must perform well on data it was NOT optimised against. (3) Validate across multiple markets. (4) Ensure every rule has economic rationale (not just 'the numbers look good'). (5) Require 100+ trades minimum. Additionally, test parameter sensitivity: if changing a parameter by 10-20% collapses results, the strategy is overfit.
Should I paper trade before going live with a backtested strategy?
Yes. Paper trade for 30-60 days to test execution quality and real-time decision-making. Then trade at 50% size for 20-30 trades to introduce real emotional stakes while limiting financial risk. Only scale to full size after confirming that live metrics are within 20% of backtest expectations. This three-step transition (paper → small size → full size) is how professional traders move from backtest validation to live deployment.
