Prediction Models

Walk-forward backtested on historical pro matches · honest numbers, not marketing

Evaluated 2026-06-17 on 1117 counted matches (warmup ≥ 5 games per team).
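
As a rough illustration of what a walk-forward evaluation with a warmup filter can look like, here is a minimal sketch. It assumes "warmup ≥ 5 games per team" means both teams must have at least 5 earlier matches before a match is counted; the names `walk_forward`, `predict`, and `update` are hypothetical and not taken from this project's code.

```python
from collections import defaultdict

def walk_forward(matches, predict, update, warmup=5):
    """Score a model match by match, in chronological order.

    matches: iterable of (team_a, team_b, a_won), oldest first.
    predict(team_a, team_b) -> probability that team_a wins.
    update(team_a, team_b, a_won) -> None (lets the model learn the result).
    Returns (probability, outcome) pairs for counted matches only.
    """
    games_played = defaultdict(int)
    scored = []
    for team_a, team_b, a_won in matches:
        # Predict first, so the model never sees the result it is scored on.
        p = predict(team_a, team_b)
        # Assumption: a match counts only if both teams cleared the warmup.
        if games_played[team_a] >= warmup and games_played[team_b] >= warmup:
            scored.append((p, 1 if a_won else 0))
        # Only now reveal the outcome to the model.
        update(team_a, team_b, a_won)
        games_played[team_a] += 1
        games_played[team_b] += 1
    return scored
```

Matches played during the warmup still update the ratings; they are simply left out of the reported metrics.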

Head-to-Head Elo

elo_h2h

Classic Elo, no side or recency awareness.
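
For reference, a minimal sketch of a classic Elo step. The 400-point scale and `k=32` are the usual textbook defaults and are assumptions here; this page does not state which values the model uses.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, a_won, k=32.0):
    """Return both ratings after one match (k is an assumed K-factor)."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta

print(update_elo(1500.0, 1500.0, True))  # (1516.0, 1484.0)
```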

Accuracy: 0.513 (0.50 = coinflip)
Brier score: 0.2502 (0.25 = coinflip · lower = better)
Log-loss: 0.6936 (0.693 = coinflip · lower = better)
Top-quartile accuracy: 0.602 (highest-confidence 25% of predictions)
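
The four metrics above can be sketched as follows. This is an illustration, not the page's actual scoring code: it assumes each prediction is a `(probability, outcome)` pair and that "highest-confidence" means largest distance from 0.5.

```python
import math

def accuracy(pairs):
    """Share of matches where the favoured side (p >= 0.5) actually won."""
    return sum((p >= 0.5) == (y == 1) for p, y in pairs) / len(pairs)

def brier(pairs):
    """Mean squared error between probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)

def log_loss(pairs, eps=1e-12):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in pairs:
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(pairs)

def top_quartile_accuracy(pairs):
    """Accuracy on the 25% of predictions furthest from 0.5 (assumed definition)."""
    ranked = sorted(pairs, key=lambda py: abs(py[0] - 0.5), reverse=True)
    return accuracy(ranked[: max(1, len(ranked) // 4)])

# A model that always says 50% reproduces the coinflip baselines.
coin = [(0.5, 1), (0.5, 0)]
print(brier(coin))               # 0.25
print(round(log_loss(coin), 3))  # 0.693
```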
Calibration · predicted vs actual win rate
Predicted   Actual   Matches
50–60%      48.3%    839
60–70%      56.5%    216
70–80%      73.6%    53
80–90%      77.8%    9
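
One way calibration bins like these could be produced is sketched below. Since the bins start at 50% and their counts sum to the 1117 counted matches, the sketch assumes each prediction is folded onto the favoured side before binning; `calibration_table` is a hypothetical helper, not the project's code.

```python
def calibration_table(pairs, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Bin (probability, outcome) pairs by the favourite's predicted win chance.

    Returns one (low, high, actual_win_rate, count) tuple per bin;
    actual_win_rate is None for an empty bin.
    """
    bins = [[0, 0] for _ in range(len(edges) - 1)]  # [favourite wins, count]
    for p, y in pairs:
        # Fold onto the favourite: a 40% prediction for A is 60% for B.
        if p < 0.5:
            p, y = 1.0 - p, 1 - y
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            # The last bin is closed on the right so p == 1.0 is not dropped.
            if edges[i] <= p < edges[i + 1] or (is_last and p == edges[-1]):
                bins[i][0] += y
                bins[i][1] += 1
                break
    return [
        (edges[i], edges[i + 1], (wins / n if n else None), n)
        for i, (wins, n) in enumerate(bins)
    ]
```

A well-calibrated model has an actual win rate that falls inside each bin's predicted range.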

H2H Elo + Radiant bias

elo_h2h_side

Classic Elo with a +25 Elo nudge for the Radiant side (observed historical edge).
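
To give a sense of scale, the sketch below adds the +25 bonus to the Radiant rating before computing the expected score. The 400-point Elo scale is an assumed default, and `expected_score_with_side` is a hypothetical name.

```python
def expected_score_with_side(radiant_rating, dire_rating, radiant_bonus=25.0):
    """Radiant win probability with a flat rating bonus for the Radiant side."""
    diff = dire_rating - (radiant_rating + radiant_bonus)
    return 1.0 / (1.0 + 10 ** (diff / 400.0))

# Two equally rated teams: the nudge alone moves Radiant to about 53.6%.
print(round(expected_score_with_side(1500.0, 1500.0), 3))  # 0.536
```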

Accuracy: 0.528 (0.50 = coinflip)
Brier score: 0.2489 (0.25 = coinflip · lower = better)
Log-loss: 0.6909 (0.693 = coinflip · lower = better)
Top-quartile accuracy: 0.620 (highest-confidence 25% of predictions)
Calibration · predicted vs actual win rate
Predicted   Actual   Matches
50–60%      49.8%    797
60–70%      58.1%    253
70–80%      67.9%    56
80–90%      72.7%    11

Patch-aware Elo (60d half-life)

elo_recency_60d

While a team is idle, its rating drifts back toward 1500 with a 60-day half-life, so stale ratings carry less weight. This acts as a rough simulator of meta shifts.
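
A minimal sketch of half-life decay toward 1500, assuming the decay is exponential in the number of days since the team's last match. `decay_rating` is a hypothetical helper, not the project's code.

```python
def decay_rating(rating, days_idle, half_life_days=60.0, baseline=1500.0):
    """Shrink the gap between a rating and the baseline by half per half-life."""
    factor = 0.5 ** (days_idle / half_life_days)
    return baseline + (rating - baseline) * factor

print(decay_rating(1700.0, 60.0))   # 1600.0 (one half-life: gap 200 -> 100)
print(decay_rating(1700.0, 120.0))  # 1550.0 (two half-lives: gap 200 -> 50)
```

Passing `half_life_days=120.0` gives the slower variant: the same 60 idle days would leave a 1700 rating at roughly 1641.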

Accuracy: 0.527 (0.50 = coinflip)
Brier score: 0.2484 (0.25 = coinflip · lower = better)
Log-loss: 0.6897 (0.693 = coinflip · lower = better)
Top-quartile accuracy: 0.631 (highest-confidence 25% of predictions)
Calibration · predicted vs actual win rate
Predicted   Actual   Matches
50–60%      49.4%    815
60–70%      60.2%    246
70–80%      68.0%    50
80–90%      66.7%    6

Patch-aware Elo (120d half-life)

elo_recency_120d

Slower decay: ratings stay stickier than with the 60-day half-life. Useful when patch cadence is slower.

Accuracy: 0.526 (0.50 = coinflip)
Brier score: 0.2486 (0.25 = coinflip · lower = better)
Log-loss: 0.6902 (0.693 = coinflip · lower = better)
Top-quartile accuracy: 0.624 (highest-confidence 25% of predictions)
Calibration · predicted vs actual win rate
Predicted   Actual   Matches
50–60%      49.4%    811
60–70%      59.3%    246
70–80%      68.0%    50
80–90%      70.0%    10

Roadmap · what's not built yet

Patch-aware Elo · planned
Current Elo treats a 6-month-old match the same as one from last week. Patches shift the meta significantly, so adding patch-decay weighting is expected to lift accuracy by roughly 2–3 percentage points.
Roster-tracked Elo · planned
Team rating carries through roster changes. When a star player transfers, the new team should partially inherit the rating.
Draft XGBoost (live) · research
Pick/ban order carries meaningful signal, but drafts are only known after the game starts, so this would be a live-only model for in-progress matches.