Self-Training Backtest: Per-Sport Model Performance
Loading walk-forward backtest results from the last 14 days of completed games...
What this shows
For every historical game, we recompute what the Elo model would have predicted using only information available before the game. The result is a per-sport calibration audit:
- Hit rate: % of games where P(home win), rounded to 0 or 1, matched the actual outcome
- Brier score: average squared error between the predicted probability and the 0/1 outcome (lower = better)
- Bias: avg(predicted) - avg(actual). Positive = model favors home too much
- Reliability bins: for each 10% probability bucket, average predicted P(home win) vs actual home-win rate
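The four metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual code: the names `audit` and `BacktestMetrics` are hypothetical, outcomes are assumed to be coded 1 for a home win and 0 otherwise, and a prediction of exactly 0.5 is assumed to round toward the home team.

```python
from dataclasses import dataclass


@dataclass
class BacktestMetrics:
    n: int
    hit_rate: float
    brier: float
    bias: float
    # One tuple per non-empty bucket:
    # (bucket lower edge, game count, mean predicted, actual home-win rate)
    bins: list


def audit(preds, outcomes, n_bins=10):
    """preds: pre-game P(home win) in [0, 1]; outcomes: 1 if home won, else 0."""
    if not preds or len(preds) != len(outcomes):
        raise ValueError("preds and outcomes must be non-empty and equal length")
    n = len(preds)
    pairs = list(zip(preds, outcomes))

    # Hit rate: the rounded prediction (assumption: p >= 0.5 picks home) matches.
    hits = sum((p >= 0.5) == bool(y) for p, y in pairs)
    # Brier score: mean squared error between probability and 0/1 outcome.
    brier = sum((p - y) ** 2 for p, y in pairs) / n
    # Bias: mean predicted minus mean actual; positive favors home too much.
    bias = sum(preds) / n - sum(outcomes) / n

    # Reliability bins: group predictions into equal-width probability buckets.
    buckets = [[] for _ in range(n_bins)]
    for p, y in pairs:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bucket
        buckets[idx].append((p, y))
    bins = [
        (i / n_bins, len(b), sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for i, b in enumerate(buckets)
        if b
    ]
    return BacktestMetrics(n, hits / n, brier, bias, bins)
```

For example, `audit([0.8, 0.6, 0.4, 0.7], [1, 0, 0, 1])` gives a hit rate of 0.75, a Brier score of 0.1625, and a bias of +0.125 (12.5pp toward home).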
When sample ≥ 30 and |bias| ≥ 2pp, the system auto-applies a calibration shift to future predictions.
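The gating rule can be sketched as follows. The thresholds (30 games, 2pp) come from the text; how the shift is actually applied is not stated, so this sketch assumes the simplest option: subtract the measured bias from each future probability and clamp to [0, 1]. The function names are hypothetical.

```python
MIN_SAMPLE = 30      # minimum number of backtested games
MIN_ABS_BIAS = 0.02  # 2 percentage points, expressed as a probability


def calibration_shift(n, bias):
    """Return the shift to add to future P(home win), or 0.0 if the gate is not met."""
    if n >= MIN_SAMPLE and abs(bias) >= MIN_ABS_BIAS:
        return -bias  # assumption: an additive correction that cancels the bias
    return 0.0


def apply_shift(p, shift):
    """Apply the shift and keep the result a valid probability."""
    return min(1.0, max(0.0, p + shift))
```

With 40 games and a bias of +3.5pp, `calibration_shift(40, 0.035)` returns -0.035, so a raw 0.60 prediction becomes about 0.565. With only 29 games, or with a bias under 2pp, the shift is 0.0 and predictions pass through unchanged.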