Self-Training Backtest — Per-Sport Model Performance

Loading walk-forward backtest results from last 14 days of completed games…

📋 What this shows

For every historical game, we recompute what the Elo model would have predicted using ONLY information available before the game (walk-forward, no lookahead). The result is a per-sport calibration audit:
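A minimal sketch of the walk-forward idea, assuming a standard Elo formulation: each game is predicted from the ratings as they stood before it, and only then are the ratings updated with the result. The function names, the K-factor of 20, the starting rating of 1500, and the optional home-advantage term are illustrative assumptions, not the system's actual parameters.

```python
from collections import defaultdict


def elo_expected(r_home, r_away, home_adv=0.0):
    """Standard Elo win expectancy for the home side."""
    return 1.0 / (1.0 + 10 ** ((r_away - (r_home + home_adv)) / 400.0))


def walk_forward(games, k=20.0, start=1500.0, home_adv=0.0):
    """Return (p_home, home_won) pairs for games given in chronological order.

    games: iterable of (home_team, away_team, home_won), home_won in {0, 1}.
    k, start, and home_adv are assumed values for illustration only.
    """
    ratings = defaultdict(lambda: start)
    predictions = []
    for home, away, home_won in games:
        # Predict first, using only ratings built from earlier games.
        p_home = elo_expected(ratings[home], ratings[away], home_adv)
        predictions.append((p_home, home_won))
        # Then update ratings with the observed result.
        delta = k * (home_won - p_home)
        ratings[home] += delta
        ratings[away] -= delta
    return predictions
```

Because the prediction is recorded before the update, a game's own result never influences its own probability.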

Hit rate: % of games where P(home win), rounded to 0 or 1, matched the actual outcome
Brier score: mean squared error between predicted probability and outcome (0 or 1); lower = better
Bias: avg(predicted P(home win)) − avg(actual home-win rate). Positive = model favors home too much
Reliability bins: for each 10% probability bucket, average predicted probability vs. actual home-win rate
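The four metrics can be sketched as follows, assuming predictions arrive as (P(home win), outcome) pairs with outcome 1 for a home win and 0 otherwise. The function name `audit` and the returned keys are hypothetical, and treating a probability of exactly 0.5 as a home pick is an assumption the source does not specify.

```python
def audit(preds):
    """Compute hit rate, Brier score, bias, and 10% reliability bins.

    preds: non-empty list of (p_home, home_won) with home_won in {0, 1}.
    """
    n = len(preds)
    # Hit rate: the rounded pick matches the outcome (0.5 counts as a home pick).
    hit_rate = sum((p >= 0.5) == bool(y) for p, y in preds) / n
    # Brier score: mean squared error between probability and outcome.
    brier = sum((p - y) ** 2 for p, y in preds) / n
    # Bias: mean predicted probability minus actual home-win rate.
    bias = sum(p for p, _ in preds) / n - sum(y for _, y in preds) / n

    # Reliability: group by 10% bucket; bucket 9 also holds p == 1.0.
    buckets = {}
    for p, y in preds:
        buckets.setdefault(min(int(p * 10), 9), []).append((p, y))
    reliability = {
        b: (
            sum(p for p, _ in rows) / len(rows),  # mean predicted probability
            sum(y for _, y in rows) / len(rows),  # actual home-win rate
            len(rows),                            # games in the bucket
        )
        for b, rows in sorted(buckets.items())
    }
    return {"hit_rate": hit_rate, "brier": brier, "bias": bias,
            "reliability": reliability}
```

A well-calibrated model shows the first two numbers of each reliability bucket close together once the bucket has enough games.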

When a sport's sample is ≥ 30 games and |bias| ≥ 2pp (percentage points), the system auto-applies a calibration shift to future predictions.
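One way the trigger could work, as a hedged sketch: the thresholds (30 games, 2 percentage points) come from the rule above, but the form of the shift is an assumption. Here the measured bias is simply subtracted from P(home win) and the result is clamped to [0, 1]; the real system may use a different correction. All names are hypothetical.

```python
MIN_SAMPLE = 30       # minimum games before a shift is considered
MIN_ABS_BIAS = 0.02   # 2 percentage points, expressed as a probability


def calibration_shift(n_games, bias):
    """Return the additive shift for future predictions, or 0.0 if not triggered."""
    if n_games >= MIN_SAMPLE and abs(bias) >= MIN_ABS_BIAS:
        # Assumed correction: cancel the measured bias.
        return -bias
    return 0.0


def apply_shift(p_home, shift):
    """Apply the shift and keep the result a valid probability."""
    return min(1.0, max(0.0, p_home + shift))
```

For example, a sport with 40 games and a bias of +0.05 would have 0.05 subtracted from each future home-win probability, while a sport with only 20 games would be left unchanged.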