by bpleone

Self-Training Backtest: Per-Sport Model Performance

Loading walk-forward backtest results from the last 14 days of completed games…

📋 What this shows

For every historical game, we back-compute what the Elo model would have predicted using ONLY information available pre-game. The result is a per-sport calibration audit:

  • Hit rate: % of games where P(home win), rounded to 0 or 1, matched the actual outcome
  • Brier score: mean squared error between predicted probability and outcome (0 or 1); lower = better
  • Bias: avg(predicted) − avg(actual). Positive = model favors home too much
  • Reliability bins: for each 10% probability bucket, mean predicted P(home win) vs actual home-win rate
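
The four metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual code: the names Game and audit are hypothetical, and it assumes that a probability of exactly 0.5 counts as a home pick and that a probability of 1.0 falls into the top bucket.

```python
from dataclasses import dataclass


@dataclass
class Game:
    p_home: float   # pre-game P(home win) from the model
    home_won: bool  # actual result


def audit(games: list[Game], n_bins: int = 10) -> dict:
    """Compute hit rate, Brier score, bias and reliability bins for one sport."""
    n = len(games)
    if n == 0:
        raise ValueError("no games to audit")

    probs = [g.p_home for g in games]
    outcomes = [1.0 if g.home_won else 0.0 for g in games]

    # Hit rate: the pick is "home" when p >= 0.5 (tie-break is an assumption).
    hits = sum((p >= 0.5) == (y == 1.0) for p, y in zip(probs, outcomes))

    # Brier score: mean squared error between probability and 0/1 outcome.
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n

    # Bias: avg(predicted) - avg(actual); positive means home is over-favored.
    bias = sum(probs) / n - sum(outcomes) / n

    # Reliability bins: group games by predicted probability bucket.
    buckets: list[list[tuple[float, float]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bucket
        buckets[idx].append((p, y))

    reliability = []
    for i, bucket in enumerate(buckets):
        if not bucket:
            continue  # skip empty buckets rather than report 0/0
        reliability.append({
            "bucket": (i / n_bins, (i + 1) / n_bins),
            "n": len(bucket),
            "predicted": sum(p for p, _ in bucket) / len(bucket),
            "actual": sum(y for _, y in bucket) / len(bucket),
        })

    return {
        "n": n,
        "hit_rate": hits / n,
        "brier": brier,
        "bias": bias,
        "reliability": reliability,
    }
```

Calling audit once per sport on that sport's completed games would give one row of the audit. A well-calibrated model shows "predicted" close to "actual" in every bucket.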

When sample ≥ 30 and |bias| ≥ 2pp, the system auto-applies a calibration shift to future predictions.
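
The trigger rule can be sketched as follows. The thresholds (30 games, 2 percentage points) come from the text. The form of the shift does not: this sketch assumes the simplest option, subtracting the measured bias from each future probability and clamping to [0, 1]. The function names are hypothetical, and the real system may shift differently (for example in rating or log-odds space).

```python
MIN_SAMPLE = 30       # from the text: sample >= 30
MIN_ABS_BIAS = 0.02   # from the text: |bias| >= 2pp, expressed as a fraction


def calibration_shift(n_games: int, bias: float) -> float:
    """Return the amount to add to future P(home win); 0.0 if the rule does not fire."""
    if n_games >= MIN_SAMPLE and abs(bias) >= MIN_ABS_BIAS:
        # Assumption: an additive correction equal to the negated bias.
        return -bias
    return 0.0


def apply_shift(p_home: float, shift: float) -> float:
    """Apply the shift and clamp so the result stays a valid probability."""
    return min(1.0, max(0.0, p_home + shift))
```

Under these assumptions, a sport with 40 games and a bias of +3pp would have a raw 60% home prediction adjusted to about 57%. A sport with only 29 games would be left unchanged whatever its bias.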