Warehouse fresh · 2026-05-04 08:09 UTC
Model R̂ max 1.010
ESS min 644
Divergences 0
Trained 2026-05-04 08:48 UTC

Methodology

A short, honest description of the model behind these forecasts — the data it sees, the statistical assumptions it makes, and the things it deliberately ignores. Everything you see on the site is generated by the pipeline described here.

1 · Overview

The forecast is produced by a hierarchical Bayesian generalised linear model fitted with Hamiltonian Monte Carlo (PyMC's NUTS sampler). It estimates each team's latent offensive and defensive strength using every regular-season and postseason game from 2022 to date, then projects the remainder of the 2026 regular season via Monte Carlo simulation of the posterior.

We chose a Bayesian framework for three reasons: posterior credible intervals quantify what we don't know; partial pooling keeps early-season ratings honest when a team has only played a dozen games; and posterior draws plug straight into a forward simulation, so the league standings we show are calibrated, uncertainty-aware projections rather than point estimates.

2 · Data pipeline

All training data flows from a separate warehouse repository. The ingestion layer pulls every MLB game from the MLB Stats API into a bronze table, which dbt transforms into a gold feature table (mlb_gold.feat_matchup) and a game-level fact table (mlb_gold.fct_games). This model reads both.

Critically, the rolling features in the warehouse are computed using only games that occurred before the current row — no target leakage. When we use, for example, a team's last-30-game win percentage on a given date, it genuinely means "as of the morning of that day".
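
To make the no-leakage rule concrete, here is a minimal pandas sketch of a backward-looking rolling win percentage. The column names are illustrative, not the warehouse schema:

```python
import pandas as pd

# Hypothetical team game log (not the warehouse schema), sorted by date.
games = pd.DataFrame({
    "team": ["NYY"] * 5,
    "game_date": pd.to_datetime(
        ["2026-04-01", "2026-04-02", "2026-04-04", "2026-04-05", "2026-04-06"]
    ),
    "won": [1, 0, 1, 1, 0],
})

# shift(1) excludes the current game, so each row's feature uses only
# games that finished before that row's date -- no target leakage.
games["win_pct_last_30"] = (
    games.groupby("team")["won"]
    .transform(lambda s: s.shift(1).rolling(30, min_periods=1).mean())
)
print(games)
```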

3 · The likelihood

Each game contributes two observations to the model: the runs the home team scored, and the runs the away team scored. Runs are modelled as Poisson-distributed with a rate that depends on the two teams involved and whether the scoring team was playing at home:

runs[team, game] ~ Poisson(λ)

log(λ) = intercept + offence[team] − defence[opponent] + home_adv · is_home

The Poisson is the natural distribution for count outcomes; in practice it also fits MLB run totals well because runs are rare events across a fixed inning structure. The log link means the three ratings combine multiplicatively on the rate scale, which matches how scoring actually works: a strong offence raises expected runs by a proportion, not a constant.
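
As a sketch of what that likelihood looks like in PyMC, with toy data and placeholder priors (the real priors are the hierarchical ones described in the next section):

```python
import numpy as np
import pymc as pm

# Toy data: one row per (scoring team, game) observation.
team_idx = np.array([0, 1, 0, 2])   # scoring team
opp_idx  = np.array([1, 0, 2, 0])   # defending opponent
is_home  = np.array([1, 0, 1, 0])   # was the scoring team at home?
runs     = np.array([5, 3, 2, 4])
n_teams  = 3

with pm.Model() as game_model:
    # Placeholder priors; section 4 replaces the flat team priors
    # with the hierarchical, league-centred versions.
    intercept = pm.Normal("intercept", 1.5, 0.5)
    home_adv  = pm.Normal("home_adv", 0.0, 0.1)
    offence   = pm.Normal("offence", 0.0, 0.2, shape=n_teams)
    defence   = pm.Normal("defence", 0.0, 0.2, shape=n_teams)

    # The model equation above, on the log scale.
    log_lam = intercept + offence[team_idx] - defence[opp_idx] + home_adv * is_home
    pm.Poisson("observed_runs", mu=pm.math.exp(log_lam), observed=runs)
```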

4 · Partial pooling within league

Team-level ratings are drawn from league-level priors, so early in a season every team is pulled towards the American or National League average until its own results earn it some distance. Without this "shrinkage" a team that started 3–0 would briefly look world-class; with it, the model concedes uncertainty and waits for more data.

offence[team] ~ Normal(league_offence[league(team)], σ_off)
defence[team] ~ Normal(league_defence[league(team)], σ_def)

league_offence, league_defence ~ Normal(0, 0.2)
σ_off, σ_def ~ HalfNormal(0.3)

The hyper-priors (σ_off, σ_def) are themselves learnt from the data. If the two leagues diverge sharply from the global mean, or if one league is markedly more spread out than the other, the posterior will reflect that.
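
A minimal PyMC sketch of this hierarchy, with an illustrative league mapping rather than the real one:

```python
import numpy as np
import pymc as pm

n_teams, n_leagues = 30, 2
league_of_team = np.repeat([0, 1], 15)   # illustrative AL/NL assignment, not the real mapping

with pm.Model() as hierarchy:
    league_offence = pm.Normal("league_offence", 0.0, 0.2, shape=n_leagues)
    league_defence = pm.Normal("league_defence", 0.0, 0.2, shape=n_leagues)
    sigma_off = pm.HalfNormal("sigma_off", 0.3)
    sigma_def = pm.HalfNormal("sigma_def", 0.3)

    # Each team's rating is drawn around its league mean, so sparse
    # early-season records are shrunk towards the league average.
    offence = pm.Normal("offence", league_offence[league_of_team], sigma_off, shape=n_teams)
    defence = pm.Normal("defence", league_defence[league_of_team], sigma_def, shape=n_teams)
```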

5 · Recency weighting

Baseball teams change — rosters turn over, managers get fired, front offices tear down and rebuild. A team's 2022 performance is not particularly informative about its 2026 performance, yet an unweighted likelihood would let four years of ancient results drown out a few dozen current ones. That's how a team in the middle of a genuine rebound (hello, Athletics) ends up stuck near the bottom of the posterior long after it should have moved.

Rather than model each season as its own parameter (tried; the random walk posterior geometry was unkind to NUTS and the fit took too long on the CI runner), each game's log-likelihood contribution is scaled by its age:

w_i = exp(−age_i / τ), with τ = 1.5 years

weighted log-likelihood = Σ_i w_i · log p(runs_i | λ_i)

Concrete weights for a refresh run today, straight from the formula above:

game played today: 1.00
1 year old: 0.51
2 years old: 0.26
3 years old: 0.14
4 years old: 0.07

So four full rebuild seasons exert roughly a tenth of the influence that a full current season would, without the model having to discretise time into seasons explicitly. In PyMC this is applied via pm.Potential on a vectorised log-likelihood; the posterior sees each observation as a fractional contribution proportional to its recency.
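
A minimal sketch of that pm.Potential construction, with a stand-in for the full linear predictor from sections 3 and 4:

```python
import numpy as np
import pymc as pm

runs      = np.array([5, 3, 2, 4])
age_years = np.array([0.1, 0.8, 2.3, 3.9])   # game age at the refresh date
tau       = 1.5
weights   = np.exp(-age_years / tau)

with pm.Model() as weighted_model:
    # Stand-in for the full linear predictor (intercept + ratings + home advantage).
    log_lam = pm.Normal("log_lam", 1.5, 0.5, shape=len(runs))

    # Each game's Poisson log-likelihood is scaled by its recency weight,
    # then added to the joint log-probability via a Potential term.
    loglike = pm.logp(pm.Poisson.dist(mu=pm.math.exp(log_lam)), runs)
    pm.Potential("weighted_loglike", (weights * loglike).sum())
```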

6 · Global parameters

Two parameters are shared across the whole league: the intercept, which sets the baseline scoring rate on the log scale, and home_adv, the home-field advantage added to the log rate whenever the scoring team is playing at home.

7 · Inference

Posteriors are sampled with PyMC's No-U-Turn Sampler (NUTS), the current default for Hamiltonian Monte Carlo on continuous parameter spaces. Settings for each nightly run:

chains: 4 (sampled in parallel on 4 vCPUs)
warm-up: 1,000 iterations per chain
retained draws: 500 iterations per chain
target_accept: 0.95
total samples: 2,000 posterior draws

A single fit takes roughly seven minutes on a GitHub Actions runner. The warm-up phase tunes the mass matrix and step size; only post-warm-up draws are used for inference and simulation. The slightly tighter target_accept (0.95 rather than PyMC's 0.8 default) keeps divergences at zero on this model.
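
Those settings correspond roughly to a pm.sample call like this (a sketch, assuming the game_model from the section 3 example):

```python
import pymc as pm

with game_model:  # the model sketched in section 3
    idata = pm.sample(
        draws=500,           # retained draws per chain
        tune=1000,           # warm-up iterations (tunes step size and mass matrix)
        chains=4,
        cores=4,
        target_accept=0.95,  # tighter than the 0.8 default; keeps divergences at zero here
        random_seed=20260504,
    )
```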

8 · Convergence diagnostics

Every run writes three headline diagnostics into predictions.model_diagnostics: the maximum R̂ across parameters, the minimum effective sample size (ESS), and the number of divergent transitions. The current values appear in the status strip at the top of every page.
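
With ArviZ, the three values can be read off the fitted InferenceData along these lines (assuming the idata object from the previous section):

```python
import arviz as az

summary = az.summary(idata)                        # per-parameter r_hat and ESS
rhat_max    = float(summary["r_hat"].max())
ess_min     = float(summary["ess_bulk"].min())
divergences = int(idata.sample_stats["diverging"].sum())

print(f"R-hat max {rhat_max:.3f} · ESS min {ess_min:.0f} · divergences {divergences}")
```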

9 · Season simulation

The posterior alone tells you how strong each team is; to translate that into standings and playoff odds we simulate the remaining regular season. For each of the 10,000 simulations, and each remaining scheduled game:

  1. Draw a single posterior sample of all parameters.
  2. Compute home-team and away-team expected run rates from the model equation.
  3. Draw a Poisson realisation for each side to get simulated run totals.
  4. Assign the win to whichever team scored more; aggregate wins onto each team's running season total.

Across 10,000 simulations this produces a full distribution of final-season win counts for every team, which is where the "80% CI" projection ranges and the division-winner probabilities come from.
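
A condensed sketch of that loop, reusing the parameter names from the earlier sketches and a toy schedule; the real schedule lookup and tie handling are omitted:

```python
import numpy as np

rng = np.random.default_rng(2026)
n_sims = 10_000

# Posterior draws flattened across chains; names follow the earlier sketches.
post      = idata.posterior.stack(sample=("chain", "draw"))
intercept = post["intercept"].values        # shape (n_draws,)
home_adv  = post["home_adv"].values
offence   = post["offence"].values          # shape (n_teams, n_draws)
defence   = post["defence"].values
n_draws, n_teams = intercept.shape[0], offence.shape[0]

remaining_schedule = [(0, 1), (2, 0)]       # toy (home, away) index pairs
wins = np.zeros((n_sims, n_teams))          # running win totals per simulation

for s in range(n_sims):
    for home, away in remaining_schedule:
        d = rng.integers(n_draws)                        # 1. draw a posterior sample
        lam_home = np.exp(intercept[d] + offence[home, d]            # 2. expected run rates
                          - defence[away, d] + home_adv[d])
        lam_away = np.exp(intercept[d] + offence[away, d] - defence[home, d])
        home_runs = rng.poisson(lam_home)                # 3. Poisson realisations
        away_runs = rng.poisson(lam_away)
        if home_runs > away_runs:                        # 4. assign the win
            wins[s, home] += 1
        elif away_runs > home_runs:
            wins[s, away] += 1
        # Tied scores are left unresolved in this sketch.
```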

10 · Playoff qualification

The playoff probability shown on the League Predictions tab is simply the fraction of simulations in which a team finished in the 2026 MLB postseason field. Under current rules, that's six teams per league: three division winners plus three wild cards taken from the remaining teams by regular-season win percentage. Ties for a slot are broken by splitting it fractionally among the tied teams, so the reported odds always sum exactly to three division winners and three wild cards per league across simulations.
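
A simplified sketch of that counting step for one league; for brevity it breaks ties by just picking one of the tied teams rather than splitting the slot fractionally as described above:

```python
import numpy as np

def playoff_probability(wins, divisions, league_teams, n_wild_cards=3):
    """Share of simulations in which each team reaches the postseason.

    wins:         (n_sims, n_teams) simulated final win totals
    divisions:    list of team-index lists, one per division in the league
    league_teams: every team index in that league
    """
    n_sims, n_teams = wins.shape
    made_it = np.zeros(n_teams)
    for s in range(n_sims):
        w = wins[s]
        # Best record in each division wins it; next-best records take the wild cards.
        division_winners = {max(div, key=lambda t: w[t]) for div in divisions}
        rest = [t for t in league_teams if t not in division_winners]
        wild_cards = sorted(rest, key=lambda t: w[t], reverse=True)[:n_wild_cards]
        made_it[list(division_winners) + wild_cards] += 1
    return made_it / n_sims
```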

11 · Data sources

All game data originates from the MLB Stats API, pulled into the warehouse described in section 2; the model reads the dbt-built mlb_gold.feat_matchup and mlb_gold.fct_games tables rather than touching the API directly.

12 · Deliberate omissions

Several signals that might plausibly improve predictive accuracy are not in v1. Each is a conscious choice; each is a candidate for a future release.