Methodology
The math behind the predictions.
The Core: 25-State Markov Chains
Every at-bat in baseball starts from one of 24 possible game states: a combination of 8 base-runner configurations (nobody on, runner on first, runners on first and second, etc.) and 3 out counts (0, 1, or 2 outs). A 25th, absorbing state represents 3 outs, when the half-inning is over.
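As a minimal sketch, the 25 states can be indexed with a small helper. The bit-mask ordering and the names `state_index`, `decode_state`, and `ABSORBING` are illustrative assumptions, not necessarily the layout used in our database.

```python
# Illustrative state indexing (the ordering is an assumption for this sketch).
# Bases are a 3-bit mask: bit 0 = first, bit 1 = second, bit 2 = third.
N_BASE_CONFIGS = 8
N_OUT_COUNTS = 3
ABSORBING = N_BASE_CONFIGS * N_OUT_COUNTS  # index 24: three outs, inning over
N_STATES = ABSORBING + 1                   # 25 states in total


def state_index(bases: int, outs: int) -> int:
    """Map (base mask 0-7, outs 0-3) to a state index 0-24."""
    if not 0 <= bases < N_BASE_CONFIGS:
        raise ValueError("bases must be a 3-bit mask between 0 and 7")
    if outs == 3:
        return ABSORBING
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1, 2, or 3")
    return outs * N_BASE_CONFIGS + bases


def decode_state(index: int) -> tuple[int, int]:
    """Inverse of state_index; the absorbing state decodes to (0, 3)."""
    if index == ABSORBING:
        return 0, 3
    return index % N_BASE_CONFIGS, index // N_BASE_CONFIGS
```

With this layout, runners on first and third with 2 outs is `state_index(0b101, 2)`, and any combination with 3 outs collapses to the single absorbing index.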
For every batter and every pitcher in our database, we have a 25×25 transition probability matrix: the probability of moving from any state to any other state during an at-bat involving that player. These matrices are built from 10 seasons of play-by-play data (2016-2025) covering over 1.7 million at-bats.
We call these matrices Markov Scoring Indices (MSIs).
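A rough sketch of how one such matrix could be estimated from play-by-play data: count observed (from-state, to-state) pairs for a player and normalize each row. The function name `build_msi` and the input format are assumptions for illustration; any smoothing for players with few at-bats is left out.

```python
import numpy as np

N_STATES = 25
ABSORBING = 24  # assumed index of the "3 outs" state


def build_msi(transitions):
    """Estimate a 25x25 transition matrix from (from_state, to_state) pairs."""
    counts = np.zeros((N_STATES, N_STATES))
    for s, t in transitions:
        counts[s, t] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay all-zero; a real pipeline would need a
    # fallback (for example a league-average row) before simulating from them.
    msi = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # The inning-over state is absorbing: it always transitions to itself.
    msi[ABSORBING] = 0.0
    msi[ABSORBING, ABSORBING] = 1.0
    return msi
```

Each observed row sums to 1, so row `s` can be read directly as the distribution over next states for an at-bat that starts in state `s`.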
Combining Batter + Pitcher
For each at-bat, we combine the batter's MSI and the pitcher's MSI into a single transition matrix. The combination weights each player's tendencies against league-average baselines, producing a matchup-specific probability distribution.
This is the heart of the simulation — every at-bat outcome is driven by the specific batter-pitcher matchup, not team-level averages.
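The exact weighting is not spelled out above, so the following sketch assumes one common choice: a multiplicative rule in the spirit of the log5 method, where each transition probability is proportional to batter × pitcher ÷ league and each row is then renormalized. The function name `combine_msis` and the `eps` guard are illustrative.

```python
import numpy as np


def combine_msis(batter, pitcher, league, eps=1e-12):
    """Combine two row-stochastic matrices relative to a league baseline.

    Assumed rule (not necessarily the production formula): combined[i, j] is
    proportional to batter[i, j] * pitcher[i, j] / league[i, j], per row.
    """
    batter, pitcher, league = (
        np.asarray(m, dtype=float) for m in (batter, pitcher, league)
    )
    raw = batter * pitcher / np.maximum(league, eps)
    # Transitions the league baseline never sees get zero weight.
    raw[league <= 0] = 0.0
    row_sums = raw.sum(axis=1, keepdims=True)
    # Fall back to the league row if a row has no probability mass left.
    return np.where(row_sums > 0, raw / np.maximum(row_sums, eps), league)
```

Under this rule a league-average pitcher leaves the batter's matrix unchanged, and two players who both push a transition above league average push the combined probability higher than either does alone.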
Monte Carlo Simulation
To predict a game, we simulate it 10,000 times. Each simulation walks through the full lineup for both teams, simulating every at-bat using the combined transition matrices. Runs fall out of the state transitions themselves (e.g., moving from a runner on third with 0 outs to a runner on first with 0 outs means the runner on third scored).
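To make the run accounting concrete, here is a hedged sketch of one simulated half-inning. It assumes states are indexed as outs × 8 + base mask with index 24 as the absorbing state, and it uses the standard bookkeeping identity that the batter plus all prior runners must end up on base, out, or home. The function names are illustrative, and crediting no runs on inning-ending plays is a simplification.

```python
import numpy as np

ABSORBING = 24  # assumed index layout: state = outs * 8 + base mask


def runs_on_transition(s, t):
    """Runs implied by moving from state s to state t in one at-bat."""
    if t == ABSORBING:
        return 0  # simplification: no runs credited on inning-ending plays
    runners_before, outs_before = bin(s % 8).count("1"), s // 8
    runners_after, outs_after = bin(t % 8).count("1"), t // 8
    # Batter plus prior runners must be on base, out, or home.
    return (runners_before + 1) - runners_after - (outs_after - outs_before)


def simulate_half_inning(matrices, lineup_pos, rng):
    """Play one half-inning; return (runs, next lineup position).

    `matrices` is a list of nine 25x25 combined matrices, one per lineup slot.
    """
    state, runs = 0, 0
    while state != ABSORBING:
        row = matrices[lineup_pos % 9][state]
        nxt = int(rng.choice(len(row), p=row))
        runs += runs_on_transition(state, nxt)
        state = nxt
        lineup_pos += 1
    return runs, lineup_pos % 9
```

A full game simulation would call this for each half-inning of both teams, carrying the lineup position forward, with `rng = np.random.default_rng()` as the random source.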
From 10,000 simulations, we get full probability distributions for:
- Win probability for each team
- Expected score and score distribution
- Over/under probabilities at various lines
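These distributions are simple summaries of the simulated final scores. The sketch below assumes two arrays of per-simulation scores with ties already resolved (for example by simulating extra innings); the function name `summarize` and the default lines are illustrative.

```python
import numpy as np


def summarize(home_scores, away_scores, total_lines=(7.5, 8.5, 9.5)):
    """Turn simulated final scores into win, score, and over/under estimates."""
    home = np.asarray(home_scores)
    away = np.asarray(away_scores)
    totals = home + away
    return {
        "home_win_prob": float(np.mean(home > away)),
        "away_win_prob": float(np.mean(away > home)),
        "expected_home": float(home.mean()),
        "expected_away": float(away.mean()),
        # Share of simulations whose combined score clears each line.
        "over_prob": {line: float(np.mean(totals > line)) for line in total_lines},
    }
```

The full score distribution is available from the same arrays, for instance with `np.bincount(home)` for the home team.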
Elo Ratings
Pure Markov simulation captures individual matchups but misses team-level factors: momentum, bullpen depth, coaching, and intangibles. We supplement it with an Elo rating system that updates after every game.
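For readers unfamiliar with Elo, the standard update can be sketched as follows. The K-factor of 4 is a placeholder assumption, and adjustments such as home-field advantage or margin of victory are omitted; none of these parameters are taken from our production system.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected win probability for team A under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner: float, loser: float, k: float = 4.0) -> tuple[float, float]:
    """Return (new winner rating, new loser rating) after one game.

    k is a hypothetical K-factor chosen for illustration.
    """
    expected_win = elo_expected(winner, loser)
    delta = k * (1.0 - expected_win)  # larger reward for an upset win
    return winner + delta, loser - delta
```

Two evenly rated teams each have an expected win probability of 0.5, so the winner gains `k / 2` points and the loser gives up the same amount.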
Our production model blends Markov and Elo predictions 50/50. Averaged over three seasons, the blend outperforms either component alone, though its edge over Elo alone is small:
3-season average (7,289 games):
- Markov only: ~53.9%
- Elo only: ~56.0%
- Blend 50/50: ~56.1%
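The blend itself is a weighted average of the two win probabilities. A minimal sketch, with the function name `blend` and the `markov_weight` parameter as illustrative assumptions:

```python
def blend(markov_prob: float, elo_prob: float, markov_weight: float = 0.5) -> float:
    """Weighted average of two win probabilities; 0.5 gives the 50/50 blend."""
    if not 0.0 <= markov_weight <= 1.0:
        raise ValueError("markov_weight must be between 0 and 1")
    return markov_weight * markov_prob + (1.0 - markov_weight) * elo_prob
```

For example, a Markov estimate of 0.60 and an Elo estimate of 0.50 blend to 0.55 at equal weights.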
Data Sources
- Retrosheet — Play-by-play data for MSI construction (2016-2025). 22,764 games, 2,513 batters, 2,450 pitchers.
- MySportsFeeds — Daily schedules, lineups, and live game data during the season.
What's Next
- 🔄 Recency weighting — Exponential decay for older seasons so recent performance matters more.
- 🏟️ Park factors — Adjusting for ballpark-specific run environments (Coors Field isn't Petco Park).
- 🌦️ Weather integration — Temperature, wind, and humidity affect ball flight and scoring.
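As an illustration of the recency weighting idea, each season's at-bats could be weighted by an exponential decay in the season's age. This is a sketch of a planned feature, not a description of shipped behavior; the function name `season_weights` and the 3-season half-life are hypothetical.

```python
def season_weights(seasons, current_season, half_life=3.0):
    """Weight each season by 0.5 ** (age / half_life), normalized to sum to 1.

    half_life is a hypothetical tuning parameter: a season that many years
    old counts half as much as the current one.
    """
    raw = {s: 0.5 ** ((current_season - s) / half_life) for s in seasons}
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}
```

With `season_weights(range(2016, 2026), 2025)`, the 2025 season carries twice the weight of 2022 and eight times the weight of 2016.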