The Markov Chain Explained With a Single At-Bat
Our model is a 25-state Markov chain. That sounds intimidating. It's actually just baseball, written as math.
What's a State?
At any point during an inning, baseball has a very specific situation: some combination of runners on base and a number of outs. That situation is a state.
There are exactly 24 possible states during an active inning:
8 base-runner combinations (each of the three bases is either empty or occupied: 2 × 2 × 2 = 8)
× 3 out counts: 0 outs, 1 out, 2 outs
+ 1 absorbing state: 3 outs (inning over)
= 25 total states
That's the entire universe of baseball situations within an inning. Every at-bat starts in one of these states and ends in another.
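To make the counting concrete, here is a minimal sketch that enumerates the state space. The names (BASES, ACTIVE_STATES, state_number) and the ordering of the base-runner combinations are assumptions for illustration, chosen so that bases empty with 0 outs is state 1; the real model's numbering may differ.

```python
# Base-runner combinations. The order is an assumption for illustration.
BASES = ["Empty", "1st", "2nd", "3rd", "1st+2nd", "1st+3rd", "2nd+3rd", "Loaded"]
OUTS = [0, 1, 2]

# 24 active states: every base-runner combination at every out count.
ACTIVE_STATES = [(bases, outs) for outs in OUTS for bases in BASES]

# The single absorbing state: 3 outs, inning over.
ABSORBING = ("Inning over", 3)
ALL_STATES = ACTIVE_STATES + [ABSORBING]


def state_number(bases, outs):
    """1-based index of an active state under this assumed ordering."""
    return ACTIVE_STATES.index((bases, outs)) + 1


print(len(BASES), len(ACTIVE_STATES), len(ALL_STATES))  # 8 24 25
print(state_number("Empty", 0))  # 1
```

Any fixed ordering works, as long as the rows and columns of the transition matrix use the same one.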
One At-Bat, Step by Step
Let's walk through it. Top of the 3rd inning. Nobody on, nobody out.
This is state 1 of our 24 active states. The batter steps in. What happens next?
For this specific batter vs. this specific pitcher, our model has a 25×25 transition probability matrix, built from the batter's and the pitcher's at-bat histories within the 1.7 million at-bats in our database.
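From "bases empty, 0 outs," the possible outcomes include:
Out (strikeout, groundout, flyout) → Empty/1 out
Single, walk, or hit-by-pitch → 1st/0 out
Double → 2nd/0 out
Triple → 3rd/0 out
Home run → Empty/0 out (+1 run)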
Let's say the batter doubles. We've moved from state 1 to state 3. The next batter steps in, and the process repeats from the new state.
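In code, one at-bat is just a weighted random draw from the row of the matrix that belongs to the current state. The sketch below assumes a made-up row for "bases empty, 0 outs"; the probabilities are invented for illustration and are not taken from our model, and next_state is a hypothetical helper.

```python
import random

# Hypothetical row of a matchup matrix for "bases empty, 0 outs".
# Keys are (bases, outs) after the at-bat. The numbers are invented.
ROW_EMPTY_0_OUT = {
    ("Empty", 1): 0.68,  # any out
    ("1st", 0): 0.23,    # single, walk, or hit-by-pitch
    ("2nd", 0): 0.05,    # double
    ("3rd", 0): 0.01,    # triple
    ("Empty", 0): 0.03,  # home run: bases empty again, one run scores
}

# A row of a transition matrix must sum to 1.
assert abs(sum(ROW_EMPTY_0_OUT.values()) - 1.0) < 1e-9


def next_state(row, rng):
    """Draw the next state with probability given by the row."""
    states = list(row)
    weights = [row[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the draw is repeatable
print(next_state(ROW_EMPTY_0_OUT, rng))
```

If the draw comes back as ("2nd", 0), that is the double from the walkthrough, and the next batter's draw uses the row for that state instead.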
Why "Markov"?
The key property of a Markov chain is memorylessness. The next state depends only on the current state — not on how you got there.
Runner on 2nd with 0 outs? The model doesn't care if that runner doubled, singled and advanced on a groundout, or walked and stole second. The current state is all that matters for predicting what happens next.
This is a simplification, obviously. In real baseball, a pitcher who just gave up three straight hits might be rattled. But the simplification keeps the math tractable: every at-bat becomes a draw from one row of a transition matrix, and those rows can be estimated from 1.7 million at-bats of data.
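For readers who prefer notation, memorylessness can be written as a single equation. The symbols are our own shorthand, not part of the model's code: X_n is the state before the n-th at-bat of the inning, and p_ij is the entry in row i, column j of the matchup matrix for the batter at the plate.

```latex
P(X_{n+1} = j \mid X_n = i,\; X_{n-1} = i_{n-1},\; \dots,\; X_0 = i_0)
  = P(X_{n+1} = j \mid X_n = i)
  = p_{ij}
```

Everything to the right of X_n in the condition on the left-hand side is the path that led to the current state, and it drops out.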
The Full Picture: One Half-Inning
An inning is just a chain of these transitions until we reach the absorbing state (3 outs). Here's how a simulated half-inning might flow:
Batter 1: Empty/0 out → Single → 1st/0 out
Batter 2: 1st/0 out → Groundout (DP) → Empty/2 out
Batter 3: Empty/2 out → Home run → Empty/2 out (+1 run)
Batter 4: Empty/2 out → Flyout → 3 outs ■
Inning result: 1 run
Chain 9 of these half-innings together for each team, and you've simulated one full game. Do that 10,000 times, and the simulated results form an estimated probability distribution over game outcomes.
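Here is a rough sketch of that loop. It is not our production model: it uses one invented set of event probabilities for every batter, simplified runner-advancement rules (no double plays, steals, or errors), and exactly 9 half-innings per team with no extra innings. All names and numbers are assumptions. Drawing an event and applying the advancement rule is equivalent to drawing the next state from a transition-matrix row.

```python
import random

# Hypothetical per-at-bat event probabilities (invented; they sum to 1).
EVENTS = {"out": 0.68, "walk": 0.08, "single": 0.15,
          "double": 0.05, "triple": 0.01, "home_run": 0.03}


def advance(bases, event):
    """Return (new_bases, runs). bases is (first, second, third) as booleans.

    Simplified rules (an assumption): on a hit, every runner moves as many
    bases as the batter; on a walk, only forced runners move.
    """
    first, second, third = bases
    if event == "walk":
        if first and second and third:
            return (True, True, True), 1
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0
        return (True, second, third), 0
    hit_bases = {"single": 1, "double": 2, "triple": 3, "home_run": 4}[event]
    positions = [i + 1 for i, on in enumerate(bases) if on] + [0]  # 0 = batter
    new_bases, runs = [False, False, False], 0
    for pos in positions:
        dest = pos + hit_bases
        if dest >= 4:
            runs += 1
        else:
            new_bases[dest - 1] = True
    return tuple(new_bases), runs


def simulate_half_inning(rng):
    """Step through states until the absorbing state (3 outs); return runs."""
    bases, outs, runs = (False, False, False), 0, 0
    names, weights = list(EVENTS), list(EVENTS.values())
    while outs < 3:
        event = rng.choices(names, weights=weights, k=1)[0]
        if event == "out":
            outs += 1
        else:
            bases, scored = advance(bases, event)
            runs += scored
    return runs


def simulate_game(rng, innings=9):
    """Return (away_runs, home_runs) for one simulated game."""
    away = sum(simulate_half_inning(rng) for _ in range(innings))
    home = sum(simulate_half_inning(rng) for _ in range(innings))
    return away, home


rng = random.Random(42)
results = [simulate_game(rng) for _ in range(10_000)]
away_wins = sum(a > h for a, h in results) / len(results)
ties = sum(a == h for a, h in results) / len(results)
print(f"away wins: {away_wins:.3f}, tied after 9: {ties:.3f}")
```

Because both teams share the same probabilities here, the two win shares should come out roughly equal. Swapping in matchup-specific rows for each at-bat is what makes the distribution informative.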
Where the Data Comes From
Every batter and pitcher in our model has their own transition matrix, built from real play-by-play data:
Total at-bats: 1.7 million
Seasons: 2016–2025
Unique batters: 2,513
Unique pitchers: 2,450
All sourced from Retrosheet — the gold standard of historical play-by-play data. Every pitch, every at-bat, every state transition, meticulously recorded by volunteers since the 1980s.
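One plausible way to turn play-by-play records into a matrix is to count how often each state led to each next state, then divide each row by its total. The sketch below assumes that approach; estimate_rows is a hypothetical helper, and the observations are made up rather than taken from Retrosheet.

```python
from collections import Counter, defaultdict


def estimate_rows(transitions):
    """Turn (from_state, to_state) observations into per-row probabilities."""
    counts = defaultdict(Counter)
    for src, dst in transitions:
        counts[src][dst] += 1
    rows = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        rows[src] = {dst: n / total for dst, n in dsts.items()}
    return rows


# Made-up observations for one player, written as "bases/outs".
observed = [
    ("Empty/0", "Empty/1"),
    ("Empty/0", "Empty/1"),
    ("Empty/0", "1st/0"),
    ("Empty/0", "2nd/0"),
    ("1st/0", "Empty/2"),
]

rows = estimate_rows(observed)
print(rows["Empty/0"])  # {'Empty/1': 0.5, '1st/0': 0.25, '2nd/0': 0.25}
```

With only a handful of observations per state, raw frequencies like these are noisy, which is one reason a large database matters.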
When a batter faces a pitcher, we blend their individual matrices to create a matchup-specific transition matrix. That matrix captures: how likely is this batter to hit a single off this pitcher? A double? A strikeout? A walk? All 625 cells of the 25×25 matrix are estimated from real at-bats; cells for impossible transitions, such as going from 2 outs back to 0 outs, are simply zero.
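The simplest blend to picture is a weighted average of the two rows, which is what the sketch below assumes. It is an illustration only: blend_rows, the 50/50 default weight, and both example rows are hypothetical, and the actual blending in our model may work differently.

```python
def blend_rows(batter_row, pitcher_row, batter_weight=0.5):
    """Weighted average of two probability rows over next states.

    A weighted average of two valid rows is itself a valid row:
    entries stay between 0 and 1 and still sum to 1.
    """
    w = batter_weight
    states = sorted(set(batter_row) | set(pitcher_row))
    return {
        s: w * batter_row.get(s, 0.0) + (1 - w) * pitcher_row.get(s, 0.0)
        for s in states
    }


# Invented rows for "bases empty, 0 outs", written as "bases/outs".
batter_row = {"Empty/1": 0.65, "1st/0": 0.25, "2nd/0": 0.06,
              "3rd/0": 0.01, "Empty/0": 0.03}
pitcher_row = {"Empty/1": 0.72, "1st/0": 0.20, "2nd/0": 0.04,
               "3rd/0": 0.01, "Empty/0": 0.03}

matchup_row = blend_rows(batter_row, pitcher_row)
assert abs(sum(matchup_row.values()) - 1.0) < 1e-9
print(round(matchup_row["Empty/1"], 3))  # 0.685
```

Applying the same blend to every row of the two matrices gives a full matchup matrix.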
The Beautiful Constraint
Here's what makes the Markov model elegant: it doesn't need to understand how baseball works. It doesn't know about launch angles, spin rates, or defensive shifts. It just knows that when this batter faces this pitcher in this state, these are the probabilities of landing in each new state.
The physics, the strategy, the matchups — they're all encoded implicitly in the transition probabilities, extracted from millions of real outcomes.
The model is simple. The data is rich. That's the whole idea.