Education February 19, 2026

The Markov Chain Explained With a Single At-Bat

Our model is a 25-state Markov chain. That sounds intimidating. It's actually just baseball, written as math.

What's a State?

At any point during an inning, baseball has a very specific situation: some combination of runners on base and a number of outs. That situation is a state.

There are exactly 24 possible states during an active inning:

8 base-runner combinations:

Bases empty
Runner on 1st
Runner on 2nd
Runner on 3rd
1st & 2nd
1st & 3rd
2nd & 3rd
Bases loaded

× 3 out counts: 0 outs, 1 out, 2 outs

+ 1 absorbing state: 3 outs (inning over)

= 25 total states

That's the entire universe of baseball situations within an inning. Every at-bat starts in one of these states and ends in another.
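The state space above is small enough to write out directly. A minimal sketch (the numbering and labels here are illustrative; the article doesn't fix a specific ordering):

```python
# 8 base-runner combinations x 3 out counts = 24 active states,
# plus the absorbing "3 outs" state = 25 total.
BASES = ["empty", "1st", "2nd", "3rd",
         "1st & 2nd", "1st & 3rd", "2nd & 3rd", "loaded"]
OUTS = [0, 1, 2]

active_states = [(bases, outs) for outs in OUTS for bases in BASES]
all_states = active_states + [("inning over", 3)]  # absorbing state

print(len(all_states))  # 25
```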

One At-Bat, Step by Step

Let's walk through it. Top of the 3rd inning. Nobody on, nobody out.

STATE Bases empty, 0 outs

This is state 1 of our 24 active states. The batter steps in. What happens next?

TRANSITION The model checks probabilities

For this specific batter vs. this specific pitcher, our model has a 25×25 transition probability matrix, built from both players' at-bat histories in our database of 1.7 million at-bats.

From "bases empty, 0 outs," the possible outcomes include:

Strikeout/flyout/groundout → Empty, 1 out (~65%)
Single → Runner on 1st, 0 outs (~17%)
Walk → Runner on 1st, 0 outs (~8%)
Double → Runner on 2nd, 0 outs (~5%)
Home run → Empty, 0 outs, +1 run (~3%)
Error/other → various (~2%)
NEW STATE Runner on 2nd, 0 outs

Let's say the batter doubles. We've moved from state 1 to state 3. The next batter steps in, and the process repeats from the new state.
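One transition like this is just a weighted draw over outcomes. The sketch below uses the rough percentages quoted above as stand-ins; the real model draws from a full matchup-specific matrix.

```python
import random

# Outcomes from "bases empty, 0 outs": (new state, runs scored, probability).
# Probabilities are the illustrative figures from the table above.
OUTCOMES = [
    (("empty", 1), 0, 0.65),   # strikeout/flyout/groundout
    (("1st", 0),   0, 0.17),   # single
    (("1st", 0),   0, 0.08),   # walk
    (("2nd", 0),   0, 0.05),   # double
    (("empty", 0), 1, 0.03),   # home run
    (("1st", 0),   0, 0.02),   # error/other (lumped together here)
]

def step(rng):
    """Sample one plate appearance: returns (new_state, runs_scored)."""
    states_runs = [(s, r) for s, r, _ in OUTCOMES]
    weights = [p for _, _, p in OUTCOMES]
    return rng.choices(states_runs, weights=weights, k=1)[0]

rng = random.Random(7)
new_state, runs = step(rng)
```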

Why "Markov"?

The key property of a Markov chain is memorylessness. The next state depends only on the current state — not on how you got there.

Runner on 2nd with 0 outs? The model doesn't care if that runner doubled, singled and took second on a wild pitch, or walked and stole second. The current state is all that matters for predicting what happens next.

This is a simplification, obviously. In real baseball, a pitcher who just gave up three straight hits might be rattled. But mathematically, this simplification lets us model the game with clean, tractable probabilities — and 1.7 million at-bats of training data.
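Written out, the memorylessness property says the full history adds nothing once you know the current state:

$$P(S_{t+1} = s' \mid S_t = s, S_{t-1}, \ldots, S_0) = P(S_{t+1} = s' \mid S_t = s)$$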

The Full Picture: One Half-Inning

An inning is just a chain of these transitions until we reach the absorbing state (3 outs). Here's how a simulated half-inning might flow:

Batter 1: Empty/0 out → Single → 1st/0 out

Batter 2: 1st/0 out → Groundout (DP) → Empty/2 out

Batter 3: Empty/2 out → Home run → Empty/2 out (+1 run)

Batter 4: Empty/2 out → Flyout → 3 outs ■

Inning result: 1 run

Chain 9 of these half-innings together for each team, and you've simulated one full game. Do that 10,000 times, and you have a complete probability distribution.
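The chain-until-absorption loop can be sketched in a few lines. Everything here is illustrative: toy probabilities and only three outcome types, not the model's real 25×25 matchup matrix.

```python
import random

def simulate_half_inning(rng):
    """One half-inning with toy probabilities: 70% out, 25% single, 5% HR."""
    runners = set()          # occupied bases, a subset of {1, 2, 3}
    outs = runs = 0
    while outs < 3:          # 3 outs = the absorbing state
        roll = rng.random()
        if roll < 0.70:      # out: no runners advance
            outs += 1
        elif roll < 0.95:    # single: every runner moves up one base
            runs += 1 if 3 in runners else 0
            runners = {b + 1 for b in runners if b < 3} | {1}
        else:                # home run: batter and all runners score
            runs += len(runners) + 1
            runners = set()
    return runs

# Monte Carlo: 10,000 half-innings gives a run distribution.
rng = random.Random(0)
totals = [simulate_half_inning(rng) for _ in range(10_000)]
avg_runs = sum(totals) / len(totals)
```

Swap the toy `if`/`elif` branches for rows of a real transition matrix and this becomes the simulator the article describes.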

Where the Data Comes From

Every batter and pitcher in our model has their own transition matrix, built from real play-by-play data:

Total At-Bats

1.7 million

Seasons

2016–2025

Unique Batters

2,513

Unique Pitchers

2,450

All sourced from Retrosheet — the gold standard of historical play-by-play data. Every pitch, every at-bat, every state transition, meticulously recorded by volunteers since the 1980s.

When a batter faces a pitcher, we blend their individual matrices to create a matchup-specific transition matrix. That matrix captures: how likely is this batter to hit a single off this pitcher? A double? A strikeout? A walk? All 625 cells of the 25×25 matrix are filled with real probabilities from real at-bats.
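One way such a blend could work is a weighted average of the two players' matrices, renormalized so each row stays a probability distribution. The article doesn't specify the actual blending formula, so the `blend` function and the 50/50 weight below are hypothetical.

```python
import numpy as np

N = 25  # states

def blend(batter_T, pitcher_T, w=0.5):
    """Hypothetical matchup blend: weighted average of two row-stochastic
    matrices, renormalized so each row sums to 1."""
    T = w * batter_T + (1 - w) * pitcher_T
    return T / T.sum(axis=1, keepdims=True)

# Stand-in matrices; the real ones come from play-by-play history.
rng = np.random.default_rng(1)
batter_T = rng.random((N, N))
batter_T /= batter_T.sum(axis=1, keepdims=True)
pitcher_T = rng.random((N, N))
pitcher_T /= pitcher_T.sum(axis=1, keepdims=True)

matchup_T = blend(batter_T, pitcher_T)
```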

The Beautiful Constraint

Here's what makes the Markov model elegant: it doesn't need to understand how baseball works. It doesn't know about launch angles, spin rates, or defensive shifts. It just knows that when this batter faces this pitcher in this state, these are the probabilities of landing in each new state.

The physics, the strategy, the matchups — they're all encoded implicitly in the transition probabilities, extracted from millions of real outcomes.

The model is simple. The data is rich. That's the whole idea.