The Math Behind Wordle: Information Theory and Optimal Play
Explore the fascinating mathematics behind Wordle, from information theory to entropy calculations. Understand why certain words are mathematically superior.
Dwayne K. Richardson is a Wordle enthusiast and puzzle analyst who has been playing Wordle since January 2022. With a current streak of 340+ days, Dwayne combines statistical analysis with practical gameplay experience to help players improve their Wordle skills. He is the author of all blog posts on Wordle Analyzer.
Wordle Is a Math Problem Disguised as a Word Game
Every time you make a guess in Wordle, you are running a calculation. You might not realize it — most people do not think about information theory while typing five-letter words over morning coffee — but the game is fundamentally about minimizing uncertainty. Each guess takes a pool of possible answers and splits it into smaller groups based on the color pattern. The better your guess, the more evenly it splits the pool, and the faster you converge on the answer. This is not a metaphor. It is literally what is happening, and the math that describes it is both elegant and surprisingly practical.
I have been fascinated by the math behind Wordle since I first read a paper from MIT researchers who formally analyzed optimal play. You do not need to understand the math to play well, but understanding the principles changed how I think about each guess — not because I am computing entropy at the keyboard, but because the framework gives me better intuitions about what makes a guess good or bad. This article explains those principles without requiring a background in mathematics.
Wordle as an Information Theory Problem
Information theory, developed by Claude Shannon in the 1940s, quantifies uncertainty. The core unit is the bit — one bit cuts possibilities roughly in half. Eight bits cut them by a factor of 256. In Wordle, you start with about 2,309 possible answers. How many bits do you need to identify one specific word? The base-2 logarithm of 2,309 is roughly 11.2 bits. So if every guess gave maximum information, you could in principle always solve Wordle in 3 guesses: each guess produces one of 243 possible color patterns (about 7.9 bits of feedback), so two guesses' feedback can distinguish up to 243 × 243 = 59,049 outcomes, far more than 2,309, leaving the third guess to be the answer itself.
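The arithmetic is easy to verify. A quick sketch (the 2,309 figure is the commonly cited size of the original answer list):

```python
import math

answers = 2309                         # commonly cited answer-pool size
bits_to_identify = math.log2(answers)  # bits needed to single out one word
patterns = 3 ** 5                      # 5 tiles, each green/yellow/gray
bits_per_guess = math.log2(patterns)   # ceiling on information per guess

print(f"{bits_to_identify:.2f} bits to identify the answer")  # 11.17
print(f"{bits_per_guess:.2f} bits max per guess")             # 7.92
```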
In practice, guesses do not give maximum information because color patterns are not equally likely. A guess like CRANE will produce diverse patterns — some all-gray, some with greens and yellows. A guess like XYLYL (a real word, a chemical group) will almost always produce all-gray because X and Y rarely appear. Both give you some information, but CRANE gives substantially more on average. The difference is not subtle — it is the difference between having 78 candidate words remaining and 430.
The fundamental insight: Wordle is not a word game. It is a search problem where you are trying to locate one specific item in a set of 2,309 using as few queries as possible. The words are the search space, the colors are the feedback, and entropy measures how much each query narrows the search. Understanding this transforms how you think about every guess.
What Entropy Means in Wordle Context
Entropy measures how much a guess reduces uncertainty on average. It is calculated by looking at all possible color patterns a guess can produce, determining what fraction of answers produce each pattern, and computing expected information gain. A guess splitting answers into 100 equally-sized groups has higher entropy than one where 90% of answers produce the same pattern and the remaining 10% are scattered across 99 others. The first resolves uncertainty reliably. The second usually tells you little, with a rare chance of being very informative.
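The even-split claim can be checked in a few lines. A minimal sketch computing Shannon entropy from pattern bucket sizes (the bucket counts below are illustrative, not real Wordle data):

```python
from math import log2

def entropy_bits(bucket_sizes):
    """Shannon entropy of a guess: H = -sum(p * log2(p)), where p is the
    fraction of candidate answers that fall into each color-pattern bucket."""
    n = sum(bucket_sizes)
    return -sum((c / n) * log2(c / n) for c in bucket_sizes if c > 0)

even = [10] * 100            # 1,000 answers split into 100 equal groups
skewed = [901] + [1] * 99    # ~90% share one pattern, 99 singletons

print(round(entropy_bits(even), 2))    # 6.64
print(round(entropy_bits(skewed), 2))  # 1.12
```

Maximum entropy for a fixed number of buckets comes from a perfectly even split, which is exactly why guesses that scatter the pool across many similar-sized groups are strong.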
The highest-entropy openers are SALET and SLATE, both producing roughly 5.8 bits of entropy. In information-theoretic terms, the first guess cuts the pool of possible answers by a factor of about 55 (2 to the power of 5.8): starting from 2,309, you are down to roughly 42. A poor opener like OUIJA produces only about 4.2 bits, a reduction factor of roughly 18, leaving about 128. (These entropy-based figures behave like geometric averages; the arithmetic average of remaining candidates is higher — roughly 70 for SALET and 210 for OUIJA — because real splits are uneven, a distinction covered in the section on expected remaining words.) Same number of guesses used, wildly different positions on the board.
How Many Bits Each Guess Provides
Not all guesses are created equal. The table below shows the approximate entropy for different categories of openers, along with the practical impact on remaining candidate count. The gap between the best and worst openers is staggering — nearly 2.2 bits, which translates to roughly 4.6x more remaining words after a poor opener compared to an optimal one.
| Opener Category | Examples | Entropy (bits) | Avg Remaining | Pool Reduction |
|---|---|---|---|---|
| Optimal | SALET, SLATE | ~5.8 | ~70 | 97% |
| Good | CRANE, TRACE, RAISE | ~5.5-5.7 | ~78-86 | 96-97% |
| Vowel-heavy | ADIEU, AUDIO | ~4.8-5.0 | ~119-135 | 94-95% |
| Mediocre | QUICK, JUMBO | ~4.0-4.5 | ~200-300 | 87-91% |
| Poor | OUIJA, XYLYL | ~3.5-4.2 | ~210-430 | 81-91% |
Second guesses typically provide 3 to 5 bits. By guess three, you are often in the 1 to 3 bit range because remaining uncertainty is small and it is harder to split a small pool evenly. Identifying the answer requires about 11.2 bits of information in total, so that is the theoretical minimum you must extract to solve. Optimal play extracts roughly 16.3 bits over four guesses, so there is substantial slack — which is why most strategic players solve in 4 on average even though they are not playing optimally.
Above: SLATE producing a typical pattern — 2 greens, 1 yellow, 2 grays. This particular pattern provides roughly 5.6 bits of information, cutting the candidate pool from 2,309 down to a manageable set. The math works even if you never calculate a single bit.
Why Some Words Are Worth More as Guesses
A guess's value depends on two things: which letters it tests and where they are positioned. Testing common letters is more valuable because you are more likely to get non-gray feedback, and non-gray feedback splits the possibility space. Position matters too. Testing E at the end is more informative than E at the beginning — E is much more common at the end of five-letter words.
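The positional claim is easy to test against any word list. A sketch using a tiny illustrative sample (the real calculation would run over the full 2,309-word answer list):

```python
from collections import Counter

def positional_counts(words, letter):
    """How often `letter` appears at each of the five positions."""
    counts = Counter()
    for word in words:
        for i, ch in enumerate(word):
            if ch == letter:
                counts[i] += 1
    return [counts[i] for i in range(5)]

# Tiny illustrative sample, not the real answer list
sample = ["slate", "crane", "eagle", "early", "theme", "email", "stone"]
print(positional_counts(sample, "e"))  # [3, 0, 1, 0, 5]
```

Even in this toy sample, E shows up far more often in the final position than anywhere else, which is the pattern the full answer list exhibits.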
SALET (S-A-L-E-T) and SLATE (S-L-A-T-E) test the same five letters, but SALET places E at position 4 and T at position 5, while SLATE places E at position 5 and T at position 4. The positional frequency differences give SALET a tiny edge in entropy — about 0.01 bits. In practice, this difference is invisible. You would need millions of games for it to show up in stats. But the math is clear: SALET is technically the optimal first guess by expected information gain.
The Calculation: Expected Remaining Words
For each possible color pattern a guess can produce, count how many answers would produce that pattern. Multiply by the probability of that pattern (count divided by total answers). Sum across all patterns. This gives you the expected pool size after one guess. Lower is better. SALET: roughly 70.8 expected remaining words. CRANE: about 78.4. ADIEU: about 119. OUIJA: about 213.
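That procedure translates directly into code. A sketch over a toy candidate pool (the duplicate-letter handling follows standard Wordle scoring: greens first, then yellows drawn from the leftover letters):

```python
from collections import Counter

def feedback(guess, answer):
    """Wordle color pattern: 2 = green, 1 = yellow, 0 = gray."""
    pattern = [0] * 5
    leftover = Counter()                  # unmatched answer letters
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = 2
        else:
            leftover[a] += 1
    for i, g in enumerate(guess):
        if pattern[i] == 0 and leftover[g] > 0:
            pattern[i] = 1                # yellow: right letter, wrong spot
            leftover[g] -= 1
    return tuple(pattern)

def expected_remaining(guess, candidates):
    """Sum over patterns of P(pattern) * bucket size = sum(size^2) / total."""
    buckets = Counter(feedback(guess, ans) for ans in candidates)
    n = len(candidates)
    return sum(c * c for c in buckets.values()) / n

pool = ["crane", "slate", "stale", "trace", "adieu"]   # toy pool
for guess in pool:
    print(guess, expected_remaining(guess, pool))
```

Run over the full answer list, this is the calculation that yields the figures quoted above; on a toy pool the absolute numbers are meaningless, but the ranking logic is identical.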
These numbers are not linear with entropy because entropy and expected pool size measure different things. Entropy measures how evenly the pool splits. Expected pool size measures average remaining candidates regardless of distribution. They are correlated but not identical. A guess that always leaves exactly 70 candidates has the same expected pool size as one that usually leaves 10 but occasionally leaves 400 — but the first is far more reliable, and entropy captures that difference.
Practical takeaway: You do not need to calculate entropy to use these principles. Just ask two questions before every guess: "Which new letters am I testing?" and "How many possible answers does this help me distinguish?" If the answer to the first is "none," you are wasting a guess. If the answer to the second is "one," you had better have enough guesses left for the alternatives.
Why CRANE Beats XYLYL on Average
XYLYL contains X and Y, two of the least common letters in the answer pool. The most likely outcome is all five gray — eliminating maybe 30 to 40 percent of answers. Not useless, but not great. CRANE contains five of the most common letters. The most likely outcome is a mix of grays, yellows, and maybe a green. The diverse pattern variety means CRANE splits the answer pool into more, smaller groups — exactly what you want.
Expected remaining words after XYLYL: roughly 430. After CRANE: 78. Same guess slot, five times the elimination power. That is the difference between testing common letters and uncommon ones, and it compounds over the course of a game. A player who opens with CRANE and follows up intelligently will typically solve in 3-4 guesses. A player who opens with XYLYL will typically need 5-6.
The Optimal First Guess Depends on Your Metric
There is no single "best" opener — it depends on what you are optimizing for. Different mathematical objectives produce different optimal words, and understanding the distinction helps you choose the approach that matches your goals.
| Optimization Goal | Best Opener | Why | Best For |
|---|---|---|---|
| Minimum average remaining words | SALET | Lowest expected pool size (~70.8) | Players optimizing average guess count |
| Minimum worst-case remaining words | SLATE | Fewest words in the worst pattern | Players who hate bad surprises |
| Minimax (solve in fewest worst-case guesses) | SALET | Guarantees solve in at most 5 guesses | Streak players who want safety bounds |
| Maximum entropy (information gain) | SALET | Highest expected bits (~5.8) | Theory enthusiasts |
The differences are marginal. Any top-10 opener is within a few percentage points of optimal. The math matters more for understanding why some words work better than for choosing between SALET and SLATE. If you are using a top-5 opener, you are already extracting nearly all the available information from your first guess.
Why the Best Guess Is Not Always a Possible Answer
The answer pool (roughly 2,309 words) is a subset of valid guesses (roughly 12,000+). Words like TARSE, LARES, or AESIR are valid guesses that will never be answers. Sometimes a non-answer word splits remaining candidates more evenly than any answer word, because it can use letter combinations absent from the answer pool. This matters most in the late game with few candidates.
In normal mode, you can probe freely with non-answer words. In Hard Mode, they must still include all green and yellow letters, limiting their usefulness. The ability to use non-answer probe words is one of the most underappreciated advantages of normal mode — it gives you access to roughly 10,000 additional test words that can be strategically valuable when the remaining candidates share many letters.
How AI Solvers Work: Minimax vs Expected Value
Two main approaches, optimizing for different goals. Understanding these approaches does not just explain how bots play — it reveals the fundamental strategic tension in Wordle between optimizing your average performance and protecting your worst case.
Expected Value (Entropy Maximization)
Maximizes expected information gain — in practice, this means minimizing the average remaining words after each guess, since the two metrics are closely correlated. Over many games, this minimizes average guesses to solve. SALET is the optimal first guess under this metric. Think of it as the "best on average" approach — it accepts occasional bad days in exchange for consistently strong average performance. Best for players who care about their guess distribution chart.
Minimax (Worst-Case Optimization)
Minimizes the worst-case outcome — asking "what is the biggest pool I could face after this guess." This guarantees solving any answer within a fixed number of guesses (5 for optimal play). It sacrifices average-case performance to ensure you never need more than 5 guesses. Best for streak players who cannot afford a single loss.
Neither is "better" — they optimize for different goals. Streak players should prefer minimax (bounds worst case). Average-seekers should prefer expected value. In my play, I use simplified expected value for the first two guesses and switch to minimax thinking for guesses 3 through 6. Not optimal, but practical — and it reflects the reality that early guesses benefit from maximizing information while late guesses benefit from guaranteeing solutions.
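The difference between the two objectives comes down to how a guess's pattern buckets are scored. A minimal sketch (the bucket sizes are made up for illustration):

```python
def ev_score(bucket_sizes):
    """Expected-value objective: average remaining pool size (lower is better)."""
    n = sum(bucket_sizes)
    return sum(c * c for c in bucket_sizes) / n

def minimax_score(bucket_sizes):
    """Minimax objective: worst-case remaining pool size (lower is better)."""
    return max(bucket_sizes)

steady = [70] * 4              # always leaves exactly 70 candidates
spiky = [1] * 180 + [100]      # usually leaves 1, occasionally 100

print(ev_score(steady), minimax_score(steady))          # 70.0 70
print(round(ev_score(spiky), 2), minimax_score(spiky))  # 36.36 100
```

An expected-value solver prefers `spiky` (about 36 remaining on average versus 70); a minimax solver prefers `steady` (never worse than 70 versus a possible 100). That is the whole strategic tension in one comparison.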
Fun fact: An optimal minimax solver can guarantee solving every Wordle puzzle in at most 5 guesses. Not 6 — 5. That means with perfect play, you never need the sixth row. The gap between this theoretical bound and most players' actual performance (3.7-4.2 average) shows how much room there is between "good enough" and "optimal."
My Simplified Approach: Using Math Without a Calculator
I do not compute entropy at the keyboard. But I do use the principles the math reveals. Before typing any guess, I ask two questions: which untested letters am I testing? And how many possible answers does this help me distinguish? If the answer to the first is "none," I am wasting a guess. If the answer to the second is "one," I had better have enough guesses left for the alternatives.
No spreadsheets, no entropy calculations. Two questions that encapsulate the core insight of information theory: a good guess reduces uncertainty, and the best guess reduces it the most evenly. The math is elegant and worth understanding. But the daily puzzle is played by humans, not algorithms. Use the math to choose a good opener and understand why some guesses feel productive. Then close the spreadsheet and play the game.
✅ Key Takeaways
- Wordle is fundamentally a search problem — you are locating one word in 2,309 using as few queries as possible
- Each guess provides roughly 3.5 to 5.8 bits of information depending on letter choice and positioning
- SALET is the optimal opener by expected information gain, producing ~5.8 bits and leaving ~70 remaining words
- Common letters in common positions produce more entropy because they split the answer pool more evenly
- The gap between optimal and poor openers is massive: SALET leaves ~70 words vs OUIJA's ~210
- Expected value optimization minimizes average guesses; minimax optimization minimizes worst-case guesses
- You do not need to calculate entropy — just test new common letters and maximize the candidates you can distinguish