Example hangman game

It seems like a simple enough question. Which word should you choose so that it takes your opponent the most guesses to discover it? Should you choose a long word to use up your opponent’s guesses? Or perhaps a short word with obscure letters? In this document I look into this question. But first, a bit of background.

If you’re not familiar with the rules of hangman, it is a guessing game played between two people. Player A chooses a secret word, and tells player B the length of the secret word. Player B guesses letters which she thinks might be in the word. If she chooses a correct letter, player A reveals the locations of each instance of the guessed letter. However, if player B guesses an incorrect letter, this counts as a “strike” against her. After an agreed-upon number of strikes, player B loses.

An algorithmic approach

A few years ago, I created a “hangman solver” for the popular paper-and-pencil game. The solver assessed each game analytically, determining the list of possible words given the clues available. The algorithm works as follows: at the beginning of the game, we know the length of the secret word, which narrows our dictionary considerably. Then, for each letter in the alphabet, we count the number of remaining words which contain that letter.

Suppose our dictionary consisted of a random list of 50 four-letter words as follows:

["pull", "dipt", "dorp", "poky", "jism", "cues", "hood", "drag",
"inky", "mhos", "kerf", "jess", "mete", "lues", "wipe", "kane",
"tiro", "keys", "jape", "lime", "sees", "sass", "demo", "ilia",
"mink", "dips", "hove", "jees", "that", "pops", "isle", "teas",
"dens", "dogy", "pink", "sizy", "cole", "pact", "thaw", "lead",
"mile", "dodo", "litu", "scup", "colt", "soma", "seat", "dewy",
"pits", "mojo"]

This would result in letter counts as follows:

letter count
a 12
c 5
d 11
e 22
f 1
g 2
h 5
i 15
j 5
k 7
l 10
m 9
n 5
o 14
p 12
r 4
s 18
t 11
u 5
v 1
w 3
y 6
z 1

In this case, the letter E is found in 22 words, the most of any letter. Therefore our algorithm should pick E, since it is the safest guess. Once we guess a letter correctly, the revealed positions give important information which can filter the word list even further.

This process is repeated, each time picking the most likely letter given the constraints we know. If we know some of the letters in the secret word, we can eliminate any words that don’t have those letters in those positions. If we have guessed a letter incorrectly, we know that letter isn’t in the secret word and can eliminate every word containing it. You can see a Python implementation of this algorithm here.
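The filtering step can be sketched as follows. This is a hypothetical helper, not the linked implementation; it also applies one subtlety of the rules: since player A reveals every instance of a correct letter, a revealed letter cannot also be hiding in an unrevealed position.

```python
import re

def filter_candidates(words, pattern, wrong_letters):
    """Keep only words consistent with the clues so far.
    `pattern` uses '.' for unknown positions (e.g. '.ays'), and
    `wrong_letters` holds the letters guessed incorrectly."""
    regex = re.compile(pattern + "$")
    revealed = set(pattern) - {"."}
    def consistent(word):
        if not regex.match(word):
            return False
        if any(ch in word for ch in wrong_letters):
            return False
        # A revealed letter cannot appear in an unrevealed position,
        # because all of its instances were exposed when guessed.
        return all(not (p == "." and word[i] in revealed)
                   for i, p in enumerate(pattern))
    return [w for w in words if consistent(w)]

print(filter_candidates(["days", "dogs", "rays", "yays"], ".ays", {"e"}))
# ['days', 'rays']
```

Note that “yays” is rejected even though it matches the pattern “.ays”, because the revealed letter y cannot also occupy the hidden first position.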

For this experiment I used the Zynga dictionary, the same dictionary used in the game Words With Friends. The dictionary contains 173,000 words. It does not include any proper nouns or profanities.

Live experiment

Below you can experiment with this algorithm on four-letter words. Enter a secret word, and the steps used to uncover it will be displayed below. (Need inspiration? Try comparing jazz to rock, or blue to grey.)

Notice how some words have much higher difficulty than others? This is because some words have many “siblings” which differ by only one letter. For example, the pattern “.ays” could match the letters c, d, f, h, j, k, l, n, or r. Knowing the last three letters gives us no indication of which of these nine letters it will be.

One apparent shortcoming of the above calculation is that it assumes letters of equal probability are picked in alphabetical order, so letters late in the alphabet are picked last. Although this has the benefit of creating a deterministic algorithm which always returns the same result for the same word, in real life we don’t know the order in which people will pick letters. In actuality, the words days, jays, and rays have equal difficulty, since each is equally likely to be the secret word.

Preliminary results

With this caveat in mind, we can still calculate the difficulty of every word in the dictionary. If we assume that, when multiple letters are equally probable, our opponent breaks the tie alphabetically, which hangman words are the hardest to guess? Here are the top 17:

word difficulty
zill 19
zills 18
yill 18
zin 17
zax 17
yills 17
will 17
vox 17
mem 17
zins 16
yuck 16
yin 16
wills 16
vill 16
oak 16
jazz 16
foy 16
(27 more) 15

Previous research

I should note that previous research by Jon McLoone in 2010 explored the same topic, although he used a slightly different methodology and a smaller 90,000-word dictionary. His algorithm was not deterministic, and so does not always pick the most frequent letter available. Instead, it picks a letter with probability proportional to the number of candidate words that letter occurs in. For example, referring to the letter frequencies of the 50-word four-letter dictionary above, z appears in just 1 word, while e appears in 22. Since the counts in the frequency table sum to 184, McLoone’s algorithm would pick z in 1 out of every 184 first guesses, and e in 22 out of every 184. Although it might seem that this strategy is not optimal, it does avoid the deterministic results shown above.
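This selection rule can be sketched with Python’s `random.choices`; `weighted_guess` is my name for a hypothetical helper, and the weights here are presence counts as in the table above:

```python
import random
from collections import Counter

def weighted_guess(candidates, guessed, rng=random):
    """Pick a letter with probability proportional to the number of
    candidate words containing it (McLoone-style selection)."""
    counts = Counter()
    for word in candidates:
        counts.update(set(word) - guessed)
    letters = sorted(counts)
    weights = [counts[ch] for ch in letters]
    return rng.choices(letters, weights=weights, k=1)[0]

# With only one letter left unguessed, the choice is forced:
print(weighted_guess(["jazz"], {"j", "a"}))  # z
```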

Additionally, McLoone chose to remain faithful to the logic of the hangman game, opting to end the games after a given number of mistakes, and recording the probability that a given word was not discovered after the game ended. So an 8-game means that the game was ended after 8 mistakes, and a 13-game after 13 mistakes. Using this methodology, he found the hardest hangman words were as follows:

Jon McLoone results

Future research

Now, I think that we can improve upon our results a bit. Rather than calculating difficulty deterministically, we can instead randomize the order in which letters are picked. This should dramatically reduce some of the outliers from above, bringing “rays” down from a difficulty of 14 to something more reasonable.
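The randomized variant can be sketched as follows, assuming ties between equally frequent letters are broken uniformly at random; `play_game` and `average_mistakes` are hypothetical names for illustration:

```python
import random
from collections import Counter

def play_game(secret, dictionary, rng):
    """Play one round against `secret`, breaking ties at random;
    return the number of incorrect guesses made."""
    candidates = [w for w in dictionary if len(w) == len(secret)]
    pattern = ["."] * len(secret)
    wrong = set()
    while "".join(pattern) != secret:
        # Count candidate words containing each unguessed letter
        guessed = wrong | set(pattern)
        counts = Counter()
        for word in candidates:
            counts.update(set(word) - guessed)
        top = max(counts.values())
        letter = rng.choice(sorted(ch for ch in counts if counts[ch] == top))
        if letter in secret:
            for i, ch in enumerate(secret):
                if ch == letter:
                    pattern[i] = letter
        else:
            wrong.add(letter)
        # Keep only words consistent with every clue so far
        revealed = set(pattern) - {"."}
        candidates = [
            w for w in candidates
            if not (set(w) & wrong)
            and all((w[i] == p) if p != "." else (w[i] not in revealed)
                    for i, p in enumerate(pattern))
        ]
    return len(wrong)

def average_mistakes(secret, dictionary, rounds=500, seed=0):
    rng = random.Random(seed)
    return sum(play_game(secret, dictionary, rng) for _ in range(rounds)) / rounds
```

Against the toy dictionary ["days", "jays", "rays"], the word “days” now scores between 0 and 2 mistakes per round depending on the random tie-breaks, and the average over many rounds settles near the middle.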

Such a stochastic calculation will require simulating millions of games. For my 173,000-word dictionary, simulating just 10 games per word would involve 1.7 million games. Fortunately, this operation is highly parallelizable: it should be possible to split the dictionary into 100 or even 1,000 pieces and derive the results for each piece simultaneously.
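One way this split could look, sketched with Python’s multiprocessing module; `score_chunk` is a stand-in for the real per-word simulation, which would play many randomized games against each word:

```python
from multiprocessing import Pool

def chunked(seq, n):
    """Split seq into n roughly equal contiguous pieces."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

def score_chunk(words):
    # Stand-in scorer: the real experiment would return the average
    # number of mistakes from many simulated games per word.
    return [(w, len(set(w))) for w in words]

if __name__ == "__main__":
    dictionary = ["days", "jays", "rays", "zill", "jazz", "mojo"]
    with Pool(3) as pool:
        results = pool.map(score_chunk, chunked(dictionary, 3))
    scores = [pair for chunk in results for pair in chunk]
    print(len(scores))  # 6
```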

A simulation of this algorithm is shown below, with a graph of the average number of mistakes the new randomized algorithm incurs before discovering your secret word. The simulation is set up to play 500 rounds of games, and the final average is displayed at the top.