Solving Wordle

2022-01-30

What is Wordle?
How does it work?
Initial Solution
- Working out \beta(\kappa, W)
- Working out l(w_{1}, w_{2}, \kappa, W)
Computers are slow :(
- Time for Python
- \beta and l
What's Next?

What is Wordle?

Wordle is a web game that's recently become super popular, and I've seen a lot of articles on the internet about how to "solve" it, but so far all the ones I've seen have started with too many assumptions to be satisfying to me. So I'm writing this, about finding the ONE TRUE SOLUTION. Then we can make assumptions after that 😉.

How does it work?

Wordle's rules are very similar to the board game mastermind, but instead of having guess an absolutely random code the program selects a random word $a$ from a list of words $W$ . Our goal is to find this word. To narrow our choices down, we can input a word $w$ , from the set $W$ , and Wordle will tell us how well it matches the answer $a$ . We have six chances to guess the word correctly until we've failed and properly embarrassed ourselves.

The feedback that Wordle gives us works like this:

For every letter in $w$ which is not in $a$ , mark with a grey colour ⚪.
For every letter in $w$ which is in the same place as it is in $a$ , mark with a green colour 🟢.
Every letter left without a colour in $w$ will now be in the word $a$ , just not with the correct position. Mark these yellow 🟡, except for repeated letters of which the first $n$ will be marked yellow and the rest grey, where $n$ is the number of times that letter occurs in $a$ , and hasn't been coloured already.

For example, with $w=$ "goose" and $a$ ="obese", the colouring will be:

⚪🟡⚪🟢🟢

This tells us that: the "s" and "e" are in the correct positions, there is no "g" in $a$ , and there is only one "o" in $a$ which isn't in the second or third position.

Initial Solution

We want to come up with a solution which minimises the amount of guesses needed to solve for $a$ . We'll some things:

$\kappa$ : this is our current knowledge of the answer, coming from the wordle responses 🟢🟡⚪. $\kappa$ is additive, so two bits of knowledge coming from two separate responses can be combined.
$O(w)$ : this is the wordle oracle. It returns a bit of knowledge about the answer, given a word $w$ .
$\beta(\kappa, W)$ : This function returns the best word, given the current knowledge $\kappa$ and a set of words to guess from $W$ .
$l(w_{1}, w_{2}, \kappa, W)$ : This is the length of the path from word $w_{1}$ to word $w_{2}$ , given the current knowledge $\kappa$ and our set of words $W$ . This is the number of guesses you'd need to make, given our current knowledge to get $w_{2}$ starting from $w_{1}$ .

Working out $\beta(\kappa, W)$

What is the best word to guess? The one which minimises our average number of guesses to an answer.

Mathematically, we can write this as:

\beta(\kappa, W) = \min_{w\in W} E\left[l(w) | \kappa, W\right]

Where $E\left[l(w) | \kappa, W\right]$ is the average number of guesses from $w$ to an answer, given the current knowledge and word list. The average of a set of values is the sum of (value $\times$ probability that value occurs), so expanding this, we can write:

E\left[l(w) | \kappa, W\right] = \sum_{a \in A(\kappa)}p(a)l(w, a, \kappa, W)

Where $A(\kappa)$ is the set of possible answers, given the current knowledge, $p(a)$ is the probability of that answer having been picked (this is just a constant - we can assume that all answers have an equal chance of being picked), and $l$ is the guess number function we declared before. Lets flesh $l$ out.

Working out $l(w_{1}, w_{2}, \kappa, W)$

Starting out with the obvious - the number of guesses to get from a word to itself will be 1. So:

l(w, w, \kappa, W) = 1

If $w_{1}$ and $w_{2}$ aren't equal, then we should instead update our information using the wordle oracle, $\kappa' = \kappa + O(w)$ and work out the length from the new best word, given this updated knowledge $\kappa'$ :

l(w_{1}, w_{2}, \kappa, W) = 1 + l(\beta(\kappa', W), w_{2}, \kappa', W)

And that's it! Using the two mutually recursive functions $\beta$ and $l$ , we can calculate the best word to use on each step. For example: the best starting word is given by $\beta(\kappa_{0}, W)$ , where $\kappa_{0}$ is the "empty knowledge" (we have no information yet).

Taking stock, we have the two functions:

\beta(\kappa, W) = \min_{w \in W}\sum_{a \in A(\kappa)}p(a)l(w, a, \kappa, W) \\ l(w_{1}, w_{2}, \kappa, W) = \left\{\begin{array}{lr} 1& \text{ if } w_{1}=w_{2}\\ 1 + l(\beta(\kappa + O(w_{1}), W), w_{2}, \kappa + O(w_{1}), W)& \text{ otherwise} \end{array}\right.

We've worked it out! Easy! We're done!?

Computers are slow :(

Yeah, right - try calculating those functions. If we only have a set of $100$ words, by the sixth guess you have $100^6=1e12$ elements to calculate for. And if we're thinking about the amount of 5-letter words there are, you're doing $158,390^{6}$ according to Free Dictionary…

We'll think of some ways to optimise the calculation in the next post, but first lets write down an implementation in python.

Time for Python

We'll create some classes to hold our knowledge first:

def merge(A: dict[K, V], B: dict[K, V], f: Callable[[V, V], V]) -> dict[K, V]:
    """Function to merge two dictionaries, A and B.
    If any keys are in both, apply f(a, b) to the values and put the
    result of this function under the key
    """
    merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
    merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
    return merged


@dataclass
class Unplaced:
    """Function to describe an unplaced entry.
    If complete is true, then we know the amount of this letter that the
    answer must contain, otherwise we only know that the answer must contain
    at least `number` of this letter.
    The character must not be in any position in positions
    """

    complete: bool
    number: int
    positions: set[int]

    def __add__(self, other: "Unplaced") -> "Unplaced":
        return Unplaced(
            self.complete or other.complete,
            max(self.number, other.number),
            self.positions | other.positions,
        )


@dataclass
class Knowledge:
    """Class to describe the set of knowledge about the answer
    `not_contained` contains the letters not contained in the answer
    `placed` contains the set of letters and their positions in the answer
    `unplaced` contains the letters which we don't know the positions of -
    only the positions they aren't in, and the number there are
    """

    not_contained: set[str] = field(default_factory=set)
    placed: set[tuple[int, str]] = field(default_factory=set)
    unplaced: dict[str, Unplaced] = field(default_factory=dict)

    def __add__(self, other: "Knowledge") -> "Knowledge":
        return Knowledge(
            not_contained=self.not_contained | other.not_contained,
            placed=self.placed | other.placed,
            unplaced=merge(self.unplaced, other.unplaced, lambda a, b: a + b),
        )

    def complete(self) -> bool:
        """Return true if we have complete knowledge of the word
        """
        return len(self.placed) == 5

    def contains(self, word: str) -> bool:
        """Return true if the word fits the knowledge
        False otherwise
        """
        if any(i in word for i in self.not_contained):
            return False
        if any(word[i] != c for (i, c) in self.placed):
            return False
        for (c, unplaced) in self.unplaced.items():
            count = 0
            for i, l in enumerate(word):
                if l == c:
                    if i in unplaced.positions:
                        return False
                    else:
                        count += 1
            if unplaced.complete and count != unplaced.number:
                return False
            elif (not unplaced.complete) and unplaced.number > count:
                return False
        return True

The `Unplaced` class gives us a way to store knowledge about the positions of yellow coloured results. Our `Knowledge` class has two methods: `complete` tells us if we have all of the positions, and `contains` tells us if a given word fits with the current knowledge. We've also implemented the `add` operator, so we're able to add together two different bits of knowledge.

Now lets create the oracle function:

def oracle(word: str, answer: str) -> Knowledge:
    """Return the wordle results for the given word and answer"""
    answer_list: list[str | None] = list(answer)
    placed: set[tuple[int, str]] = set()
    others: dict[str, list[bool | set[int]]] = defaultdict(
        lambda: [False, set()]
    )
    for (i, (w, a)) in enumerate(zip(word, answer)):
        if w == a:
            placed.add((i, w))
        else:
            for j, c in enumerate(answer_list):
                if c == w:
                    others[c][1].add(i)
                    answer_list[j] = None
                    break
            else:
                others[w][0] = True
    not_contained: set[str] = set()
    unplaced: dict[str, Unplaced] = dict()
    for (w, (b, l)) in others.items():
        if len(l) == 0:
            not_contained.add(w)
        else:
            unplaced[w] = Unplaced(b, len(l), l)

    return Knowledge(
        not_contained=not_contained, placed=placed, unplaced=unplaced
    )

Lets test out our classes:

print(oracle("goose", "obese"))

Knowledge(not_contained={'g'}, placed={(3, 's'), (4, 'e')}, unplaced={'o': Unplaced(complete=True, number=1, positions={1})})

So we see that our knowledge includes the fact that "g" isn't contained in the answer, "s" and "e" are both placed, and "o" only occurs in one place, and that place isn't position 1.

Lets try combining two different bits of knowledge:

print(oracle("goose", "obese") + oracle("south", "obese"))

Knowledge(not_contained={'h', 'g', 't', 'u'}, placed={(3, 's'), (4, 'e')}, unplaced={'s': Unplaced(complete=False, number=1, positions={0}), 'o': Unplaced(complete=True, number=1, positions={1})})

Cool - looks like it's working. Just a little sanity check that knowledge is idempotent (the same bit of knowledge added to itself doesn't change the knowledge that you have):

k = oracle("goose", "obese")
print(k + k == k)

True

Nice.

$\beta$ and $l$

One important restriction we need to add here is the path we've taken to get to our current guess. If we don't add this in, then our functions just go on to calculate for an infinite number of turns and never terminate. Our functions will have an extra parameter, `answers` which is the current turn we're on. The functions will return either their value, or None depending on whether the number of answers has become higher than the turn limit or not.

Lets define $l$ first:

def l(w1: str, w2: str, k: Knowledge, W: list[str], guesses: list[str], answers: list[str]) -> Optional[int]:
    """Find the number of guesses needed to get from w1 to w2"""
    if w1 == w2:
        return 1
    else:
        updated_knowledge = oracle(w1, w2) + k
        if (best_guess := b(updated_knowledge, W, guesses, answers)) is not None and (
            next_length := l(best_guess, w2, updated_knowledge, W, guesses, answers)
        ):

            return 1 + next_length
        else:
            return None

And $\beta$ :

def average_length(
    w: str, answers: list[str], k: Knowledge, W: list[str], guesses: list[str]
) -> Optional[int]:
    total_distance = 0
    for a in answers:
        if (distance := l(w, a, k, W, guesses, answers)) is not None:
            total_distance += distance
        else:
            return None

    return total_distance


def b(
    k: Knowledge,
    W: list[str],
    guesses: list[str] = None,
    answers: list[str] = None,
) -> Optional[str]:
    """Find the best guess out of the collection of words W"""
    if guesses is None:
        guesses = []
    elif len(guesses) > 6:
        # turn limit has been reached - there's no best word here
        return None
    if answers is None:
        answers = [w for w in W if k.contains(w)]
    else:
        answers = [w for w in answers if k.contains(w)]
    if len(answers) == 1:
        return answers[0]
    min_vals: Optional[tuple[str, int]] = None
    for w in W:
        if (
            w not in guesses
            and (al := average_length(w, answers, k, W, guesses + [w]))
            is not None
        ):
            if min_vals is None or al < min_vals[1]:
                min_vals = (w, al)
    if min_vals is None:
        return None
    else:
        return min_vals[0]

Lets test this out with a wee example:

print(
    b(
        Knowledge(),
        [
            "goose",
            "geese",
            "obese",
            "poops",
            "smoke",
            "boots",
            "vomit",
            "great",
            "zoink",
            "plonk",
            "snubs",
        ],
    )
)

boots

And we can build the tree of best guesses for every possible answer picked:

def build_tree():
    words = [
        "goose",
        "geese",
        "obese",
        "boots",
        "vomit",
        "great",
        "zoink",
        "plonk",
        "snubs",
        "apart",
        "spark",
        "words",
        "blurb"
    ]
    start = b(Knowledge(), words)
    paths = []
    for a in words:
        path = [start]
        knowledge = oracle(start, a)
        while path[-1] != a:
            new_word = b(knowledge, words)
            path.append(new_word)
            knowledge = knowledge + oracle(new_word, a)
        paths.append((a, path))
    return paths

build_tree()

goose	(obese goose)
geese	(obese geese)
obese	(obese)
boots	(obese boots)
vomit	(obese vomit)
great	(obese great)
zoink	(obese vomit zoink)
plonk	(obese vomit plonk)
snubs	(obese snubs)
apart	(obese apart)
spark	(obese spark)
words	(obese words)
blurb	(obese blurb)

Here, the left column shows the answer, and the right column shows the path of guesses to get that word. For this set of words, following this strategy - there's a maximum of 3 guesses needed.

The code from this post is available in a github gist if you're wanting to play around with it.

What's Next?

So, at the moment we've got our little function which can solve the problem for about 10 words. Unfortunately this waaaay too slow for the bigger problem of all 5 letter words, so in the next post I'll look at ways of optimising our function (maybe making some ..assumptions..🤢) so that we can tackle the larger problem.