image of me

Solving Wordle

2022-01-30

What is Wordle?

Wordle is a web game that's recently become super popular, and I've seen a lot of articles on the internet about how to "solve" it, but so far all the ones I've seen have started with too many assumptions to be satisfying to me. So I'm writing this, about finding the ONE TRUE SOLUTION. Then we can make assumptions after that 😉.

How does it work?

Wordle's rules are very similar to the board game mastermind, but instead of having guess an absolutely random code the program selects a random word aa from a list of words WW. Our goal is to find this word. To narrow our choices down, we can input a word ww, from the set WW, and Wordle will tell us how well it matches the answer aa. We have six chances to guess the word correctly until we've failed and properly embarrassed ourselves.

The feedback that Wordle gives us works like this:

  1. For every letter in ww which is not in aa, mark with a grey colour ⚪.
  2. For every letter in ww which is in the same place as it is in aa, mark with a green colour 🟢.
  3. Every letter left without a colour in ww will now be in the word aa, just not with the correct position. Mark these yellow 🟡, except for repeated letters of which the first nn will be marked yellow and the rest grey, where nn is the number of times that letter occurs in aa, and hasn't been coloured already.

For example, with w=w="goose" and aa="obese", the colouring will be:

⚪🟡⚪🟢🟢

This tells us that: the "s" and "e" are in the correct positions, there is no "g" in aa, and there is only one "o" in aa which isn't in the second or third position.

Initial Solution

We want to come up with a solution which minimises the amount of guesses needed to solve for aa. We'll some things:

  • κ\kappa: this is our current knowledge of the answer, coming from the wordle responses 🟢🟡⚪. κ\kappa is additive, so two bits of knowledge coming from two separate responses can be combined.
  • O(w)O(w): this is the wordle oracle. It returns a bit of knowledge about the answer, given a word ww.
  • β(κ,W)\beta(\kappa, W): This function returns the best word, given the current knowledge κ\kappa and a set of words to guess from WW.
  • l(w1,w2,κ,W)l(w_{1}, w_{2}, \kappa, W): This is the length of the path from word w1w_{1} to word w2w_{2}, given the current knowledge κ\kappa and our set of words WW. This is the number of guesses you'd need to make, given our current knowledge to get w2w_{2} starting from w1w_{1}.

Working out β(κ,W)\beta(\kappa, W)

What is the best word to guess? The one which minimises our average number of guesses to an answer.

Mathematically, we can write this as:

β(κ,W)=minwWE[l(w)κ,W]\beta(\kappa, W) = \min_{w\in W} E\left[l(w) | \kappa, W\right]

Where E[l(w)κ,W]E\left[l(w) | \kappa, W\right] is the average number of guesses from ww to an answer, given the current knowledge and word list. The average of a set of values is the sum of (value ×\times probability that value occurs), so expanding this, we can write:

E[l(w)κ,W]=aA(κ)p(a)l(w,a,κ,W)E\left[l(w) | \kappa, W\right] = \sum_{a \in A(\kappa)}p(a)l(w, a, \kappa, W)

Where A(κ)A(\kappa) is the set of possible answers, given the current knowledge, p(a)p(a) is the probability of that answer having been picked (this is just a constant - we can assume that all answers have an equal chance of being picked), and ll is the guess number function we declared before. Lets flesh ll out.

Working out l(w1,w2,κ,W)l(w_{1}, w_{2}, \kappa, W)

Starting out with the obvious - the number of guesses to get from a word to itself will be 1. So:

l(w,w,κ,W)=1l(w, w, \kappa, W) = 1

If w1w_{1} and w2w_{2} aren't equal, then we should instead update our information using the wordle oracle, κ=κ+O(w)\kappa' = \kappa + O(w) and work out the length from the new best word, given this updated knowledge κ\kappa':

l(w1,w2,κ,W)=1+l(β(κ,W),w2,κ,W)l(w_{1}, w_{2}, \kappa, W) = 1 + l(\beta(\kappa', W), w_{2}, \kappa', W)

And that's it! Using the two mutually recursive functions β\beta and ll, we can calculate the best word to use on each step. For example: the best starting word is given by β(κ0,W)\beta(\kappa_{0}, W), where κ0\kappa_{0} is the "empty knowledge" (we have no information yet).

Taking stock, we have the two functions:

β(κ,W)=minwWaA(κ)p(a)l(w,a,κ,W)l(w1,w2,κ,W)={1 if w1=w21+l(β(κ+O(w1),W),w2,κ+O(w1),W) otherwise\beta(\kappa, W) = \min_{w \in W}\sum_{a \in A(\kappa)}p(a)l(w, a, \kappa, W) \\ l(w_{1}, w_{2}, \kappa, W) = \left\{\begin{array}{lr} 1& \text{ if } w_{1}=w_{2}\\ 1 + l(\beta(\kappa + O(w_{1}), W), w_{2}, \kappa + O(w_{1}), W)& \text{ otherwise} \end{array}\right.

We've worked it out! Easy! We're done!?

Computers are slow :(

Yeah, right - try calculating those functions. If we only have a set of 100100 words, by the sixth guess you have 1006=1e12100^6=1e12 elements to calculate for. And if we're thinking about the amount of 5-letter words there are, you're doing 158,3906158,390^{6} according to Free Dictionary

We'll think of some ways to optimise the calculation in the next post, but first lets write down an implementation in python.

Time for Python

We'll create some classes to hold our knowledge first:

def merge(A: dict[K, V], B: dict[K, V], f: Callable[[V, V], V]) -> dict[K, V]:
    """Function to merge two dictionaries, A and B.
    If any keys are in both, apply f(a, b) to the values and put the
    result of this function under the key
    """
    merged = {k: A.get(k, B.get(k)) for k in A.keys() ^ B.keys()}
    merged.update({k: f(A[k], B[k]) for k in A.keys() & B.keys()})
    return merged


@dataclass
class Unplaced:
    """Function to describe an unplaced entry.
    If complete is true, then we know the amount of this letter that the
    answer must contain, otherwise we only know that the answer must contain
    at least `number` of this letter.
    The character must not be in any position in positions
    """

    complete: bool
    number: int
    positions: set[int]

    def __add__(self, other: "Unplaced") -> "Unplaced":
        return Unplaced(
            self.complete or other.complete,
            max(self.number, other.number),
            self.positions | other.positions,
        )


@dataclass
class Knowledge:
    """Class to describe the set of knowledge about the answer
    `not_contained` contains the letters not contained in the answer
    `placed` contains the set of letters and their positions in the answer
    `unplaced` contains the letters which we don't know the positions of -
    only the positions they aren't in, and the number there are
    """

    not_contained: set[str] = field(default_factory=set)
    placed: set[tuple[int, str]] = field(default_factory=set)
    unplaced: dict[str, Unplaced] = field(default_factory=dict)

    def __add__(self, other: "Knowledge") -> "Knowledge":
        return Knowledge(
            not_contained=self.not_contained | other.not_contained,
            placed=self.placed | other.placed,
            unplaced=merge(self.unplaced, other.unplaced, lambda a, b: a + b),
        )

    def complete(self) -> bool:
        """Return true if we have complete knowledge of the word
        """
        return len(self.placed) == 5

    def contains(self, word: str) -> bool:
        """Return true if the word fits the knowledge
        False otherwise
        """
        if any(i in word for i in self.not_contained):
            return False
        if any(word[i] != c for (i, c) in self.placed):
            return False
        for (c, unplaced) in self.unplaced.items():
            count = 0
            for i, l in enumerate(word):
                if l == c:
                    if i in unplaced.positions:
                        return False
                    else:
                        count += 1
            if unplaced.complete and count != unplaced.number:
                return False
            elif (not unplaced.complete) and unplaced.number > count:
                return False
        return True

The `Unplaced` class gives us a way to store knowledge about the positions of yellow coloured results. Our `Knowledge` class has two methods: `complete` tells us if we have all of the positions, and `contains` tells us if a given word fits with the current knowledge. We've also implemented the `add` operator, so we're able to add together two different bits of knowledge.

Now lets create the oracle function:

def oracle(word: str, answer: str) -> Knowledge:
    """Return the wordle results for the given word and answer"""
    answer_list: list[str | None] = list(answer)
    placed: set[tuple[int, str]] = set()
    others: dict[str, list[bool | set[int]]] = defaultdict(
        lambda: [False, set()]
    )
    for (i, (w, a)) in enumerate(zip(word, answer)):
        if w == a:
            placed.add((i, w))
        else:
            for j, c in enumerate(answer_list):
                if c == w:
                    others[c][1].add(i)
                    answer_list[j] = None
                    break
            else:
                others[w][0] = True
    not_contained: set[str] = set()
    unplaced: dict[str, Unplaced] = dict()
    for (w, (b, l)) in others.items():
        if len(l) == 0:
            not_contained.add(w)
        else:
            unplaced[w] = Unplaced(b, len(l), l)

    return Knowledge(
        not_contained=not_contained, placed=placed, unplaced=unplaced
    )

Lets test out our classes:

print(oracle("goose", "obese"))
Knowledge(not_contained={'g'}, placed={(3, 's'), (4, 'e')}, unplaced={'o': Unplaced(complete=True, number=1, positions={1})})

So we see that our knowledge includes the fact that "g" isn't contained in the answer, "s" and "e" are both placed, and "o" only occurs in one place, and that place isn't position 1.

Lets try combining two different bits of knowledge:

print(oracle("goose", "obese") + oracle("south", "obese"))
Knowledge(not_contained={'h', 'g', 't', 'u'}, placed={(3, 's'), (4, 'e')}, unplaced={'s': Unplaced(complete=False, number=1, positions={0}), 'o': Unplaced(complete=True, number=1, positions={1})})

Cool - looks like it's working. Just a little sanity check that knowledge is idempotent (the same bit of knowledge added to itself doesn't change the knowledge that you have):

k = oracle("goose", "obese")
print(k + k == k)
True

Nice.

β\beta and ll

One important restriction we need to add here is the path we've taken to get to our current guess. If we don't add this in, then our functions just go on to calculate for an infinite number of turns and never terminate. Our functions will have an extra parameter, `answers` which is the current turn we're on. The functions will return either their value, or None depending on whether the number of answers has become higher than the turn limit or not.

Lets define ll first:

def l(w1: str, w2: str, k: Knowledge, W: list[str], guesses: list[str], answers: list[str]) -> Optional[int]:
    """Find the number of guesses needed to get from w1 to w2"""
    if w1 == w2:
        return 1
    else:
        updated_knowledge = oracle(w1, w2) + k
        if (best_guess := b(updated_knowledge, W, guesses, answers)) is not None and (
            next_length := l(best_guess, w2, updated_knowledge, W, guesses, answers)
        ):

            return 1 + next_length
        else:
            return None

And β\beta:

def average_length(
    w: str, answers: list[str], k: Knowledge, W: list[str], guesses: list[str]
) -> Optional[int]:
    total_distance = 0
    for a in answers:
        if (distance := l(w, a, k, W, guesses, answers)) is not None:
            total_distance += distance
        else:
            return None

    return total_distance


def b(
    k: Knowledge,
    W: list[str],
    guesses: list[str] = None,
    answers: list[str] = None,
) -> Optional[str]:
    """Find the best guess out of the collection of words W"""
    if guesses is None:
        guesses = []
    elif len(guesses) > 6:
        # turn limit has been reached - there's no best word here
        return None
    if answers is None:
        answers = [w for w in W if k.contains(w)]
    else:
        answers = [w for w in answers if k.contains(w)]
    if len(answers) == 1:
        return answers[0]
    min_vals: Optional[tuple[str, int]] = None
    for w in W:
        if (
            w not in guesses
            and (al := average_length(w, answers, k, W, guesses + [w]))
            is not None
        ):
            if min_vals is None or al < min_vals[1]:
                min_vals = (w, al)
    if min_vals is None:
        return None
    else:
        return min_vals[0]

Lets test this out with a wee example:

print(
    b(
        Knowledge(),
        [
            "goose",
            "geese",
            "obese",
            "poops",
            "smoke",
            "boots",
            "vomit",
            "great",
            "zoink",
            "plonk",
            "snubs",
        ],
    )
)
boots

And we can build the tree of best guesses for every possible answer picked:

def build_tree():
    words = [
        "goose",
        "geese",
        "obese",
        "boots",
        "vomit",
        "great",
        "zoink",
        "plonk",
        "snubs",
        "apart",
        "spark",
        "words",
        "blurb"
    ]
    start = b(Knowledge(), words)
    paths = []
    for a in words:
        path = [start]
        knowledge = oracle(start, a)
        while path[-1] != a:
            new_word = b(knowledge, words)
            path.append(new_word)
            knowledge = knowledge + oracle(new_word, a)
        paths.append((a, path))
    return paths

build_tree()
goose(obese goose)
geese(obese geese)
obese(obese)
boots(obese boots)
vomit(obese vomit)
great(obese great)
zoink(obese vomit zoink)
plonk(obese vomit plonk)
snubs(obese snubs)
apart(obese apart)
spark(obese spark)
words(obese words)
blurb(obese blurb)

Here, the left column shows the answer, and the right column shows the path of guesses to get that word. For this set of words, following this strategy - there's a maximum of 3 guesses needed.

The code from this post is available in a github gist if you're wanting to play around with it.

What's Next?

So, at the moment we've got our little function which can solve the problem for about 10 words. Unfortunately this waaaay too slow for the bigger problem of all 5 letter words, so in the next post I'll look at ways of optimising our function (maybe making some ..assumptions..🤢) so that we can tackle the larger problem.