Finding Possible Solutions to Wordle
Fri 27 January 2023

Article image

I read Dr. Drang’s post about his Wordle script and found it an interesting approach to finding possible solutions to Wordle. Like many other people, I also wrote a script to help find possible matches to Wordle back when it became so popular about a year ago.

Dr. Drang first solved this using a sequence of two egrep commands which he described in Not cheating at Wordle. The approach then was to use the first egrep command to find all words excluding those with the absent characters (marked in gray) but also including known (green) characters in their proper location. The results were piped to a second egrep which then filtered that list for words that include any present characters without a known location (yellow).

His follow-up post replaced this with a Python script that followed a similar approach but borrowed a feature from John Gruber’s Perl solution to pass in the list of letters that must appear (yellow) with known invalid locations in the form a23 that is, the word must contain the letter a but not in position 2 or 3. From that, he builds a set of regular expressions that are applied to find matching words.

My script used a different approach in that you enter the results of each guess encoded in a way that represents the response Wordle produces. Each character in a Wordle response can be in one of three states: gray for a letter that is absent in the word, yellow for a letter that is present in the word but not in that location, and green for a letter that is in the correct location.

Unfortunately, there isn’t an easy way to represent each letter in three different states from the command line. Two states are easy, using uppercase and lowercase characters. For a third state I simply chose to follow correct letters with an = so a guess might look like

stA=rE

to indicate that s, tand rshould be absent (gray), eshould be present but not in the fifth position (yellow) and ashould be present and in the third position (green).

This allows you to just enter the guesses you’ve made so far without too much thinking.

Implementation

The full script is below. For input it accepts any number of guess “patterns” encoded as I described above. After reading in the dictionary of all possible words (valid guesses and solutions), the valid words are determined by successively eliminating words that are not valid.

"""
find possible solutions to wordle
"""
import argparse
import random
import re

help="""\
Enter one or more guess "patterns" using lowercase characters for
letters that must be absent in the word (black), uppercase characters
for letters that must be present in the word (yellow) and an
uppercase character followed by "=" for characters that must be in
that location (green). For example, stA=rE indicates that 's', 't'
and 'r' are not in the word, 'a' is in the third position and 'e' is
in the word but not in the fifth position.

Supply additional patterns for the result of each guess.
"""

def main():
    parser = argparse.ArgumentParser(description=help)
    parser.add_argument("pattern", help="Five-character guess result, e.g. stA=rE", nargs='*')
    args = parser.parse_args()

    words = [_.strip() for _ in open('wordle.txt').readlines()]

    allinclude = set()
    for pattern in args.pattern:
        exclude = re.sub(r'[^a-z]', '', pattern)
        include = re.sub(r'[^A-Z]', '', pattern).lower()

        pattern = re.sub(r'[a-z]', '.', pattern).lower()
        pattern = re.sub(r'([a-z])(?!=)', r'[^\1]', pattern)
        pattern = re.sub(r'([a-z])=', r'\1', pattern)

        words = [w for w in words if not any(c in w for c in exclude)]
        words = [w for w in words if all(c in w for c in include)]
        words = [w for w in words if re.match(f"^{pattern}$", w)]

        allinclude.update(include)

    for wrds in grouped(words, 6):
        print(' '.join(wrds))
    print(f"{len(words)} possible matches")

    word = guess(words, allinclude)
    print(f"guess for next word: {word}")

def grouped(vals: list, size: int):
    """return groups of `size` items at a time from `vals` list"""
    for group in (vals[i:i + size] for i in range(0, len(vals), size)):
        yield group

def guess(words: list, known: set) -> str:
    """choose a good word for the next guess based on the count of remaining untried characters"""
    chars = set(''.join(words)).difference(known)
    counts = [(len(chars.intersection(set(w))), w) for w in words]
    counts.sort()
    guesses = [w for c, w in counts if c == counts[-1][0]]
    return random.choice(guesses)

if __name__ == '__main__':
    main()

The interesting part of the script takes place in the first for loop.

for pattern in args.pattern:
    exclude = re.sub(r'[^a-z]', '', pattern)
    include = re.sub(r'[^A-Z]', '', pattern).lower()

    pattern = re.sub(r'[a-z]', '.', pattern).lower()
    pattern = re.sub(r'([a-z])(?!=)', r'[^\1]', pattern)
    pattern = re.sub(r'([a-z])=', r'\1', pattern)

    words = [w for w in words if not any(c in w for c in exclude)]
    words = [w for w in words if all(c in w for c in include)]
    words = [w for w in words if re.match(f"^{pattern}$", w)]

Each guess pattern that is passed in is processed to generate three variables:

  • exclude contains the characters that must be absent from the word
  • include contains the characters that must be present in the word
  • pattern is a regex pattern produced from the guess that matches words with the two rules, one for correct letters in the wrong position and one for correct letters in the right position

To assign the exclude and include variables, a simple regular expression is used to remove the characters we aren’t interested in, leaving those we desire. The variable include will contain both the characters in the right position (green) and the wrong position (yellow).

Creating the variable pattern takes a little more explanation. We want it to match valid words but since we are handling characters to include and exclude separately, the regular expression can be much simpler. So in lines 30-32 we map the exclude (lowercase) characters to a . so they can match anything, and we map uppercase characters not followed by = to [^x] (where x is the matched character) to allow any character except that letter. Uppercase characters followed by = are preserved (without the =) to match only that character.

The regular expression pattern on this line may look a little confusing.

pattern = re.sub(r'([a-z])(?!=)', r'[^\1]', pattern)

The second parenthetical group (?!=) is a negative lookahead assertion which means it will only match if a lowercase letter is found that is not followed by an equal sign.

So after these three substitutions, a pattern like stA=rE is translated to the regular expression ..a.[^e] pattern.

Next, we filter the possible words in succession. First we remove words with any of the exclude characters in them, then we keep only words containing all of the include characters, and finally, we refine the list once more with words matching the character position constraints in the regex pattern, and by repeating this process for each guess pattern passed into the program, we get the final list of possible words.

The last few lines of the script print the candidate words out in six columns using the grouped iterator function which returns a fixed group of items on each iteration.

Suggesting a good guess

At the very end of the script, it will select one word out of the remaining words as a suggestion for the next attempt. This uses the guess function near the end of the script. The function takes as input the remaining words and a set of all letters that must be included in the words. The chars variable is assigned a set of all letters in all the remaining words after removing those that must be included. This will be the list of letters that have yet to be tried that are present in any of the remaining words.

The counts variable is the result of a list comprehension that contains tuples of the count of these untried characters paired with each word. We sort it to get the highest count at the end, and then we filter it by just those words that all have the highest count. We randomly choose one word from this list as a suggestion.

The idea here is to choose a word that has the highest number of untried letters which should be a good choice for reducing the number of possible candidates. (Another approach could have been to rank the words by how commonly used they are or by the frequency of use of the untried letters, but this was a good first cut.)

Sample Use

Here’s an example run of the script using the puzzle on Friday, January 27th, 2023. My first guess was “stare” and the response was

Running that results through the script returns 71 possible matches and suggests “fiord” which is not a good choice as it’s not one of the more common words in the list. Better choices are “chord”, “glory”, “ivory”, and “whirl”, but that’s the problem with picking the word randomly.

% python wordle.py staR=e
bidri blurb bourd bourg bourn burro
burry chirk chirm chiro chirp chirr
chord churl churm churn churr cirri
courb curry dowry durry fibry fiord
firry flory flurn flurr frory furry
glory gorry gourd gurry houri hurry
hydro indri inorb inurn ivory knurl
kodro kukri lorry lurry mbori micro
moorn moory mourn mucro myrrh ochro
piuri porry purry quirk quirl umiri
undry unorn unurn upcry updry vidry
whirl whorl worry yourn zorro
71 possible matches
guess for next word: fiord

So I ignored the advice from my own program and chose “chord” for my second guess, resulting in

Running the script again with both inputs reveals just 10 matches. Again, the suggested word “yourn” is a poor choice as it’s not a common word (I had to look it up, it means a regional or archaic form of “yours”, so that should come in handy). Instead, I picked “worry” which was the actual answer.

% python wordle.py staR=e chOR=d
bourg bourn burro gorry lorry mourn
porry worry yourn zorro
10 possible matches
guess for next word: yourn

Wrap Up

I don’t actually use this script when I play—I like the challenge of completing Wordle myself—but it’s an interesting problem and a nice programming exercise, and judging from the different approaches others have taken, there isn’t one obvious best way to solve it. I find those types of problems to be the most interesting ones to tackle.