MIP Formulation of Inverse Boggle
Inverse Boggle
When writing a solver for Boggle the question struck me: how do you solve the inverse problem? Specifically, I am looking to answer the following question:
Given a set $S$ of strings over the Latin alphabet and a board width $N$, find an assignment of letters to an $N \times N$ Boggle board that includes the largest number of strings from $S$ according to Boggle rules.
I did not prove this, but I am confident that the decision version of this problem is NP-hard, based on the fact that the shortest common superstring problem is NP-hard and is equivalent to inverse Boggle in the case where the board is $N \times 1$ rather than $N \times N$. For this reason, one should not expect to come up with a quick algorithm.
MIP Formulation
Fortunately, I enjoy not having a quick algorithm for a discrete optimization problem, because this justifies my playing with one of my favorite toys: MIP solvers. I’ve always found writing problem formulations for MIPs an exquisitely satisfying sort of puzzle. I even have a pet library to make this easier for problems on grids: PuLP-lparray. Without further ado, let’s solve inverse Boggle!
I’ll be using Python with PuLP for this task.
Defining the Problem Space
First, let’s define our problem space:
from collections import Counter
from string import ascii_lowercase

import pulp
from numpy.typing import NDArray
from pulp_lparray import lparray


def reverse_boggle(words: list[str], board_size: int = 4) -> NDArray[object]:
    """
    Generates a Boggle board of a fixed size that contains as many words as
    possible from the given list.

    Uses the pulp_lparray library for easily manipulating arrays of LpVariables
    in numpy vectorized style.
    """
    # eliminate duplicates and standardize order
    words = sorted(set(words))
    # we'll be maximizing the number of words included
    prob = pulp.LpProblem(sense=pulp.LpMaximize)
    # create an array of binary variables, one for each word in 'words'
    word_is_included = lparray.create("WordIncluded", (words,), cat=pulp.LpBinary)
    # sumit just sums the array and returns the resulting expression
    prob.setObjective(word_is_included.sumit())
So far, so straightforward: for each unique word, I have a binary flag word_is_included such that word_is_included[i] == 1 iff the $i$'th word is included in the board. I want to maximize the number of included words. If one wants, one can weigh included words by their length or score, or add fancier objective logic here. In our case, I’ll just be maximizing word count.
Next, let’s define the board:
board = lparray.create_anon(
    # N x N x 26
    "Board", (board_size, board_size, len(ascii_lowercase)), cat=pulp.LpBinary
)
(board.sum(axis=-1) == 1).constrain(prob, "LetterEncoding")
The board is an $N \times N \times 26$ binary array, such that board[i, j, k] == 1 $\Leftrightarrow$ “the $(i, j)$'th letter of the board is the $k$'th letter in the alphabet”. The one-hot constraint above stipulates that exactly one letter is assigned to each position in the board.
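As an aside, once the solver has run, this one-hot encoding can be mapped back to letters by finding the index of the single 1 along the last axis. A minimal pure-Python sketch (decode_board is an illustrative helper, assuming the solved 0/1 values have already been extracted into nested lists):

```python
from string import ascii_lowercase

def decode_board(board_values: list[list[list[int]]]) -> list[list[str]]:
    # each (i, j) position holds a one-hot vector over the 26 letters;
    # the index of the single 1 is the letter at that position
    return [
        [ascii_lowercase[cell.index(1)] for cell in row]
        for row in board_values
    ]

# a 1x2 board: 'a' at (0, 0), 'c' at (0, 1)
one_hot = [[[1] + [0] * 25, [0, 0, 1] + [0] * 23]]
print(decode_board(one_hot))  # → [['a', 'c']]
```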
Encoding the Rules of Boggle
Now comes the significantly trickier part: encoding the rules of Boggle, which state that:
- each word’s constituent letters are all on the board
- the same position on the board cannot be used in a word more than once
- each next letter is adjacent (including diagonally) to each previous one
First, I defined the variables and constraints that determine the assignment of letters to board positions:
word_arrs = {
    word: lparray.create_anon(
        f"Word_'{word}'", (board_size, board_size, len(word)), cat=pulp.LpBinary
    )
    for word in words
}

for wx, (word, arr) in enumerate(word_arrs.items()):
    # 1. each letter of the word is assigned exactly one board position
    (arr.sum(axis=(0, 1)) == 1).constrain(prob, f"'{word}'_EachLetterAssigned")
    # 2. each board position is used at most once within the word
    (arr.sum(axis=-1) <= 1).constrain(prob, f"Word_'{word}'_NoBackTracks")
    # 3. for included words, each placed letter must match the board
    for lx, letter in enumerate(word):
        (
            (1 - word_is_included[wx : wx + 1])
            + board[:, :, ascii_lowercase.index(letter)]
            >= arr[:, :, lx]
        ).constrain(prob, f"'{word}'_Letter_{lx}_matches_board")
That is, for each word $w$ of length $L(w)$, I create a binary word_arr(w) array of shape $(N, N, L(w))$, such that word_arr(w)[i, j, k] == 1 $\Leftrightarrow$ “the $k$'th character of word $w$ is located at $(i, j)$ on the board.” The three constraints in the code above ensure that:
- each letter of each word has a fixed position on the board
- for each word, any position $(i, j)$ is used at most once
- for all included words $w$: word_arr(w)[i, j, k] == 1 $\implies$ board[i, j] contains w[k]
The code for the last condition is the trickiest to understand. For MIPs, an inequality between binary variables a >= b encodes the logical statement $b \implies a$, and a sum of binary variables a + b encodes the logical statement $a \vee b$. This can be seen by comparing the truth tables of those operations to the set of valid assignments of 1 and 0 to $a, b$ in the corresponding inequalities; in the case of addition, any value greater than zero counts as true.
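The comparison of truth tables can itself be checked mechanically by enumerating all 0/1 assignments (a standalone sanity check, not part of the solver):

```python
from itertools import product

assignments = list(product((0, 1), repeat=2))

# a >= b admits exactly the assignments where "b implies a" is true
satisfies_geq = [(a, b) for a, b in assignments if a >= b]
satisfies_implies = [(a, b) for a, b in assignments if (not b) or a]
assert satisfies_geq == satisfies_implies

# a + b >= 1 admits exactly the assignments where "a or b" is true
satisfies_sum = [(a, b) for a, b in assignments if a + b >= 1]
satisfies_or = [(a, b) for a, b in assignments if a or b]
assert satisfies_sum == satisfies_or
```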
With this in mind, and mindful of how the use of arrays vectorizes the operation across all $(i, j)$, constraint 3 can be read in boolean logic terms as:
$$ \forall i, j: \texttt{word_arr}(w)[i, j, k] \implies \neg \texttt{is_included}[w] \vee \texttt{board}[i, j, \texttt{alphabet.index}(w[k])]$$
which aligns with what is required.
The check for whether the word is included is critical: in the case where not all words can fit onto the board, I don’t want excluded words to have any effect on which letters are found on the board. The inclusion flag half of the disjunction provides an “escape hatch” for the optimizer to violate the constraint if it needs to, by marking the word as excluded. We use this same trick several times below.
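Concretely, we can enumerate the constraint from step 3 over all 0/1 values to see the escape hatch in action (constraint_holds is an illustrative stand-in for one scalar entry of the vectorized constraint):

```python
from itertools import product

def constraint_holds(included: int, board_val: int, arr_val: int) -> bool:
    # (1 - word_is_included) + board >= word_arr, at a single (i, j, letter)
    return (1 - included) + board_val >= arr_val

# excluded word: the constraint is vacuous for every assignment
assert all(constraint_holds(0, b, a) for b, a in product((0, 1), repeat=2))
# included word: placing a letter where the board disagrees is forbidden...
assert not constraint_holds(1, board_val=0, arr_val=1)
# ...but placing it where the board agrees is fine
assert constraint_holds(1, board_val=1, arr_val=1)
```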
Those new to integer programming should start to feel the power of what can be done with binary arrays. However, the trickiest part is yet to come.
The Continuity Constraint
We need to express that the letters of each word are not merely present somewhere on the board, but form a chain. We will implement the following reasoning:

For each word_arr, if word_arr[i, j, c] == 1, then we know that the letter at $c + 1$ must be adjacent to $(i, j)$. This means it’s somewhere in the square defined by (ignoring edge conditions for simplicity) word_arr[i-1:i+2, j-1:j+2].
That is we want to encode the logical statement:
$$\texttt{word_arr}[i, j, c] \implies \bigvee_{|i' - i| \le 1,\; |j' - j| \le 1} \texttt{word_arr}[i', j', c + 1]$$
Recall that logical or is implemented by summation. Therefore, for each word_arr, we need the $i,j$-sum of each 3x3 square at character position $c + 1$ to be greater than or equal to word_arr[i, j, c]. With edge conditions, where the summed area is less than 3x3 for letters on the edges or corners of the board, this is not a trivial constraint to express. However, it can still be implemented in a reasonably concise way using numpy vectorized operations.
for ix in range(board_size):
    for jx in range(board_size):
        # this square sum is effectively a convolution
        conv = (
            arr[
                max(0, ix - 1) : min(board_size, ix + 2),
                max(0, jx - 1) : min(board_size, jx + 2),
                :,
            ]
        ).sum(axis=(0, 1))
        (
            arr[ix, jx, :-1] <= 1 - word_is_included[wx : wx + 1] + conv[1:]
        ).constrain(prob, f"'{word}'_ValidChain_({ix},{jx})")
We see that for each word (in the outer loop, not shown here), for each board position, and for each character position, the sum over adjacent squares at the next character position must be at least as large as the current character’s indicator. The line implementing the constraint is vectorized, and again uses a disjunction with the word inclusion flag to avoid enforcing this requirement for excluded words.
Note that nothing in the code immediately above prevents the next letter from illegally being at the same $(i, j)$ as the previous. However, this possibility is excluded by previous constraints. This is a subtle but important point: often our conception of the constraints of the problem can be decomposed into separate, much simpler to implement ones whose intersection covers the full requirements.
The Value of Redundant Constraints
The program above is enough for the solver to find valid solutions. However, it’s not the best we can do. That is because when formulating MIPs, redundant constraints can add a lot of value by pruning the solver’s search space and guiding it to feasible solutions faster. Redundant constraints can be thought of as adding more domain knowledge or “common sense” to the solver.
In our case, there is one very important constraint I have not encoded above: a floor on the total count of each letter. Though it’s not known ahead of time which letters will overlap across different words, each letter of each included word must appear at least once on the board. Furthermore, within each word, duplicated letters must appear in at least as many positions on the board as their multiplicity. Expressed logically:
$$\forall c: \left| \{ (i, j) : \texttt{board}[i, j, c] = 1 \} \right| \ge \max_{w \,:\, \texttt{is_included}[w]} \texttt{count}(w, \texttt{alphabet}[c])$$
This can be coded quite directly:
for wx, word in enumerate(words):
    letter_count = Counter(word)
    for char, value in letter_count.items():
        (
            board[:, :, ascii_lowercase.index(char)].sum()
            >= value * word_is_included[wx]
        ).constrain(prob, f"MinCharCount[{word},{char}]")
Again, the inclusion flag modulates the constraint. Where the inclusion flag is zero, the letter minimum is set to 0 as well, which excludes no values for a binary array.
The Value of a Warm Start
MIP solvers use branching algorithms like branch and cut that prune the search space by solving a relaxed version of each subproblem and eliminating any branches where the relaxation, which is guaranteed to have a solution at least as good as any integral feasible solution, performs worse than the best known feasible solution. This means that finding decent feasible solutions using whatever dirty, cheating heuristics you can think of is one of the best ways to speed up the search.
For the boggle problem, the simple hack I have coded below to “warm-start” the solver is to lay out the shortest words in the list back to back, in a snaking fashion, across the board. Where there are many short words, this will give the algorithm a quick lower bound on the value of the objective function.
ix = 0
for word in sorted(words, key=len):
    for char in word:
        row = ix // board_size
        if row % 2:
            # odd rows snake right-to-left
            column = board_size - ix % board_size - 1
        else:
            column = ix % board_size
        board[row, column, ascii_lowercase.index(char)].setInitialValue(1)
        ix += 1
        if ix >= board_size * board_size:
            break
    else:
        # the whole word fit, so hint that it is included
        word_is_included[words.index(word)].setInitialValue(1)
    if ix >= board_size * board_size:
        break
This is not the best one can do with heuristics; the point is not to do the solver’s job for it, but simply to start with something easy to guess. If you try to do better than this by hand, you will find it’s quite difficult!
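The index arithmetic of the snaking layout is easier to see in isolation. A small sketch of the same traversal order (snake_coords is an illustrative helper, not part of the solver):

```python
def snake_coords(board_size: int) -> list[tuple[int, int]]:
    # visit the board row by row, reversing direction on odd rows
    coords = []
    for ix in range(board_size * board_size):
        row = ix // board_size
        column = ix % board_size
        if row % 2:
            column = board_size - column - 1
        coords.append((row, column))
    return coords

print(snake_coords(2))  # → [(0, 0), (0, 1), (1, 1), (1, 0)]
```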
Example Solution
The full code example can be found in this gist. In order to run it, you should install the requisite libraries and a MIP solver. I use COIN-OR’s CBC because, while Gurobi and CPLEX are far better, I am unwilling to spend five figures annually on silly puzzles. Without further ado:
print(
    reverse_boggle(
        [
            "erlang", "fortran", "golang", "haskell", "java", "javascript",
            "kotlin", "lisp", "php", "python", "scala", "ruby", "rust", "zig",
        ],
        board_size=4,
    )
)
>>> [['g' 'l' 'n' 'k']
>>> ['z' 'i' 't' 'o']
>>> ['r' 's' 'y' 'h']
>>> ['u' 'b' 'p' 'p']]
See if you can find all the programming languages in the solution. There are 7 from the original list.
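If you’d rather not take my word for the count, it can be verified with a small standalone Boggle checker, written independently of the MIP code (board_contains is an illustrative helper; only the count is printed, to avoid spoilers):

```python
def board_contains(board: list[str], word: str) -> bool:
    # depth-first search for `word` under Boggle rules: each step moves to an
    # adjacent cell (including diagonals) and no cell is used twice
    n = len(board)

    def dfs(i: int, j: int, k: int, used: set[tuple[int, int]]) -> bool:
        if board[i][j] != word[k]:
            return False
        if k == len(word) - 1:
            return True
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (
                    (di or dj)
                    and 0 <= ni < n
                    and 0 <= nj < n
                    and (ni, nj) not in used
                    and dfs(ni, nj, k + 1, used | {(ni, nj)})
                ):
                    return True
        return False

    return any(dfs(i, j, 0, {(i, j)}) for i in range(n) for j in range(n))

word_list = [
    "erlang", "fortran", "golang", "haskell", "java", "javascript",
    "kotlin", "lisp", "php", "python", "scala", "ruby", "rust", "zig",
]
solution = ["glnk", "zito", "rsyh", "ubpp"]
found = [w for w in word_list if board_contains(solution, w)]
print(len(found))  # → 7
```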
Conclusion
This is probably far from the cleverest way of tackling this problem, and this formulation (at least using CBC) starts becoming unusably slow at around 10 words for a 4x4 grid. However, it stands as a fun and illuminating example of how widely applicable MIPs are, and of how quickly they can be coded up with some experience. I would love to see someone experiment with different formulations and a faster solver to see if larger word lists can be tackled. There is lots of room for exploration.
As a bonus challenge, try to come up with the best 5x5 board for the list in the example manually.