Most Important Interview Questions #4 - Word Ladder II and BFS

  16 mins read  

Why is it important?

Word ladder II is great for reviewing:

  1. Picturing problems as a graph
  2. Coding common graph algorithms (BFS and DFS)

It has one of the lowest leetcode acceptance rates (only 15% at the time of writing) and shouldn’t really be a hard problem (there aren’t really any ‘tricks’).

Since BFS, DFS, and visualizing problems as graphs come up so commonly during interviews (and not necessarily during your everyday coding) I think this is one of the most valuable problems you could do during interview prep.

The Word Ladder II Problem - Leetcode #126

Given two words (beginWord and endWord), and a dictionary’s word list, find all shortest transformation sequence(s) from beginWord to endWord, such that:

  1. Only one letter can be changed at a time
  2. Each transformed word must exist in the word list. Note that beginWord is not a transformed word.

Note:

  • Return an empty list if there is no such transformation sequence.
  • All words have the same length.
  • All words contain only lowercase alphabetic characters.
  • You may assume no duplicates in the word list.
  • You may assume beginWord and endWord are non-empty and are not the same.

Example 1:

Input:

beginWord = “hit”, endWord = “cog”, wordList = [“hot”,”dot”,”dog”,”lot”,”log”,”cog”]

Output: [ [“hit”,”hot”,”dot”,”dog”,”cog”], [“hit”,”hot”,”lot”,”log”,”cog”] ]

Example 2:

Input:

beginWord = “hit” endWord = “cog” wordList = [“hot”,”dot”,”dog”,”lot”,”log”]

Output: []

Explanation: The endWord “cog” is not in wordList, therefore no possible transformation.


Visualizing the Word Ladder Problem as a Graph

The key to our solution is to imagine the problem as a graph traversal (we also did this in the last question, n-queens).

Let’s imagine a graph where:

  • The vertices are all words in wordList
  • There is an edge between 2 vertices when they differ by a single character.

For the wordlist in example 1, the graph would look something like this:

Picture showing graph as described above.

The solution we are looking for is equivalent to finding all shortest paths from beginWord to endWord in our graph.

The most common algorithm for finding a shortest path is breadth first search (bfs), so let’s use it.

Finding the Length of the Shortest Path (Solving Word Ladder I)

To start with, I’ll just show how to find the length of the shortest word ladder (which is the related leetcode problem word ladder 1). To get the shortest word ladder, we’ll do a bfs in our graph.

In Python, (with a still to be written generateNeighbors function which allows us to iterate over all words in wordList that are within 1 character of ‘word’), this would look like:

# beginWord - string, endWord - string, wordList - list of string
def ladderLength(beginWord, endWord, wordList):
    # We use q to keep track of the next nodes to process in the BFS.
    # Each item in the queue is a list with two items:
    #   item[0] = word
    #   item[1] = steps to reach word + 1 (i.e. number of nodes in list of nodes 
    #             traversed to reach word - (format of Word Ladder I output)).
    q = collections.deque([ [beginWord,1] ])
    # We keep track of words we've processed to avoid getting stuck in a loop.
    seen = set([beginWord])
    # wordList is given as a list but we want O(1) lookup so we convert to a set.
    wordList = set(wordList)
    while q:
        q_item = q.popleft()
        for candidate in generateNeighbors(q_item[0], wordList):
            if candidate == endWord:
                return q_item[1] + 1
            elif candidate in seen:
                continue
            seen.add(candidate)
            q.append([candidate, q_item[1] + 1])
    return 0

Now, we just have to write our generateNeighbors(word, wordList) function. There are many ways to write it, let’s consider these:

  1. Compare ‘word’ to each member of ‘wordList’ and check if they differ by one character. - O(size of wordList)
  2. Generate all words within one character of ‘word’ and check if they are in wordList. - O(num characters * size of word X cost of lookup in wordList)
  3. Hash all wordLength-1 tuples- If two words differ by 1 character, they must have all other characters in common. For each word in wordList (for example, say a word, “fish”), for each character in the word, delete that character, and hash (in a multihash) remainingLetters -> originalWord. (For “fish” we hash ish -> fish, fsh -> fish, fih -> fish, fis -> fish). Once we have done this, for all words, we can quickly find all neighbors of a word by doing a similar procedure (e.g. for ‘fist’ we’ll get all values in ‘ist’, ‘fst’ ‘fit’, ‘fis’). - O(length of max word * length of word list) to build and O(length of max word) to query)

To compare these, let’s make some estimates. The largest wordList is probably going to be around the size of the dictionary (around 100000 words). The max size of a word is no more than something like 20 characters. Clearly 20 * 26 (there are 26 lowercase characters) < 200000, so using strategy 2 should be much faster than strategy 1.

Strategy 3 performs just as well as strategy 2 and even requires many less hash lookups when running the algorithm, but requires some extra storage to set up. If we were writing code to repeatedly solve the word ladder problem with the same wordList, it might be worth doing strategy 3, or explicitly computing all wordlists beforehand, but for now, we’ll stick with strategy 2.

Using strategy 2, we can define the generateNeighbors function to complete the problem as:

def generateNeighbors(word, wordList):
    for i in range(len(word)):
        for letter in string.ascii_lowercase:
            candidate = word[:i] + letter + word[i+1:]
            if candidate in wordList:
                yield candidate

Returning all Shortest Paths (modifying the solution to solve Word Ladder II)

The most natural thing to do to return all paths, rather than just the length of the shortest path, is to keep the entire path in the queue, rather than just the word. Unfortunately, we’ll have to do more than just store all paths to get this solution to pass on Leetcode. To show what it would look like:

def allShortestLadders(beginWord, endWord, wordList):
    # We use q to keep track of the next nodes to process in the BFS.
    # Each item in the queue is a list which represents a 'ladder' of words:
    q = collections.deque([ [beginWord] ])
    # We keep track of words we've processed to avoid getting stuck in a loop.
    seen = set([beginWord])
    wordList = set(wordList)
    output = []
    while q:
        # We can't return as soon as we see an answer like we did in the last problem
        # because we want to return ALL shortest paths. We'll process level by level and only
        # return when we're done a full level.
        # We also can't mark values as seen until we've processed a whole level (this
        # leads to the inefficiency in this version).
        num_in_level = len(q)
        seen_this_level = set()
        for i in range(num_in_level):
            q_item = q.popleft()
            for candidate in generateNeighbors(q_item[-1], wordList):
                if candidate == endWord:
                    output.append(q_item + [candidate])
                elif candidate in seen:
                    continue
                seen_this_level.add(candidate)
                q.append(q_item + [candidate])
        if output:
            return output
        seen |= seen_this_level
    return output

This solution is correct, but receives time limit exceeded when run on Leetcode. The main problem is that we’re now processing every path to a node instead of just every node. For example, suppose we were working on the problem with: beginWord = “hit”, endWord = “cog”, wordList = [“hot”,”dot”,”dog”,”lot”,”log”,”cog”, “lit”]

If we print our queue at the start of every ‘level’, it will look like:

LevelQueue Contents
1[‘hit’]
2[‘hit’, ‘lit’], [‘hit’, ‘hot’]
3[‘hit’, ‘lit’, ‘lot’], [‘hit’, ‘hot’, ‘dot’], [‘hit’, ‘hot’, ‘lot’]
4[‘hit’, ‘lit’, ‘lot’, ‘log’], [‘hit’, ‘hot’, ‘dot’, ‘dog’], [‘hit’, ‘hot’, ‘lot’, ‘log’]

If we look at level 3, we have two different paths to ‘lot’ ([hit, lit, lot] and [hit, hot, lot]). In our old solution these would’ve been collapsed into a single queue item (since we would’ve put ‘lot’ into ‘seen’ right away and never allowed it to get in the queue a second time). For large examples, this blow-up can lead to much more work (enough that we get time limit exceeded since we’re processing every path instead of every reachable node), how can we avoid it?

There are a few strategies that we can try, a couple are:

  1. Represent all paths to a given word in a single queue item.
  2. Use breadth first search to figure out the paths, but keep only parent pointers that indicate that during the bfs we got to node B from node A (effectively, build a ‘mini’ version of the graph which represents only the piece of the graph that has the shortest paths). Then use those pointers to reconstruct the paths (do a dfs to recreate the paths).

The simplest way to do option 1, is to define a small class to be used as our queue item. Instead of a word ladder to a word, we’ll store a list of all word ladders that reach a given word. The class might look like (there are probably better ways to make this class save space, since it essentially represents a directed acyclic graph (DAG) but this one is simple):

class WordLadder:
    def __init__(self, beginWord):
        self.ladder = [ [beginWord] ]

    def LastWord(self):
        # All lists in self.ladder have to have the same last word
        return self.ladder[0][-1]

    def Merge(self, other):
        self.ladder += other.ladder

    def AppendWord(self, word):
        for x in self.ladder:
            x.append(word)

    def toOutputFormat(self):
        return self.ladder

Then, our updated function is:

def allShortestLadders(beginWord, endWord, wordList):
    # Each item in the queue is a list of WordLadders (which is just a list of lists
    # that represent a path to get to the same word).
    q = collections.deque([ WordLadder(beginWord) ])
    # We keep track of words we've processed to avoid getting stuck in a loop.
    seen = set([beginWord])
    wordList = set(wordList)
    output = None
    while q:
        # We can't return as soon as we see an answer because we want to
        # return ALL shortest paths. We'll process level by level and only
        # return when we're done a full level.
        num_in_level = len(q)
        # We need to be able to merge two wordLadders so we keep word -> ladder
        seen_this_level = {}
        for i in range(num_in_level):
            q_item = q.popleft()
            for candidate in generateNeighbors(q_item.LastWord(), wordList):
                if candidate in seen:
                    continue
                newLadder = copy.deepcopy(q_item)
                newLadder.AppendWord(candidate)
                if candidate == endWord:
                    if output:
                        output.Merge(newLadder)
                    else:
                        output = newLadder
                elif candidate in seen_this_level:
                    seen_this_level[candidate].Merge(newLadder)
                else:
                    seen_this_level[candidate] = newLadder
                    q.append(newLadder)
        if output:
            break
        seen |= seen_this_level.keys()
    return output.toOutputFormat() if output else []

This passes the leetcode online judge, and is pretty simple to code, but still uses a lot of space. If we try option 2, we can get an even better solution:

# Dfs to recreate the old paths. This takes a dictionary of parent pointers.
# The entries A -> B mean you can get to A from B in a bfs starting
# from beginWord. The outptu is a list of all paths from beginWord to word.
# {string -> string}, string, string
def createAllPaths(parentDict, word, beginWord):
    if word == beginWord:
        return [[beginWord]]
    output = []
    for w in parentDict[word]:
       x = createAllPaths(parentDict, w, beginWord)
       for l in x:
           l.append(word)
           output.append(l)
    return output

# beginWord - string
# endWord - string
# wordList - [ string ]
def allShortestLadders(beginWord, endWord, wordList):
    # We use q to keep track of the next nodes to process in the BFS.
    q = collections.deque([ beginWord ])
    # We keep track of words we've processed to avoid getting stuck in a loop.
    seen = set([beginWord])
    # Convert to set for O(1) lookup
    wordList = set(wordList)
    # We'll store word -> set(words) in this dictionary. The set will have all
    # words which we reached 'word' from during the bfs. Then, we can treat this as
    # an adjacency list graph and dfs it to create the output.
    parents = collections.defaultdict(set)
    while q:
        # We can't return as soon as we see an answer because we want to
        # return ALL shortest paths. We'll process level by level and only
        # return when we're done a full level.
        num_in_level = len(q)
        finished = False
        seen_this_level = set()
        for i in range(num_in_level):
            q_item = q.popleft()
            for candidate in generateNeighbors(q_item, wordList):
                if candidate == endWord:
                    finished = True
                elif candidate in seen:
                    continue
                if candidate not in seen_this_level:
                    q.append(candidate)
                seen_this_level.add(candidate)
                parents[candidate].add(q_item)
        if finished:
            break
        seen |= seen_this_level
    return createAllPaths(parents, endWord, beginWord)

Note: If necessary, we can also make our search faster by starting a bfs from both the end node and start node and stopping when both encounter the same node. This has the same overall complexity, but should be faster in practice.


Conclusion

This problem could be asked as stated in the leetcode version as a one off query, or as more of a design problem where you are asked the same problem but you are given the wordList once and then must answer many queries on the graph.

The things you should get out of implementing this problem:

  • Practice writing bfs and dfs
  • Practice seeing problems as graphs

Remember, if you’re stuck, one thing you can try is imagining the problem as a graph.