Word Search Generator (#159)

mansfiem · April 18, 2008, 11:39pm

Adam S. was our sole submission this week, so we are going to take
a look at his solution to the quiz.

Adam’s first bit of code is a simple class, CrossWord:

class CrossWord
attr_accessor :word, :place, :dir
def initialize w
@word=w
@dir=0
@place=nil
end
def unplaced?
@place==nil
end
end

This class maintains information about each word to be placed into the
puzzle, including the word itself but also the direction and
placement. The rest of the code works with this class rather than with
raw strings.

Let’s jump for a moment to the main code.

gridsize = ARGV[1].split(‘x’).map{|v|v.to_i}
g = CharGrid.new *gridsize

words=File.open(ARGV[0],“r”).read.split.sort_by{|s|s.length}.reverse
puts “unsolvable!” or exit if (words[0].size>gridsize.max)

g.fill words
File.open(“search.txt”,“w”){|f|f.puts g.to_s}
File.open(“solution.txt”,“w”){|f|f.puts g.solution}

Most of this should be fairly obvious work. The dimensions (e.g.
“24x20”) are retrieved from the second command-line argument, split
over the ‘x’ and converted to integers. These dimensions are then used
to construct a new CharGrid object (which will be examined shortly).

The search words are read from the provided file and sorted from
longest to shortest. Presumably, the intent here is to place the
longest words into the puzzle first; the shortest (and more easily
placed) words are placed last.

A quick sanity check is done by comparing the length of the longest
word against the longest dimension of the grid. If the word is longer,
a puzzle would be impossible to generate. Adam uses a neat, if perhaps
confusing, technique to alert the user and exit out. To understand how
it works, I had to parenthesize according to operator precedence:

((puts “unsolvable!”) or exit) if (words[0].size > gridsize.max)

Now, realizing that the puts statement returns nil, this made sense.
So, while this construct both outputs the warning message and exits
the program, this might have been clearer:

if (words[0].size > gridsize.max)
puts “unsolvable!”
exit
end

Getting back to Adam’s main code, the words — now sorted and checked —
are handed off to the grid’s fill method, which does the bulk of the
work. Once done, the final step is to output the two text files as
requested, the puzzle itself using the to_s method, and the solution
via the solution method.

Most of Adam’s code lives in the CharGrid class. The main algorithm is
found in the fill method:

def fill words
iterations = 0
@words = words.map{|w| CrossWord.new(w)}
words_todo = @words.select{|w|w.unplaced?}

The master list of all words is kept in @words, which, as mentioned
earlier, is converted and kept as CrossWord instances rather than raw
strings). Then Adam loops until his words_todo array is empty. For the
first iteration of the loop, this array contains all of the words
sorted from longest to shortest.

until words_todo.empty?
words_todo.each{|cw| place cw }

The first step is an attempt to place all of the words that have not
yet been placed into the puzzle. We’ll come back to the place method
in a bit.

words_todo = @words.select{|w|w.unplaced?}.sort_by{rand}

Adam reevaluates what words still remain to be placed. Unlike before,
where words were sorted from longest to shortest (in an attempt to
place the more complex words first), remaining words are now
randomized. I imagine this is an attempt to add a bit of chaos where
order (i.e. word length) failed, but I question how much of a benefit
this is, considering words are placed within the word grid mostly at
random.

if (iterations+=1) %(@w+@h)==0
#if we are getting stuck, try removing some words
puts “#{togo = words_todo.size} to go…”
words_done = (@words-words_todo).sort_by{rand}
(togo*2).times{|i| words_todo<< remove(words_done[i]) if
words_done[i]}
end
end
end

As per Adam’s comment, if the loop continues for a long while without
placing all the words in our list, some words are put back into the
todo list. togo indicates how many words are currently unplaced, and
twice as many are removed from the puzzle and put back into the
words_todo list. The hope here is that, when the next loop iteration
begins, the code will attempt to place the words remaining, in a
location and orientation that differs from prior iterations of the
loop.

Let’s look at the place method next:

def place cw
@words.sort_by{rand}.each{|otherword|
startpt = find_overlap(cw, otherword)
return if test_place(cw, startpt)
}
end

In an effort do provide an interesting puzzle, where words overlap
frequently, the word to be placed is checked against all of the other
words in random order. If the word is successfully placed, test_place
returns a non-false value and the function exits.

def find_overlap cw, testword
return nil if testword.unplaced?
if (offset = testword.word.index cw.word[0])
startpt = testword.place
offset.times{startpt=nextp(startpt,testword.dir)}
else
startpt = nil
end
startpt
end

find_overlap determines if the first character of the word to be
placed can be found in another, already placed word. If so, the
location of that character in the grid is determined by walking along
testword using nextp, which calculates the next grid index in a given
direction. That index is returned, or nil if the character is not
found.

This is certainly a good first step in trying to overlap words, though
I think more work needs to be done to create an interesting word
search puzzle… but I’ll come back to that at the end of this
summary.

def test_place cw, suggestion=nil
dir=randDir
start= suggestion || randStart(cw.word[0])

Calling test_place attempts to place the supplied word. A random
direction and starting point are chosen; the latter is random if no
suggestion was made by find_overlap (i.e. there were no characters in
common between the two compared words).

8.times do
pt = start
good = true

The loop will attempt all eight directions to place the word from the
starting location (but will exit the function as soon as the first
good direction is found).

  cw.word.each_byte{|chr|
    good = (@g[pt]==?. ||  @g[pt]==chr) && (pt=nextp(pt,dir))
    break unless good
  }

Looking at each character of the word to place, and the corresponding
grid spaces, we determine if each character either fills in an empty
grid space (currently occupied by periods ?.) or matches existing grid
characters. As long as all characters of the word fulfill this
criteria, the variable good remains true.

  return add(cw, start, dir) if good

After examining the whole word, a true value for good indicates that
the word fits into the grid and can be placed. The add method
accomplishes this, and we exit test_place early, now that we have
actually placed the word.

  dir=(dir+1)%8

end
nil
end

Finishing the loop, the next direction is tried if the previous
direction did not allow the word to fit. If no direction works, nil is
returned, and the place method moves onto the next word.

That covers the bulk of the algorithm; the rest of the class is
bookkeeping, filling in characters or removing them, checking
directions, and more. I’m going to move onto looking at the output,
however please take a look at Adam’s code if you want to see in more
detail how he manipulates the data.

Here now is the solution.txt output from one run of Adam’s generator,
using the word list provided earlier:

e-x-p-r-e-s-s-i-o-n v-i-r-t-u-a-l . . .

A congratulations to Adam whose solution does indeed generate word
search puzzles. His solution to the quiz worked better than my own
(incomplete) code, and managed to get it into a tighter space and
complete more often, due in large part to his algorithm that will
remove and replace words in the hopes that they will fit better.

However, one thing that is very similar between Adam’s solution and my
own is the output. As you’ll see, there are some very obvious
groupings of words parallel to one another. For the list of 44 words,
this particular word search has less than ten intersections, and the
parallel groupings make finding words easier than might be expected.
Handcrafted puzzles tend to be more difficult, visually interesting
(i.e. no parallel groupings) and many more intersections of words.

My initial thought was that Adam’s overlap code would provide more
interesting puzzles than my very rough generator, as mine had no
metrics or heuristics beyond “Does it fit?” But there was very little
difference between Adam’s output and my own. I have two suspicions.

First, Adam checks only the first character of one word against all of
the characters of another. I think that might have been two limiting,
and to my recollection, word search puzzles tend to have intersections
in the interior of words, and not at the ends.

Second, I suspect that a simple two-word intersection test like Adam’s
won’t be sufficient for interesting puzzles. I think three-word
(triangle), and perhaps four-word (rectangle), overlap tests might be
required. These are common patterns in word search puzzles, and the
triangle pattern would better fend off the uninteresting parallel
groupings.

mansfiem · April 19, 2008, 5:44am

quoth the Matthew M.:

…snip [SUMMARY] Word Search Generator (#159)…

That’s a fantastic writeup. Thanks Mathew…

-d

mansfiem · April 19, 2008, 8:05pm

On Fri, Apr 18, 2008 at 2:38 PM, Matthew M. [email protected]
wrote:

Thanks for the nice writeup.

Second, I suspect that a simple two-word intersection test like Adam’s
won’t be sufficient for interesting puzzles.

I intended to do a more thorough intersection test, but that complicated
the code for finding a valid start point, and I ran out of time…
But your suspicion is right, I had a chance to try the deeper test
today,
and it didn’t significantly improve the number of intersections or
reduce the parallelism.

I think three-word

(triangle), and perhaps four-word (rectangle), overlap tests might be
required. These are common patterns in word search puzzles, and the
triangle pattern would better fend off the uninteresting parallel
groupings.

Sounds like an interesting algorithm to research… someday.

-Adam

mansfiem · April 24, 2009, 7:27am

Sounds like an interesting algorithm to research… someday.

-Adam

Sorry but I couldn’t locate the source code of the solution. Can someone
point me to it?