Forum: Ruby Crossword Solver (#132)

James G. (Guest)
on 2007-08-02 19:08
(Received via mailing list)
Many of these solutions got pretty sluggish when run on large
crosswords.  The
reason is simple:  the search space for this problem is quite large.
You have
to try a significant number of words in each position before you will
find a
good fit for the board.

The good news is that this summer Google has paid to bring a very
powerful search tool to Ruby.  I've been lucky enough to have a ring-side seat
for this
process and now I want to show you a little about this new tool.

You've probably seen Andreas L. solving a fair number of the recent
quizzes using his Gecode/R library.  Gecode/R is a wrapper over the C++
Gecode library for constraint programming.  Constraint programming is a
technique for describing problems in a way that a tool like Gecode can
then use to search the solution space and find answers for you.  Gecode
is a very smart little searcher and will heavily prune the search space
based on your constraints.  This leads to some mighty quick results, as
you can see:

  $ time ruby solve_crossword.rb /path/to/scowl-6/final/english-words.20
                                 < /path/to/test_board.txt
  Reading the dictionary...
  Please enter the template (end with ^D)
  Building the model...
  Searching for a solution...
  A B I D E

  N   C   A

  G H O S T

  E   N   E

  L A S E R

  real    0m0.430s
  user    0m0.360s
  sys     0m0.068s

Let's have a look at Andreas's code to see how it sets things up for
Gecode.  Here's the start of the code:

  require 'enumerator'
  require 'rubygems'
  require 'gecoder'

  # The base we use when converting words to and from numbers.
  BASE = ('a'..'z').to_a.size
  # The offset of characters compared to digits in word-numbers.
  OFFSET = 'a'[0]
  # The range of integers that we allow converted words to be in. We are
  # only using the unsigned half, we could use both halves, but it would
  # complicate things without giving a larger allowed word length.
  ALLOWED_INT_RANGE = 0..Gecode::Raw::Limits::Int::INT_MAX
  # The maximum length of a word allowed.
  MAX_WORD_LENGTH = (Math.log(ALLOWED_INT_RANGE.last) / Math.log(BASE)).floor

  # ...

You can see that Andreas loads Enumerator and Gecode/R here as well as
setting up some constants.  The constants relate to how this code will model
words in
the database.  The plan here is to represent words as numbers made up of
base 26
digits which represent letters of the alphabet.  This will allow Andreas
to use
Gecode's integer variables to model the problem.  The downside is that
word size
will be limited by the maximum size of an integer and thus this solution
has a
weakness in that it can't be used to solve the larger puzzles.
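
The length limit follows from simple arithmetic: an n-letter word needs n
base 26 digits, so its largest possible value is 26**n - 1.  Here is a
minimal sketch of that calculation.  It assumes a 32-bit signed maximum of
2**31 - 1 purely for illustration; the real value of
Gecode::Raw::Limits::Int::INT_MAX is not shown in this post and may differ.

```ruby
BASE = 26
# Assumed limit, for illustration only; Gecode's real INT_MAX may differ.
ASSUMED_INT_MAX = 2**31 - 1

# Find the largest n such that every n-letter word (at most BASE**n - 1)
# still fits below the assumed limit.
max_word_length = 0
max_word_length += 1 while BASE**(max_word_length + 1) - 1 <= ASSUMED_INT_MAX

puts max_word_length  # prints 6
```

Under that assumption, six letters fit (26**6 is about 309 million) but
seven do not (26**7 is about 8 billion), which is why boards with longer
slots are out of reach for this model.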

You can see this conversion from numbers to words in the Dictionary class:

  # ...

  # Describes an immutable dictionary which represents all contained words
  # as numbers of base BASE where each digit is the corresponding letter
  # itself converted to a number of base BASE.
  class Dictionary
    # Creates a dictionary from the contents of the specified dictionary
    # file which is assumed to contain one word per line and be sorted.
    def initialize(dictionary_location)
      @word_arrays = []
      File.open(dictionary_location) do |dict|
        previous_word = nil
        dict.each_line do |line|
          word = line.chomp.downcase
          # Only allow words that only contain the characters a-z and are
          # short enough.
          next if previous_word == word or word.size > MAX_WORD_LENGTH or
            word =~ /[^a-z]/
          (@word_arrays[word.length] ||= []) << self.class.s_to_i(word)
          previous_word = word
        end
      end
    end

    # Gets an enumeration containing all numbers representing words of
    # the specified length.
    def words_of_size(n)
      @word_arrays[n] || []
    end

    # Converts a string to a number of base BASE (inverse of #i_to_s ).
    def self.s_to_i(string)
      string.downcase.unpack('C*').map{ |x| x - OFFSET }.to_number(BASE)
    end

    # Converts a number of base BASE back to the corresponding string
    # (inverse of #s_to_i ).
    def self.i_to_s(int)
      res = []
      loop do
        digit = int % BASE
        res << digit
        int /= BASE
        break if int.zero?
      end
      res.reverse.map{ |x| x + OFFSET }.pack('C*')
    end
  end

  # ...

We've already talked about the number representation which is the
majority of
the code here.  Do have a look at initialize() and words_of_size()
though, to
see how words are being stored.  An Array is created where indices are
word lengths and the values at those indices are nested Arrays of words
of that length.  This makes getting a list of words that could work in a
given slot
of the puzzle easy and fast.
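
The bucketing idea can be tried in isolation.  This is a small sketch, not
Andreas's code: it stores plain strings instead of numbers to keep the
output readable, and the word list is a made-up stand-in for the
dictionary file.

```ruby
# Hypothetical in-memory word list standing in for the dictionary file.
words = %w[ghost laser abide an at it tea]

# Index = word length, value = list of words with that length.
by_length = []
words.each { |word| (by_length[word.length] ||= []) << word }

# Lookup mirrors words_of_size(): fall back to an empty list.
words_of_size = ->(n) { by_length[n] || [] }

p words_of_size.call(5)  # the five-letter words
p words_of_size.call(4)  # no four-letter words here, so an empty list
```

Lengths that never occur simply leave a nil hole in the outer Array, which
is why the lookup needs the `|| []` fallback.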

The s_to_i() method above relies on a helper method added to Array,
which is
simply this:

  class Array
    # Computes a number of the specified base using the array's elements
    # as digits.
    def to_number(base = 10)
      inject{ |result, variable| variable + result * base }
    end
  end

Again, this is just another piece of the conversion I explained earlier.
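
To see the whole round trip in one place, here is a standalone sketch of
the same encoding.  The method names word_to_int and int_to_word are mine,
not from the solution, and 'a'.ord is used in place of 'a'[0] so that it
runs on current Rubies.

```ruby
BASE = 26
OFFSET = 'a'.ord  # 97; current-Ruby spelling of the post's 'a'[0]

# Encode: fold the digits left to right (Horner's rule), as to_number does.
def word_to_int(word)
  word.bytes.map { |b| b - OFFSET }.inject { |acc, digit| acc * BASE + digit }
end

# Decode: peel digits off the low end, then reverse them.
def int_to_word(int)
  digits = []
  loop do
    digits << int % BASE
    int /= BASE
    break if int.zero?
  end
  digits.reverse.map { |d| d + OFFSET }.pack('C*')
end

p word_to_int('cab')               # 2*676 + 0*26 + 1 = 1353
p int_to_word(word_to_int('cab'))  # "cab"
```

One thing worth noticing: 'a' is the digit zero, so a leading 'a' vanishes
on decoding ('ab' comes back as 'b').  That appears to be harmless in this
solution, since words are grouped by length and the code shown only
decodes single letters.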

With a Dictionary created, it's time to model the problem in Gecode:

  # Models the solution to a partially completed crossword.
  class Crossword < Gecode::Model
    # The template should take the format described in RubyQuiz #132 .
    # The words used are selected from the specified dictionary.
    def initialize(template, dictionary)
      @dictionary = dictionary

      # Break down the template and create a corresponding square
      # matrix. We let each square be represented by integer variable with
      # domain -1...BASE where -1 signifies # and the rest signify letters.
      squares = template.split(/\n\s*\n/).map!{ |line| line.split(' ') }
      @letters = int_var_matrix(squares.size, squares.first.size,
        -1...BASE)

      # Do an initial pass, filling in the prefilled squares.
      squares.each_with_index do |row, i|
        row.each_with_index do |letter, j|
          unless letter == '_'
            # Prefilled letter.
            @letters[i,j].must == self.class.s_to_i(letter)
          end
        end
      end

      # Add the constraint that sequences longer than one letter must form
      # words. @words will accumulate all word variables created.
      @words = []
      # Left to right pass.
      left_to_right_pass(squares, @letters)
      # Top to bottom pass.
      left_to_right_pass(squares.transpose, @letters.transpose)

      branch_on wrap_enum(@words), :variable => :largest_degree,
        :value => :min
    end

    # Displays the solved crossword in the same format as shown in the
    # quiz examples.
    def to_s
      output = []
      @letters.values.each_slice(@letters.column_size) do |row|
        output << row.map{ |x| self.class.i_to_s(x) }.join(' ')
      end
      output.join("\n\n").upcase.gsub('#', ' ')
    end

    # ...

After storing the dictionary, this code breaks the crossword template
down into an integer matrix created using the Gecode/R helper method
int_var_matrix().  This will be our puzzle of words Gecode is expected to
fill in.

The next two sections of the initialize() method build up the
constraints.  These are the rules that must be satisfied when we have
found a correct solution.

The first of these chunks of code sets rules for any letters that were
given to
us in the template.  This code tells Gecode that the number in that
position of
the matrix must equal the value of the provided letter.  Take a good
look at
this RSpec-like syntax because Andreas has spent a considerable effort
on making
it easy to express your constraints in a natural syntax and I hope you
agree with me that the end result is quite nice.

The other chunk of constraints is defined using a helper method we will
examine in just a moment.  The result of this code though is to ensure
that the numbers selected represent letters that form actual words.

The final step in describing the problem to Gecode is to choose a
branching strategy.  This tells Gecode which variables it will need to
make guesses about in order to find a solution as well as selecting a
heuristic to use when guesses must be made.  In this case, words will be
selected based on how much of the overall puzzle they affect, hopefully
ruling out large sections of the search space quickly.

The problem model we just examined is pretty much always how constraint
programming goes.  You just need to remember the three steps:  create
variables for Gecode to fill in, define the rules for the values you
want Gecode
to find, and select a strategy for Gecode to use in solving the problem.
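
If you don't have Gecode/R handy, the three steps can still be sketched in
plain Ruby.  The following is ordinary backtracking over a toy problem,
not Gecode's algorithm (Gecode also propagates constraints to shrink the
domains as it goes), and the two slots and their word lists are invented
for the example.

```ruby
# Step 1: variables, each with a domain of candidate values.
# Two hypothetical three-letter slots that cross at their first letter.
domains = {
  across: %w[cat dog tea],
  down:   %w[ten ant owl]
}

# Step 2: constraints, written as checks on a (possibly partial) assignment.
constraints = [
  lambda do |a|
    a[:across].nil? || a[:down].nil? || a[:across][0] == a[:down][0]
  end
]

# Step 3: a search strategy. Here: smallest domain first, values in order.
def solve(domains, constraints, assignment = {})
  return assignment if assignment.size == domains.size

  var = (domains.keys - assignment.keys).min_by { |v| domains[v].size }
  domains[var].each do |value|
    candidate = assignment.merge(var => value)
    next unless constraints.all? { |ok| ok.call(candidate) }

    result = solve(domains, constraints, candidate)
    return result if result
  end
  nil # every value failed, so the caller must try something else
end

solution = solve(domains, constraints)
p solution  # across is "tea", down is "ten"
```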

The to_s() method above just creates the output used in the quiz
examples for
display to the user.

Let's have a look at the helper methods used in the model definition:

  # ...


    # Parses the template from left to right, line for line, constraining
    # sequences of two or more subsequent squares to form a word in the
    # dictionary.
    def left_to_right_pass(template, variables)
      template.each_with_index do |row, i|
        letters = []
        row.each_with_index do |letter, j|
          if letter == '#'
            must_form_word(letters) if letters.size > 1
            letters = []
          else
            letters << variables[i,j]
          end
        end
        must_form_word(letters) if letters.size > 1
      end
    end

    # Converts a word from integer form to string form, including the #.
    def self.i_to_s(int)
      if int == -1
        return '#'
      else
        Dictionary.i_to_s(int)
      end
    end

    # Converts a word from string form to integer form, including the #.
    def self.s_to_i(string)
      if string == '#'
        return -1
      else
        Dictionary.s_to_i(string)
      end
    end

    # Constrains the specified variables to form a word contained in the
    # dictionary.
    def must_form_word(letter_vars)
      raise 'The word is too long.' if letter_vars.size > MAX_WORD_LENGTH
      # Create a variable for the word with the dictionary's words as
      # domain and add the constraint.
      word = int_var @dictionary.words_of_size(letter_vars.size)
      letter_vars.to_number(BASE).must == word
      @words << word
    end
  end

  # ...

The i_to_s() and s_to_i() methods here are mostly just wrappers over the
Dictionary counterparts we examined earlier.  The real interest is the
other two
methods that together define the word constraints.

First, left_to_right_pass() is used to walk the puzzle looking for runs
of two
or more letters that will need to become words in the Dictionary.  Each
time it
finds one, a hand-off is made to must_form_word(), which builds the
constraint.
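
The run-finding half of that can be tried without Gecode.  In this sketch
the method name runs_in_rows and the 3x3 grid are mine; it collects
coordinates where the real code collects Gecode variables, and it reuses
the transpose trick to get the down words from the same row scanner.

```ruby
# Returns the coordinates of every run of two or more open squares, by row.
def runs_in_rows(grid)
  runs = []
  grid.each_with_index do |row, i|
    current = []
    row.each_with_index do |square, j|
      if square == '#'
        runs << current if current.size > 1
        current = []
      else
        current << [i, j]
      end
    end
    runs << current if current.size > 1  # a run that reaches the row's end
  end
  runs
end

grid = [
  %w[_ _ _],
  %w[_ # _],
  %w[_ _ _]
]

across = runs_in_rows(grid)
# Transposing turns columns into rows; swap the coordinates back afterwards.
down = runs_in_rows(grid.transpose).map { |run| run.map { |i, j| [j, i] } }

p across.size  # 2 (top and bottom rows)
p down.size    # 2 (left and right columns)
```

The single open squares on either side of the '#' are skipped, since a
one-letter run doesn't need to be a word.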

With the problem modeled, it takes just a touch more code to turn this
into a
full solution:

  # ...

  puts 'Reading the dictionary...'
  dictionary = Dictionary.new(ARGV.shift || '/usr/share/dict/words')
  puts 'Please enter the template (end with ^D)'
  template = ''
  loop do
    line = $stdin.gets
    break if line.nil?
    template << line
  end
  puts 'Building the model...'
  model = Crossword.new(template, dictionary)
  puts 'Searching for a solution...'
  puts((model.solve! || 'Failed').to_s)

You can see that this application code is just reading in the dictionary
and template, then constructing a model and pulling a solution with the
solve!() method.  That's all it really takes to get answers after
describing your problem to Gecode.

If you want to continue exploring Andreas's Gecode/R wrapper, drop by
the project's Web site, which has links to many useful resources.

My thanks to all who put a good deal of effort into a hard search
problem.

Tomorrow we will take another peek at our dictionary, this time to see
what numbers are hiding in there...