Word Blender (#108)


#1

I’m almost embarrassed to admit that I originally turned this quiz down.
I
thought it would be too similar to the Scrabble Stems problem we did
long, long
ago. Ben politely explained that he felt it was different enough
though, and
then sent some guy named Guido after me in a parking lot one night.
That
convinced me to actually work the problem, and I had a change of heart.
I think
we can tell from the popularity of the problem that Ben is smarter than
I am, so
I’m glad I did.

Many solvers chose to build the entire game, so we will take a look at
that
approach. Though the quiz doesn’t explicitly call for it, most
programmers
chose to add scoring or a time limit to their solution to spice up the
action.
I went with the time limit and we will examine my own code below.

The first step is to get a word list, of course. There were several
approaches
to this process, since manipulating every word in the dictionary each
time could
get a little slow. My answer to this was just to cache the word list
after I
had built it once and reuse that for all future runs. Here’s the code
that
handles the loading and caching:

game date cache

CACHE_FILE = “.game_words”

if File.exist? CACHE_FILE # load from cache
word_list = File.open(CACHE_FILE) { |file| Marshal.load(file) }
else # build word list
# prepare data structure
words_by_signature = Hash.new { |words, sig| words[sig] = Array.new
}

# read dictionary
File.foreach(ARGV.shift || "/usr/share/dict/words") do |word|
  word.downcase!
  word.delete!("^a-z")

  next unless word.length.between? 3, 6

  (words_by_signature[word.split("").sort.join] << word).uniq!
end

# prepare recursive signature search
def choices( sig,
             seen = Hash.new { |all, cur| all[cur] = true; false },
             &blk )
  sig.length.times do |i|
    shorter = sig[0...i] + sig[(i+1)...sig.length]
    unless seen[shorter]
      blk[shorter]
      choices(shorter, seen, &blk) unless shorter.length == 3
    end
  end
end

# prepare game data structure
word_list = Hash.new

# build game choices
words_by_signature.keys.grep(/\A.{6}\Z/) do |possible|
  word_list[possible] = words_by_signature[possible]

  choices(possible) do |shorter_signature|
    if words_by_signature.include? shorter_signature
      word_list[possible].push(*words_by_signature[shorter_signature])
    end
  end
end

# cache for faster loads
File.open(CACHE_FILE, "w") { |file| Marshal.dump(word_list, file) }

end

This uses Marshal to build a trivial word cache with only a few lines of
code.
If the cache exists, we slurp it back in. Otherwise we build the cache
and save
it for future runs.

To build the cache, I pluck all three to six letter words out of the
indicated
dictionary file and build a word list containing all six letter words
linked to
smaller words using the same letters.

The main trick used in this recursive grouping of words is the use of
“signatures.” A word’s signature is just the sorted order of the
letters in the
word: aejms for james, for example. Comparing signatures makes it
trivial to
find words that use the same letters, since their signatures will be the
same.

Using the signatures, the choices() method just removes one character at
a time
recursing through the word list. This allows me to find all of the
smaller
words that can be formed using the same letters.

If you wanted to generate the entire list of games as the quiz
suggested, the
above is all you need. Each key is one possible round with the values
being the
words that can be matched in that round.

I wanted to play though and built a full game interface. My interface
requires
Unix, because those are the tricks I know. Here’s the start of that
code:

require “io/wait”

game interface (requires Unix)

TERMINAL_STATE = stty -g
system “stty raw -echo cbreak”
at_exit { system “stty #{TERMINAL_STATE}” }
clear = clear

a raw mode savvy puts

def out(args) print((args + ["\r\n"])) end

for easy selection

words = word_list.keys

This setup code memorizes the original state of the user’s terminal,
modifies
that state to raw mode so I can read individual characters as they are
pressed,
arranges to have the terminal settings restored at exit, grabs the
escape
sequence we can use to clear the terminal, and builds a puts() like
method that
works with raw mode. This code doesn’t really have much to do with
Ruby. I’m
just shelling out to standard Unix utilities here.

We’re now ready for the game event loop:

rounds = 0
loop do
# select letters
letters = current = words[rand(words.size)]
while word_list.include? letters
letters = letters.split("").sort_by { rand }.join
end
letters.gsub!(/.(?=.)/, '\0 ')

# round data
advance       = false
matches       = Array.new
current_match = String.new
start         = Time.now
message       = nil
last_update   = start - 1

# ...

I begin by selecting a word for the round and scrambling that word’s
letters
until they are no longer a real word. Then I setup some variables to
hold data
for the round like whether or not the user has found a six letter word
and
should advance as well as any feedback message I am currently showing
the user
and the last time I refreshed the screen.

Now we start the round event loop:

# ...

# round event loop
until Time.now >= start + 2 * 60
  # game display
  if last_update <= Time.now - 1
    print clear

    out "Your letters:  #{letters}"
    out "   Time left:  #{120 - (Time.now - start).round} seconds"
    out "  Your words:  #{matches.join(', ')}"
    out
    unless message.nil?
      out message
      out
    end
    print current_match
    $stdout.flush

    last_update = Time.now
  end

  # ...

The round event loop runs for two minutes and this first bit is
responsible for
drawing the screen. After clearing the screen, it prints the letters,
time
left, words found, and any feedback message to the screen. Note that I
update
the screen each second, assuming no other activity, so the user will
notice the
clock counting down.

Here’s the input processing:

  # ...

  # input handler
  if $stdin.ready?
    char = $stdin.getc
    case char
    when ?a..?z, ?A..?Z  # read input
      current_match << char.chr.downcase
      message       =  nil
      last_update   =  start - 1
    when ?\b, 127        # backspace/delete
      current_match = current_match[0..-2]
      message       =  nil
      last_update   =  start - 1
    when ?\r, ?\n        # test entered word
      if word_list[current].include? current_match
        matches << current_match
        matches = matches.sort_by { |word| [word.size, word] }
        if not advance and current_match.length == 6
          advance = true
          message = "You will advance to the next round!"
        else
          message = nil
        end
      else
        message = "Unknown word."
      end
      current_match = String.new
      last_update   = start - 1
    end
  end
end

# ...

This just checks to see if there is data waiting on STDIN. We don’t
want to
read from it without checking as that could block the application
waiting for
input. The ready?() method used here is added by the io/wait library
and will
return true if there is data waiting. The rest of the code just handles
the
input we get. Letters are recorded, we support backspace, and a
carriage-return
tells us to try the current word, set some feedback, and refresh the
display.

When the round is done, we finish out the game loop:

# ...

# round results
print clear
missed = word_list[current] - matches
unless missed.empty?
  out "Other words using \"#{letters}:\""
  out missed.sort_by { |word| [word.size, word] }.join(", ")
  out
end
if advance
  rounds += 1

  out "You made #{matches.size} word#{'s' if matches.size != 1}, ",
      "including at least one six letter word.  Nice work."
  out "Press any key to begin the next round."

  $stdin.getc
else
  out "You made #{matches.size} word#{'s' if matches.size != 1}, ",
      "but failed to find a six letter word."

  break  # end game
end

end

game results

out "You completed #{rounds} round#{‘s’ if rounds != 1}. ",
“Thanks for playing.”

The above code just prints missed words and results. If you found a six
letter
word, the code will loop to a new round. Otherwise it will break out of
the
program.

A big thank you to Ben (and Guido!) for convincing me to try the quiz
and to all
those that took the time to play with it.

Tomorrow we’ll try a problem that has been making the rounds,
literally…


#2

On 1/11/07, Ruby Q. removed_email_address@domain.invalid wrote:

The main trick used in this recursive grouping of words is the use of
“signatures.” A word’s signature is just the sorted order of the letters in the
word: aejms for james, for example. Comparing signatures makes it trivial to
find words that use the same letters, since their signatures will be the same.

Using the signatures, the choices() method just removes one character at a time
recursing through the word list. This allows me to find all of the smaller
words that can be formed using the same letters.

I used a different signature method - I mapped each unique letter in
the target word to a prime, and then used the product of the primes of
all the letters in a word as its signature. That way, a is contained
in b if signature(b) % signature(a) == 0, and you can generate the
signatures via each_byte, array lookup and integer multiplication (no
need for split, sort or string deletion).

martin


#3

On Jan 11, 2007, at 7:54 AM, Martin DeMello wrote:

Using the signatures, the choices() method just removes one
need for split, sort or string deletion).
Very interesting. I’ve never seen that before. I like it.

My version of “signatures” comes out of Programming Peals, if my
memory is right. Just FYI.

James Edward G. II


#4

On Thu, Jan 11, 2007, Martin DeMello wrote:

I used a different signature method - I mapped each unique letter in
the target word to a prime, and then used the product of the primes of
all the letters in a word as its signature. That way, a is contained
in b if signature(b) % signature(a) == 0, and you can generate the
signatures via each_byte, array lookup and integer multiplication (no
need for split, sort or string deletion).

Holy crap. That is incredibly cool. Thanks for sharing this technique!

Ben


#5

On Thu, Jan 11, 2007, Ruby Q. wrote:

I’m almost embarrassed to admit that I originally turned this quiz down. I
thought it would be too similar to the Scrabble Stems problem we did long, long
ago. Ben politely explained that he felt it was different enough though, and
then sent some guy named Guido after me in a parking lot one night. That
convinced me to actually work the problem, and I had a change of heart. I think
we can tell from the popularity of the problem that Ben is smarter than I am, so
I’m glad I did.

Coupla things:

  1. Gianni, not Guido. I can understand the confusion, though, as it was
    dark and probably hurt a bunch.

  2. I didn’t manage to write a solution to the quiz, despite the fact
    that
    I had at least a month’s time more than most everyone else, so none
    of
    this smarter business. :slight_smile:

I wanted to play though and built a full game interface. My interface requires
Unix, because those are the tricks I know. Here’s the start of that code:

This setup code memorizes the original state of the user’s terminal, modifies
that state to raw mode so I can read individual characters as they are pressed,
arranges to have the terminal settings restored at exit, grabs the escape
sequence we can use to clear the terminal, and builds a puts() like method that
works with raw mode. This code doesn’t really have much to do with Ruby. I’m
just shelling out to standard Unix utilities here.

Further proof of #2 above.

Ben


#6

On 1/11/07, Martin DeMello removed_email_address@domain.invalid wrote:

Very interesting. I’ve never seen that before. I like it.

The numbers overflow 32 bits in the general case, I think, but for
this restricted problem it works very nicely.

martin

I believe with Ruby they’ll get converted to Bignum automatically with
no
overflow, though I bet the running time suffers.


#7

On 1/11/07, James Edward G. II removed_email_address@domain.invalid wrote:

On Jan 11, 2007, at 7:54 AM, Martin DeMello wrote:

I used a different signature method - I mapped each unique letter in
the target word to a prime, and then used the product of the primes of
all the letters in a word as its signature. That way, a is contained
in b if signature(b) % signature(a) == 0, and you can generate the
signatures via each_byte, array lookup and integer multiplication (no
need for split, sort or string deletion).

Very interesting. I’ve never seen that before. I like it.

The numbers overflow 32 bits in the general case, I think, but for
this restricted problem it works very nicely.

martin


#8

Martin DeMello wrote:

The numbers overflow 32 bits in the general case, I think, but for
this restricted problem it works very nicely.

Specifically (I was wondering) you can use 9 primes and still be under
32 bits, 15 primes and still be under 64 bits.

module Enumerable
def product; inject(){ |n,p| p*n }; end
end

first9 = primes[0…8]
puts “First 9 primes:\n%s\nproduct: %d (%d bits)” %
[
first9.inspect,
first9.product,
first9.product.to_s(2).length
]

first15 = primes[0…14]
puts “\nFirst 15 primes:\n%s\nproduct: %d (%d bits)” %
[
first15.inspect,
first15.product,
first15.product.to_s(2).length
]

#=> First 9 primes:
#=> [2, 3, 5, 7, 11, 13, 17, 19, 23]
#=> product: 223092870 (28 bits)

#=> First 15 primes:
#=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
#=> product: 614889782588491410 (60 bits)

25 primes puts you at 121 bits; 26 primes at 128 bits on the nose.