Panagrams (#86)

First, let me clear up the naming issue, since I missed it when created
the
quiz. The actual term for sentences that contain all the letters of the
alphabet is pangram (or Swallowsgram) as discussed in the linked
article. The
quiz name is in error.

Now that we know what to call them, the question becomes how do we
generate self
documenting pangrams? The linked article described a technique called
“Robbinsoning,” which is a simple process. The idea is that you start
with some
random distribution of letter counts, build the sentence, adjust the
counts to
reflect the actual sentence counts, rebuild, adjust, etc. You can zero
in on a
solution in this fashion and most of the submitted solutions used
something
along these lines.

I want to have a look at Danial Martin’s code below, but before we get
into that
you need to know how Daniel’s code tracks letter frequencies. Here’s
Daniel’s
own description of the technique:

# I represented the letter frequencies of letters in a sentence
# as one huge bignum, such that if "freq" was a variable containing
# the number, then "freq & 0xFF" would be the number of "a"s in the
# sentence, "(freq>>8) & 0xFF" would be the number of "b"s, etc.
#
# This means that when I adjust a guess, changing the actual frequency
# is as simple as adding and subtracting from a single variable.

Now that we know what to expect, here’s the first bit of solution code
(reformatted slighty):

def find_sentence(prefix, suffix, initial = {})
  letterre     = Regexp.new('(?i:[a-z])');
  letters      = ('a'..'z').to_a
  letterpower  = Hash.new {|h,k| h[k] = 1 << ((k[0]-?a)*8)}
  lettershifts = letters.map {|x| ((x[0]-?a)*8)}
  basesentence = prefix + letters.map {|x|
    (x == 'z'? 'and ' : '') + "_ '#{x}'"
  }.join(', ') + suffix
  basefreq = 0
  basesentence.scan(letterre) {|x| basefreq += letterpower[x.downcase]}

  # ...

Obviously, we have a lot of setup work here. Let’s take it line by
line,
because there’s a lot going on. This method is invoked with a sentence
prefix
and suffix, and optionally the initial letter counts to try. When
invoked, the
first line defines a letter Regexp that will match individual letters in
upper
or lower case. The next line generates an Array of letters the code can
iterate
over.

The next two variables are helpers for working with the Bignum
frequencies.
letterpower will give you the number needed to add one count for the
keyed
letter to the frequencies and letter shifts is the offset a given letter
is
shifted into the Bigum. (Note that letterpower calculates its own
shift, in the
same way lettershifts does.)

The next three lines are easier to swallow. First, the sentence is
constructed
using the prefix, placeholders for counts, the word “and” as needed, and
the
sentence suffix. The next two lines then calculate the letter
frequencies of
this baseline using the Regexp to iterate over the letters and
letterpower to
adjust the count.

Let’s tackle the next chunk of code:

  # ...

  # enfreq holds the letter counts that spelling out that number adds 

to
# the sentence.
# E.g. enfreq[1] == letterpower[‘o’] + letterpower[‘n’] +
letterpower[‘e’]
enfreq = Hash.new {|h,k|
if k > 255 then
h[k] = h[k >> 8]
else
h[k] = 0
k.to_en.scan(letterre) {|x| h[k] += letterpower[x.downcase]}
h[k] += letterpower[‘s’] if k != 1
end
h[k]
}

  # ...

This Hash is a typical memoization idiom in Ruby. Give the Hash the
code to
calculate values from keys which it will invoke on the first call, then
all
future calls use a simple Hash lookup. The lookup is much faster of
course,
since it doesn’t need to rerun the code to build it.

In this case, the code builds letter counts, to add to the overall
frequency
counts, for the English word equivalents to the passed number. The else
statement is where that happens. The process is basically what we saw
for
counting the base sentence frequency before. Note that the code
accounts for
the s needed, should the count be plural.

The to_en() call in this code is provided by Glenn Parker’s solution to
Ruby
Quiz #25. I’ve discussed that code multiple times now, so I left it out
of this
summary for brevity.

Time for a little more code (minus a not needed variable removed by me):

  # ...

  guessfreq = 0
  letters.each{|x|
    guessfreq += (initial[x]||0) * letterpower[x]
  }
  guessfreq  = basefreq if guessfreq == 0
  actualfreq = 0

  # ...

Here we have the last bit of all this setup work. First, an initial
guessfreq
is built for the letters based on the passed Hash. If no starting point
was
given, the sentence uses the basefreq calculated earlier. Finally, a
variable
is allocated to hold the actualfreq of the generated sentence.

OK, grab a deep breath and let’s finally tackle the actual guessing loop
that
zeros in on solutions (debugging code and comments removed by me):

  # ...

  until guessfreq == actualfreq do
    if actualfreq > 0 then
      lettershifts.each{ |y|
        g = 0xFF & (guessfreq >> y)
        a = 0xFF & (actualfreq >> y)
        if (g != a)
          d = (g-a).abs
          r1 = rand(d+1)
          r2 = rand(d+1)
          r1=r2 if r1 < r2
          r1=-r1 if a<g
          if (r1 != 0) then
            guessfreq += r1 << y
            actualfreq += enfreq[g+r1] - enfreq[g]
          end
        end
      }
    else
      actualfreq = basefreq
      lettershifts.each {|y|
        g = 0xFF & (guessfreq >> y)
        actualfreq += enfreq[g]
      }
    end
  end

  # ...

This code cycles until our latest guess matches the actual count for the
sentence. On the first pass actualfreq will be zero, so the else clause
is
executed. This sets actualfreq to the baseline and adds in our guess.
After
that, each iteration should hit the if branch.

Each time through the if branch, every letter is compared for a distance
from
its guess value and its actual value. A random number is selected (well
two
actually with the high one favored) and added to our guess. Then the
actual is
updated to reflect the change. The net effect is that they close in on
each
other until our guess matches reality.

When they match, the final sentence can be constructed:

  prefix + ('a'..'z').map {|x|
    g = (guessfreq >> ((x[0]-?a)*8))%256
    (x == 'z'? 'and ' : '') + "#{g.to_en} '#{x}'" +
    (g==1 ? '' : 's')}.join(', ') + suffix
end

That works like the original sentence construction, but real numbers
instead of
actual placeholders this time. That’s returned as our final result.

Here’s the code that starts the process, passing the initial prefix and
suffix:

puts find_sentence(
  "Daniel M.'s sallowsgram program produced a sentence with ", "."
)

Interestingly, it seems that some inputs never resolve to a solution. I
have
not investigated this too deeply, but multiple quiz solvers reported the
same
issue.

My thanks to all the clever solvers who found all those pangrams so
quickly. I
learned a lot from reading the solutions, including how to use NArray
from Simon
Kroeger (worth a look).

Tomorrow’s quiz has us inventing time travel for Ruby, so the summary
for that
could show up at any moment now…

Ruby Q. [email protected] writes:

Now that we know what to call them, the question becomes how do we
generate self documenting pangrams? The linked article described a
technique called “Robbinsoning,” which is a simple process. The
idea is that you start with some random distribution of letter
counts, build the sentence, adjust the counts to reflect the actual
sentence counts, rebuild, adjust, etc. You can zero in on a
solution in this fashion and most of the submitted solutions used
something along these lines.

Note that the discussion of the different ways to solve this problem
pointed up two distinctly different “randomized Robbinsoning”
algorithms:

  1. rebuild the sentence (or recalculate the frequencies) after each
    letter adjustment. This was what my code did.

  2. Go through each letter doing randomized adjustments then, after all
    letters have been adjusted, rebuild/recalculate.

To have my program use this second algorithm, you can change the “if”
bit in the main loop to:

      actual2 = 0
      lettershifts.each{ |y|
        g = 0xFF & (guessfreq >> y)
        a = 0xFF & (actualfreq >> y)
        if (g != a)
          d = (g-a).abs
          r1 = rand(d+1)
          r2 = rand(d+1)
          r1=r2 if r1 < r2
          r1=-r1 if a<g
          if (r1 != 0) then
            guessfreq += r1 << y
                g += r1
          end
        end
            actual2 += enfreq[g]
      }
      actualfreq = actual2

But you really don’t want to do that. Adjusting all letters before
recalculating the frequencies turns this consistently-under-30-seconds
program into one that takes over an hour to complete. Note that this
second algorithm is the one followed by the perl script on the page
linked to in the quiz description.

Also note that Simon Kroeger’s (first posted) solution follows this
second algorithm but is still able to achieve phenomenally fast speed
because he is able to push all the operations in the lettershifts loop
above into NArray vector operations implemented in C. In his
alternate solution to this quiz, he uses the first algorithm but needs
to loop over all the letter positions in ruby. He reports that the
increased speed of the first algorithm and the comparative slowness of
doing the loop in ruby essentially cancel each other out, leading to
both solutions being roughly equivalent in terms of speed.

On Jul 13, 2006, at 10:15 AM, Daniel M. wrote:

To have my program use this second algorithm, you can change the “if”
r1=r2 if r1 < r2
But you really don’t want to do that. Adjusting all letters before
to loop over all the letter positions in ruby. He reports that the
increased speed of the first algorithm and the comparative slowness of
doing the loop in ruby essentially cancel each other out, leading to
both solutions being roughly equivalent in terms of speed.

Great info all around.

I didn’t clue in to the two different algorithms. Thanks for
bringing that up!

James Edward G. II