This problem turns out to be pretty tough. Finding the best pangram for
a given
constraint can be NP-Hard, as explained by Ken B.:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/219003
Given that, we’re just interested in code that gets reasonably close to
ideal
results in a decent amount of time. All of the submissions did that
quite well,
but I want to show off Ezwan Aizat Bin Abdullah F.'s code below. I’ve
Rubyified it just a bit, but the code still works the same. For
reference, it
finds this pangram in under a second on my box:
String: zcat jobs newgrp mv qhold tty uux mkfifo
Panagram? true
Words? 8
Length? 33
Dupes? 7
Let’s start with the easy stuff:
#!/usr/bin/ruby
# removed c99 fort77 m4
words = %w[ admin alias ar asa at awk basename batch bc bg cal cat cd
cflow chgrp chmod chown cksum cmp comm command compress cp
crontab csplit ctags cut cxref date dd delta df diff
dirname
du echo ed env ex expand expr false fc fg file find fold
fuser gencat get getconf getopts grep hash head iconv id
ipcrm ipcs jobs join kill lex link ln locale localedef
logger
logname lp ls mailx make man mesg mkdir mkfifo more mv
newgrp
nice nl nm nohup od paste patch pathchk pax pr printf prs
ps
pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig
qstat qsub read renice rm rmdel rmdir sact sccs sed sh
sleep
sort split strings strip stty tabs tail talk tee test time
touch tput tr true tsort tty type ulimit umask unalias
uname
uncompress unexpand unget uniq unlink uucp uudecode
uuencode
uustat uux val vi wait wc what who write xargs yacc zcat ]
words_line = words.join(" ")
# ...
This code begins by filling an Array with the Posix utilities, except
for the
three that contain digits in their names. Everyone dropped these to
keep things
simple. Note that the words are also joined to form one long line of
utility
names.
Next we need to enhance String a bit with some capabilities we will need
to do
our work:
# ...
class String
def letters(&block)
scan(/[a-z]/, &block)
end
def pangram?
letters.uniq.length == 26
end
def duplicated_letters
seen = Hash.new { |found, char| found[char] = 1; 0 }
letters.inject(0) { |sum, l| sum + seen[l] }
end
end
# ...
We will need easy access to an Array of letters (without the spaces), so
a
simple wrapper is created over scan(). From there we can check pangram
status
by ensuring that we have 26 unique letters. Finally, when printing
statistics
it would be nice to be able to see how many letters are duplicates, so
we add a
method for that too.
Now the code builds up some handy constants in preparation for the
search:
# ...
OCCURRENCE = Hash.new { |counts, char| counts[char] = 0 }
words_line.letters { |char| OCCURRENCE[char] += 1 }
WORDS_LENGTH = words_line.delete(" ").length
# ...
Here we see the OCCURRENCE Hash being filled. It will be the key to the
search
algorithm. The Hash is simply a count of how many times each letter is
present
in all of the names. “e” appears the most at a count of 62 times and
“z” is
present only once.
Another constant is filled with the overall character count.
Here’s a method used to print statistics when we have our answer:
# ...
def stats(line)
puts "String: #{line}"
puts "Pangram? #{line.pangram?}"
puts "Words? #{line.split.length}"
puts "Length? #{line.delete(' ').length}"
puts "Dupes? #{line.duplicated_letters}"
end
# ...
I think that one is pretty self documenting.
We’re finally ready for the primary search algorithm and here it comes:
# ...
=begin
Suitability should be determined by
* least number of letters takes out the length of the computer
* no used letters
* no duplicates
=end
def suitability(value, line)
amount, used = 0, ""
value.letters do |char|
amount += if used.include?(char) || line.include?(char)
-OCCURRENCE[char]
else
WORDS_LENGTH / OCCURRENCE[char]
end
used << char
end
amount
end
# ...
This method grades the passed value for inclusion in the line. Each
letter is
scored, creating an overall suitability rating for this value.
If the letter is already in the line or the letters we have seen so far
from
this value, the OCCURRENCE value for that letter is tacked onto the
score as a
penalty. This ensures the duplicates, especially of common letters, are
costly
so the code will try to avoid them.
If this is the first time the letter has been encountered, the score is
the
OCCURRENCE for that letter divided into the total letter count. This
makes it
valuable for the code to place letters like the “z” early on, since we
don’t
have any choices on that one and thus will also need to accept the
letters it
comes with.
Here’s the final piece of the puzzle, where we see suitability() put to
work:
# ...
line = ""
until line.pangram?
words.sort! do |x, y|
suitability(y, line) <=> suitability(x, line)
end
line += "#{words.shift} "
end
stats line
The process is easy to follow here. First an empty line is created.
Then we
loop until the line has become a pangram. At each step, the utility
list is
sorted by suitability() and the best word added to the line. Just
before exit,
the program dumps the statistics as the result.
My thanks to all you pangram hunters. As always you taught me clever
new tricks
in optimization.
Tomorrow we’ll play with the king of pathfinding algorithms…