Posix Pangrams (#97)

Ruby Q. wrote:

For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list

Here’s my solution to the problem. I’m not quite sure why it doesn’t
find the better result already found by Ezwan as I expected it to be
optimal - please point out my error if you see it. It may well be that
making it optimal would also make it converge much more slowly though.

This uses an A* search algorithm to find the sequence with the least
number of total letters that forms a pangram (= least number of
repeated characters). The script is enormous but on my laptop (1.4GHz
Celeron M processor, 768MB memory) it manages to find the following
solution in under 0.2 seconds:

Words: qalter jobs find chgrp awk uux mv zcat tty
Length: 34
Cost: 34
Pangram? true
Unused Letters: (0)

I’ll break the code into parts to make it easier to read:

#########################################

Raw Data

#########################################

LETTERS = (?a…?z)
WORDS = %w{admin alias ar asa at awk basename batch bc bg c99 cal cat
cd cflow chgrp chmod chown cksum cmp comm command compress cp crontab
csplit ctags cut cxref date dd delta df diff dirname du echo ed env ex
expand expr FALSE fc fg file find fold fort77 fuser gencat get getconf
getopts grep hash head iconv id ipcrm ipcs jobs join kill lex link ln
locale localedef logger logname lp ls m4 mailx make man mesg mkdir
mkfifo more mv newgrp nice nl nm nohup od paste patch pathchk pax pr
printf prs ps pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig
qstat qsub read renice rm rmdel rmdir sact sccs sed sh sleep sort split
strings strip stty tabs tail talk tee test time touch tput tr TRUE
tsort tty type ulimit umask unalias uname uncompress unexpand unget
uniq unlink uucp uudecode uuencode uustat uux val vi wait wc what who
write xargs yacc zcat}

#########################################

Statistics

#########################################

This is all the letters used by a particular word

USED_LETTERS = Hash.new do |h, word|
letters = word.downcase.split(//)
letters.uniq!
h[word] = letters.map {|l| l[0]}.select {|l| LETTERS.include?(l)}
end

This is the length in letters of a word (auto-caching)

WORD_LENGTH = Hash.new do |h, word|
length = 0
word.downcase.each_byte do |byte|
length += 1 if LETTERS.include?(byte)
end
h[word] = length
end

This is the minimum number of letters that must be included to add

a particular letter into the sequence. It is the length (letters

only)

of the shortest word containing this letter.

LETTER_COSTS = {}
WORDS.each do |word|
letters = Hash.new{0}
word.each_byte {|byte| letters[byte] += 1}
letters.each do |letter, count|
if !LETTER_COSTS[letter] || LETTER_COSTS[letter] >
WORD_LENGTH[word]
LETTER_COSTS[letter] = WORD_LENGTH[word]
end
end
end

#########################################

Pangram Class - Search Tree States

#########################################

class Pangram

include Comparable

def initialize(words = [])
@words = words
end

def children
remaining = WORDS - @words
remaining.map do |word|
Pangram.new(@words + [word])
end
end

def unused_letters
unless @unused
@unused = LETTERS.to_a
@words.each do |word|
@unused -= USED_LETTERS[word]
end
end
@unused
end

def length
unless @length
@length = 0
@words.each do |word|
@length += WORD_LENGTH[word]
end
end
@length
end

def complete?
unused_letters.length == 0
end

Estimate of the cost to reach the end goal.

Must always over-estimate

def heuristic
unused_letters.inject(0) {|sum, letter| sum + LETTER_COSTS[letter]}
end

def cost
unless @cost
@cost = length + heuristic
end
@cost
end

def <=>(other)
self.cost <=> other.cost
end

def stats
%~
Words: #{@words.join(" “)}
Length: #{length}
Cost: #{self.length + self.heuristic}
Pangram? #{complete?}
Unused Letters: #{unused_letters.map{|l| l.chr}.join(”, ")}
(#{unused_letters.length})
~.strip
end
end

#########################################

Ordered Queue for A* Search

#########################################

class OrderedQueue
def initialize(array = [])
@queue = array.sort
end

Add items to the queue

This is optimised for large queues

def add(items)
sorted_items = items.sort
idx = 0
loop do
break if idx >= @queue.length - 1
qitem = @queue[idx]
while !sorted_items.empty? && sorted_items.first < qitem
@queue.insert(idx, sorted_items.shift)
idx += 1
end
idx += 1
end
sorted_items.each {|item| @queue << item}
end

def pop
@queue.shift
end

def empty?
@queue.length == 0
end
end

#########################################

Search implementation

#########################################

if $0 == FILE

Start queue with the empty initial pangram

options = OrderedQueue.new([Pangram.new])
loop do
# Check for exhaustion
if options.empty?
puts “No solution could be found”
exit 1
end

# Get the best current option
option = options.pop

# Check for completion
if option.complete?
  puts option.stats
  break
end

# Add children to options
options.add(option.children)

end
end

There’s also the matter of program structure. When dealing with an
NP-Hard problem, you know that any solution geared to do a heuristic
search quickly is not guaranteed to find the minimum.

We haven’t talked a lot about this in the past, but I take a pretty
relaxed approach to tough quiz problems. Find a good heuristic and
get close. That’s fine with me.

I think it’s fine. Now if you’d said “find a solution and then prove
using the symbolic algebra of your choice that it’s optimal” would be
a different story.

Next Ruby Q.: Prove that P = NP, unless it doesn’t, in which case
prove that P != NP.

Martin

James Edward G. II [email protected] writes:

I’m all about thinking outside the box. I think we should all learn
to love the guesswork!

Guess how I found the shortest solution.

James Edward G. II

Hint: I didn’t implement an algorithm myself.