# Posix Pangrams (#97)

The three rules of Ruby Q.:

1. Please do not post any solutions or spoiler discussion for this quiz until 48 hours have passed from the time on this message.

2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone on Ruby Talk follow the discussion. Please reply to the original quiz message, if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Martin DeMello

A pangram is a sentence containing every letter of the alphabet at least once (a famous example in English being “the quick brown fox jumps over the lazy dog”). For maximum style points a pangram should read smoothly, and have both as few repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1]: write a program to find pangrammatic collections of posix utilities that (1) use the fewest utilities and (2) have the minimum number of repeated letters. In either case, break ties on the other criterion; that is, your first solution should also have as few repeated letters as possible, and your second one should use as few utilities as possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list
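The two criteria above amount to two different sort keys. A minimal sketch, assuming each candidate is an array of utility names: Ruby compares arrays element by element, so `[word_count, repeats]` ranks by utilities first and breaks ties on repeated letters, while `[repeats, word_count]` does the reverse. The helper names are illustrative, and the two candidates are pangrams posted later in this thread.

```ruby
# Number of letters used more than once: total letters minus distinct letters.
def repeats(words)
  letters = words.join.chars
  letters.size - letters.uniq.size
end

# Criterion (1): fewest utilities, ties broken by fewest repeated letters.
def fewest_words_key(words)
  [words.size, repeats(words)]
end

# Criterion (2): fewest repeated letters, ties broken by fewest utilities.
def fewest_repeats_key(words)
  [repeats(words), words.size]
end

candidates = [
  %w[awk chgrp df jobs lex mv tty uniq zcat],       # 9 utilities, 4 repeats
  %w[zcat stty jobs cxref newgrp iconv cksum qhold] # 8 utilities, 12 repeats
]

best_by_words   = candidates.min_by { |c| fewest_words_key(c) }
best_by_repeats = candidates.min_by { |c| fewest_repeats_key(c) }
puts best_by_words.join(" ")   # the 8-utility candidate
puts best_by_repeats.join(" ") # the 9-utility candidate
```

Because `min_by` works with any comparable key, switching criteria is just a matter of swapping the order of the key's elements.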

On Fri, Oct 06, 2006 at 10:19:10PM +0900, Ruby Q. wrote:

on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.


It’s really all about trying to make swearwords, isn’t it? Copper is

On the unix front, I nominate ‘mingetty’.

Martin

Logan C. [email protected] writes:

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

We used to play that because the lessons were so boring.

Longest English words made up from chemical symbols:

THErMoPHOsPHOrEsCEnCe
NONRePReSEnTaTIONAlISm

Longest German words made up from chemical symbols:

INFLaTiONSINdIKAtOReN
AuSLaNdSINVEsTiTIONeN
KNOTeNReCHNErAPPLiKAtION

Martin C. wrote:

It’s really all about trying to make swearwords, isn’t it? Copper is

On the unix front, I nominate ‘mingetty’.

My innocent eyes!

hurries to cover the pickle jar with small white round objects floating
in formaldehyde with a cloth

David V.

Christian N. wrote:

THErMoPHOsPHOrEsCEnCe
NONRePReSEnTaTIONAlISm

Longest German words made up from chemical symbols:

INFLaTiONSINdIKAtOReN
AuSLaNdSINVEsTiTIONeN
KNOTeNReCHNErAPPLiKAtION

I love that kind of wordplay. I appreciate your
sharing it.

Thanks,
Hydrogen Aluminum

Ruby Q. [email protected] writes:

possible.
Who has a better solution than these?

“awk chgrp df jobs lex mv tty uniq zcat”
“awk chgrp df join lex mv tty qsub zcat”

I think these are optimal for both (1) and (2).
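Claims like these are easy to check mechanically. A small sketch (helper names are illustrative) that confirms no letter is missing and reports `[utilities, letters, repeated letters]`, the same triple that Morton's `stats` method computes:

```ruby
ALPHABET = ("a".."z").to_a

# Returns the alphabet letters that none of the words contain.
def missing_letters(words)
  ALPHABET - words.join.chars
end

# [number of utilities, total letters, repeated letters]
def stats(words)
  letters = words.join.chars
  [words.size, letters.size, letters.size - letters.uniq.size]
end

["awk chgrp df jobs lex mv tty uniq zcat",
 "awk chgrp df join lex mv tty qsub zcat"].each do |line|
  words = line.split
  puts "#{line}: missing=#{missing_letters(words).inspect} stats=#{stats(words).inspect}"
end
# Both lines report missing=[] and stats=[9, 30, 4].
```

This only verifies that each line is a pangram with 4 repeated letters; it says nothing about whether a better combination exists.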

Here is my solution. It’s not what I planned. I intended to submit an
algorithm that would find all the minimal Posix pangrams, and I
worked out a rather nice (IMO) recursive algorithm to do just that.
But Ruby doesn’t allocate enough stack space for that algorithm to
run to completion. So I’ve settled on an algorithm that finds
just one minimal pangram, but uses pseudo-random numbers to produce a
different pangram each time it is run.

My best pangram so far is

[
“unexpand”, “vi”, “write”, “zcat”
]

Regards, Morton

```ruby
#! /usr/bin/env ruby -w
#
# Created by Morton G. on October 08, 2006.
#
# Ruby Q. 97 -- POSIX Pangrams
# quiz_97.3.rb

class Array
  # Randomly permute the array in place (Fisher-Yates shuffle).
  def shuffle!
    n = size - 1
    while n > 0
      k = rand(n + 1) # rand(n + 1), not rand(n), so every permutation is possible
      self[k], self[n] = self[n], self[k]
      n -= 1
    end
    self
  end
end

# Removed c99, fort77, and m4 -- the complication they add is IMHO unedifying.
WORDS = %w[
  admin alias ar asa at awk basename batch bc bg cal cat cd cflow chgrp
  chmod chown cksum cmp comm command compress cp crontab csplit ctags cut
  cxref date dd delta df diff dirname du echo ed env ex expand expr false
  fc fg file find fold fuser gencat get getconf getopts grep hash head
  iconv id ipcrm ipcs jobs join kill lex link ln locale localedef logger
  logname lp ls mailx make man mesg mkdir mkfifo more mv newgrp nice nl
  nm nohup od paste patch pathchk pax pr printf prs ps pwd qalter qdel
  qhold qmove qmsg qrerun qrls qselect qsig qstat qsub read renice rm
  rmdel rmdir sact sccs sed sh sleep sort split strings strip stty tabs
  tail talk tee test time touch tput tr true tsort tty type ulimit umask
  unalias uname uncompress unexpand unget uniq unlink uucp uudecode
  uuencode uustat uux val vi wait wc what who write xargs yacc zcat
]

# Return true if wds is a pangram.
def pangram?(wds)
  wds.join.split(//).uniq.size == 26
end

# Return array giving pangram statistics:
# [word count, letter count, repeated-letter count]
def stats(pan)
  tmp = pan.join.split(//)
  [pan.size, tmp.size, tmp.size - tmp.uniq.size]
end

# Given a pangram, return list of pangrams, where each pangram in the list
# is derived from the given one by removing one word.
def diminish(pan)
  result = pan.collect do |item|
    rest = pan - [item]
    rest if pangram?(rest)
  end
  result.compact.shuffle!
end

# Given a list of pangrams return a minimal pangram that can be derived from it.
def find_minimal(pans)
  pan = pans.pop
  reduced = diminish(pan)
  return pan if reduced.empty?
  find_minimal(reduced)
end

# Find a minimal pangram.
pangram = find_minimal([WORDS])
p pangram
# => ["fg", "jobs", "qhold", "stty", "umask", "unexpand", "vi", "write", "zcat"]
p stats(pangram) # => [9, 39, 13]
```

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

uses one fewer utility.
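Both Morton's pangrams and this one are minimal in the sense his `diminish` method tests for: no single utility can be dropped. An equivalent check, sketched below with an illustrative `irreducible?` helper, is that every word contributes at least one letter that no other word in the set has. Both sets pass, yet one uses a utility fewer than the other, which shows that "no word can be removed" is a local minimum, not a global one.

```ruby
# True if removing any single word would leave some letter uncovered,
# i.e. every word supplies at least one letter found in no other word.
# (Assumes +words+ is already a pangram.)
def irreducible?(words)
  words.each_index.all? do |i|
    others = (words[0...i] + words[i + 1..-1]).join
    words[i].chars.any? { |ch| !others.include?(ch) }
  end
end

mortons = %w[fg jobs qhold stty umask unexpand vi write zcat]
shorter = %w[zcat stty jobs cxref newgrp iconv cksum qhold]
puts irreducible?(mortons) # prints true
puts irreducible?(shorter) # prints true
puts mortons.size          # prints 9
puts shorter.size          # prints 8
```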

I woke up this morning with the realization that the code I posted
yesterday could be considerably simplified. Here is the simpler version.

Regards, Morton

```ruby
#! /usr/bin/env ruby -w
#
# Created by Morton G. on October 09, 2006.
#
# Ruby Q. 97 -- POSIX Pangrams
# quiz_97.4.rb

# Removed c99, fort77, and m4 -- the complication they add is IMHO unedifying.
WORDS = %w[
  admin alias ar asa at awk basename batch bc bg cal cat cd cflow chgrp
  chmod chown cksum cmp comm command compress cp crontab csplit ctags cut
  cxref date dd delta df diff dirname du echo ed env ex expand expr false
  fc fg file find fold fuser gencat get getconf getopts grep hash head
  iconv id ipcrm ipcs jobs join kill lex link ln locale localedef logger
  logname lp ls mailx make man mesg mkdir mkfifo more mv newgrp nice nl
  nm nohup od paste patch pathchk pax pr printf prs ps pwd qalter qdel
  qhold qmove qmsg qrerun qrls qselect qsig qstat qsub read renice rm
  rmdel rmdir sact sccs sed sh sleep sort split strings strip stty tabs
  tail talk tee test time touch tput tr true tsort tty type ulimit umask
  unalias uname uncompress unexpand unget uniq unlink uucp uudecode
  uuencode uustat uux val vi wait wc what who write xargs yacc zcat
]

# Return true if wds is a pangram.
def pangram?(wds)
  wds.join.split(//).uniq.size == 26
end

# Return array giving pangram statistics:
# [word count, letter count, repeated-letter count]
def stats(pan)
  tmp = pan.join.split(//)
  [pan.size, tmp.size, tmp.size - tmp.uniq.size]
end

# Given a pangram, return a pangram derived from it by removing one word
# (chosen at random), or nil if no word can be removed.
def remove_one(pan)
  result = pan.collect do |item|
    diff = pan - [item]
    diff if pangram?(diff)
  end
  result.compact!
  result[rand(result.size)] unless result.empty?
end

# Given a pangram return a minimal pangram derived from it.
def find_minimal(pan)
  nxt = remove_one(pan)
  return pan unless nxt
  find_minimal(nxt)
end

# Find a minimal pangram.
pangram = find_minimal(WORDS)
p pangram
# => ["expr", "getconf", "jobs", "mv", "qdel", "type", "unlink", "what", "zcat"]
p stats(pangram) # => [9, 39, 13]
```
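Stack depth is not a real concern at roughly 160 words, but since `find_minimal` recurses only in tail position it converts directly into a loop. A sketch with a hypothetical `minimize` method; the optional `rng` argument is an addition (not in Morton's code) so that runs can be reproduced.

```ruby
# Returns true if the words together use every letter a-z.
def pangram?(words)
  words.join.chars.uniq.size == 26
end

# Loop form of the remove-one-word descent: keep dropping a random word
# whose removal still leaves a pangram, until no word can be dropped.
def minimize(words, rng = Random.new)
  current = words.dup
  loop do
    removable = current.select { |w| pangram?(current - [w]) }
    return current if removable.empty?
    current -= [removable.sample(random: rng)]
  end
end

start = %w[awk chgrp df jobs lex mv tty uniq zcat stty cxref cat]
p minimize(start, Random.new(1))
```

The result is a pangram from which no word can be removed; which one you get depends on the random choices.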

On 10/9/06, [email protected] [email protected] wrote:

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

Isn’t it supposed to use every letter?

Martin

On Oct 9, 2006, at 11:15 AM, Martin C. wrote:

On 10/9/06, [email protected] [email protected] wrote:

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

Isn’t it supposed to use every letter?

Which letters are missing?

`$ ruby -e 'puts(("a".."z").to_a - "zcat stty jobs cxref newgrp iconv cksum qhold".split(""))'`
`$`

James Edward G. II


Aha! So it does.

I’m just jealous that yours is smaller than mine.

Martin

I’m under the impression that it does? What letter am I missing?

-Cameron

[email protected] writes:

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

uses one fewer utility

That’s good to know, thank you.

On Fri, 06 Oct 2006 22:19:10 +0900, Ruby Q. wrote:

For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list

Finding the pangram with the minimum number of words is NP-Hard. The
reduction is from Minimum Set Cover[1]:

MINIMUM SET COVER:
INSTANCE: Collection C of subsets of a finite set S.
SOLUTION: A set cover for S, i.e., a subset C’ of C such that
every element in S belongs to at least one member of C’.
MEASURE: Cardinality of the set cover, i.e., size(C’)

Reduction:
Each element in S is mapped to a unique letter in the alphabet used by pangrams. (Note that in general we can play Pangrams in any language, so the alphabet can be of any length we want.) Each subset K (which is an element of C) becomes a word, made up of the letters corresponding to the elements of K. Finding a pangram with the minimum number of words in this setup would give a solution to the set-cover instance.
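The reduction can be checked on a toy instance. A sketch, assuming tiny inputs: each element of S becomes a "letter" (a Ruby symbol, since the alphabet may be arbitrarily large), each subset in C becomes a "word", and an exhaustive search for the fewest words that use every letter returns a minimum set cover. The helper names are made up for illustration, and the brute-force search is exponential; it is only there to demonstrate the correspondence.

```ruby
# Turn a set-cover instance (universe S, collection C) into a pangram instance:
# every element of S is a "letter", every subset in C is a "word".
def set_cover_to_pangram(universe, collection)
  letter_for = {}
  universe.each_with_index { |elem, i| letter_for[elem] = :"l#{i}" }
  alphabet = letter_for.values
  words = collection.map { |subset| subset.map { |elem| letter_for.fetch(elem) } }
  [alphabet, words]
end

# Exhaustive search for a smallest set of words using every letter of the alphabet.
# Returns indices into +words+, so the answer maps straight back to subsets in C.
def min_pangram_indices(alphabet, words)
  (1..words.size).each do |k|
    (0...words.size).to_a.combination(k) do |idx|
      covered = idx.flat_map { |i| words[i] }
      return idx if (alphabet - covered).empty?
    end
  end
  nil # no pangram exists, i.e. C does not cover S
end

universe   = [1, 2, 3, 4, 5, 6]
collection = [[1, 2, 3], [4, 5], [1, 4], [3, 6], [5, 6], [2, 6]]
alphabet, words = set_cover_to_pangram(universe, collection)
cover = min_pangram_indices(alphabet, words)
p cover.map { |i| collection[i] } # a minimum cover of size 3
```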

I suspect that the other optimization problem here may also be NP-Hard. I’m not sure (in all of the 5 minutes that I’m writing this post) how to do a simple reduction to that problem though.

–Ken B.

After a few optimization passes, I’ve finally figured out how to get
the optimal POSIX pangrams in about 30 minutes on a fast, modern
machine. I am still relatively new to Ruby so this application is
surely not as concise as it could be. To run it, save the two files,
create a posix-words.txt with the utilities, one per line, and run
the unit test.

-- pangrams.rb

```ruby
# A set of routines to find minimal pangrams from an input set of words. A pangram is a set of
# words that contain all the letters of the alphabet. Unless there are some properties of
# pangrams that I am unaware of, finding the minimal pangrammatic subset of a set of words
# is in a family of constraint-satisfaction problems that is NP-complete, as in, there is
# no known way to find the minimal subset that is guaranteed to run in polynomial time with respect
#
# The most common ways of finding excellent, but not provably perfect solutions are to either
# use a non-deterministic algorithm, or to try to prune the number of subsets that you search
# as much as possible, and backtrack as often as possible, quitting after a pre-determined
#
# The Pangrams class below implements both of these algorithms, providing methods to yield a
# specified number of random pangrams, or to do a backtracking search to yield successively
#
# In order not to waste time trying non-pangrams, the strategy for building pangrams is to
# build a list by finding the least common letter not already there, and then searching for
# pangrams using the utilities containing the least-common missing letter. If at any point the
# string is longer or has more repeated characters than the known minimums then we stop
# searching and backtrack. So we abort early and try to limit the number of decisions to try
# at each step.
#
# The search algorithm does not run in polynomial time, but for 160 elements, it can search the
# problem space in about 30 minutes on a fast, modern machine (dual-core xeon). For larger datasets
#
# cksum qhold"
# (4 repeats)
# order)

# Takes a list of words and creates a histogram of letters. The histogram can then
# be queried for pangramness and repeated letter counts.
class LetterHistogram
  # Initializes with a set of words
  def initialize(words = nil)
    @hist = Hash.new(0)
    @total_letters = 0
    words.each { |word| add word } unless words.nil?
  end

  # Adds a word to the pangram
  def add(word)
    word.scan(/./) { |l| @hist[l] = @hist[l] + 1 if ('a'..'z') === l }
    @total_letters += word.size
  end

  # Returns true if this list of words has a pangram
  def pangram?
    @hist.size == 26
  end

  # Returns the number of repeated letters
  def repeats
    @total_letters - @hist.size
  end

  # Returns a list of missing letters. Used in conjunction with the WordLetterMap below to
  # find a good word to add to the list to make a pangram.
  def missing_letters
    missing = Array.new
    ('a'..'z').each { |l| missing << l if @hist[l] == 0 }
    missing
  end
end

# Contains a map of Letters => Words containing that letter
class WordLetterMap
  def initialize(words)
    @map = Hash.new
    words.each { |word| add_word word }
  end

  # Files +word+ under each distinct letter it contains. (The name add_word is a
  # reconstruction; the original def line did not survive.) uniq keeps a word such
  # as "tty" from being listed twice under "t", which would duplicate search branches.
  def add_word(word)
    word.scan(/./).uniq.each do |l|
      if ('a'..'z') === l
        @map[l] = Array.new unless @map.has_key? l
        @map[l] << word
      end
    end
  end

  # Used to limit the number of choices as we search the minimal pangram space:
  # returns the unused words containing the missing letter found in the fewest words.
  def least_common(words = [], histogram = nil)
    histogram = LetterHistogram.new words if histogram.nil?
    min_words = nil
    histogram.missing_letters.each do |l|
      new_words = @map.fetch(l, []) - words
      # Compare candidate-list sizes (the words already chosen are irrelevant here)
      if min_words.nil? || min_words.size > new_words.size
        min_words = new_words
      end
    end
    min_words
  end
end

# Holds a list of words and generates minimal pangrams, both non-deterministically,
class Pangrams
  # Exception to throw when we have returned the maximum number of pangrams
  class AllDone < Exception
  end

  # Initialize with a word list
  def initialize(words = nil)
    if words.nil?
      @words = Array.new
    else
      @words = words
    end
  end

  # Adds a word to the set of words to produce pangrams
  def add(word)
    @words[@words.size] = word
  end

  # The number of words in the set of words
  def size
    @words.size
  end

  # Loads a list of words from a file, one word per line
  def self.from_file(filename)
    p = Pangrams.new
    File.foreach(filename) { |line| p.add line.strip unless line.strip.empty? }
    return p
  end

  # Yields randomly generated minimal pangrams to the passed block
  def random(count, &block)
    @word_letters = WordLetterMap.new @words
    count.times { random_pangram([], &block) }
  end

  # Searches for good minimal pangrams, yielding the ones it finds to the block
  def search(max_count = 0, &block)
    @min_size = size
    @min_repeats = 1000
    @max_count = max_count
    @count = 0
    @word_letters = WordLetterMap.new @words
    begin
      pangram_search([], &block)
    rescue AllDone
      # Quit gracefully
    end
  end

  private

  # Searches the pangram space by finding all words containing the least common letter until
  # a pangram has been built, passing each found pangram to the block. The algorithm tries to
  # be smart searching the word space by choosing the least common letters first, making the
  # overall space smaller. It also tries to make smart use of backtracking to avoid searching
  # non-lucrative branches.
  def pangram_search(words, &block)
    # Bust out if we've found enough pangrams
    raise AllDone.new if @max_count != 0 && @count > @max_count

    h = LetterHistogram.new words
    # If we already have at least as many words AND at least as many repeats as the best
    # pangrams found so far, there is no need to look any further; backtrack and try
    # something else.
    return if words.size >= @min_size && h.repeats >= @min_repeats

    # This pangram is somehow minimal, so pass to the block
    if h.pangram?
      @min_size = words.size if words.size < @min_size
      @min_repeats = h.repeats if h.repeats < @min_repeats
      @count += 1
      yield words, h
      return
    end

    # No pangram yet, find children and descend
    new_words = @word_letters.least_common words, h
    new_words.each { |w| pangram_search words + [w], &block }
  end

  # Builds a pangram by finding the least common letter that is not represented in the
  # passed-in array, and recursing until a pangram is built. This algorithm should build
  # minimal pangrams almost all of the time (i.e. removing a word from the set will make
  # the word list non-pangrammatic), and will yield very good pangrams, but probably not
  # the optimal ones.
  def random_pangram(words, &block)
    # Do we already have a pangram? If so, pass to block and quit
    h = LetterHistogram.new words
    if h.pangram?
      yield words, h
      return
    end

    # Be non-deterministic, and descend on a random word
    new_words = @word_letters.least_common words, h
    new_word = new_words[rand(new_words.size)]
    random_pangram words + [new_word], &block
  end
end
```
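The core of the backtracking idea fits in a few lines if letter sets are stored as 26-bit integers. A sketch, assuming we only want the fewest-utilities answer (criterion 1, no repeated-letter tie-break) and using bitmasks in place of the `LetterHistogram`/`WordLetterMap` classes; the branching rule is the same least-common-missing-letter choice described in the comments above, and the only bound is "stop once we cannot beat the best so far". No claim is made about its running time on the full utility list.

```ruby
FULL = (1 << 26) - 1

# Bitmask of the letters a-z that appear in +word+.
def mask(word)
  word.each_char.reduce(0) do |m, ch|
    ch.between?("a", "z") ? m | (1 << (ch.ord - 97)) : m
  end
end

# Branch and bound for the fewest words covering all 26 letters (nil if impossible).
def fewest_words(words)
  masks = words.map { |w| mask(w) }
  best = nil
  search = lambda do |chosen, covered|
    return if best && chosen.size >= best.size # cannot beat the current best
    if covered == FULL
      best = chosen.dup
      return
    end
    # Every pangram must contain some word with each missing letter, so branching
    # on the words containing the rarest missing letter loses no solutions.
    missing = (0...26).reject { |b| covered[b] == 1 }
    letter = missing.min_by { |b| masks.count { |m| m[b] == 1 } }
    masks.each_with_index do |m, i|
      search.call(chosen + [words[i]], covered | m) if m[letter] == 1
    end
  end
  search.call([], 0)
  best
end

words = %w[awk chgrp df jobs lex mv tty uniq zcat stty cxref newgrp iconv cksum qhold join qsub]
p fewest_words(words)
```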

-- test_pangrams.rb

```ruby
require 'test/unit'
require_relative 'pangrams' # load pangrams.rb from the same directory

# Unit test harness to test the histogram and the frequency map code, and also to exercise
# the pangram generation code. The tests assume a list of posix words, one per line, in
# 'posix-words.txt'
class TestPangrams < Test::Unit::TestCase
  def test_histogram
    l = LetterHistogram.new ['moon']
    assert_equal false, l.pangram?
    assert_equal 1, l.repeats
    pangram = %w{the quick brown fox jumps over the lazy dog}
    l = LetterHistogram.new pangram
    assert_equal true, l.pangram?
    l = LetterHistogram.new ['qwertyuiopasdfghjklzxcvbn'] # m is missing
    m = l.missing_letters
    assert_equal 1, m.size
    assert_equal 'm', m.first
  end

  def test_random_search
    puts "-- Testing Random Pangram Generation --"
    p = Pangrams.from_file 'posix-words.txt'
    min_repeats_length = p.size
    min_repeats = 1000
    min_repeats_pangram = nil
    min_size = 160
    min_size_repeats = 1000
    min_size_pangram = nil
    count = 0
    # Block parameter renamed from p so it does not shadow the Pangrams object
    p.random(1000) do |words, hist|
      repeats = hist.repeats
      size = words.size
      if repeats < min_repeats || repeats == min_repeats && size < min_repeats_length
        min_repeats_pangram = words
        min_repeats = repeats
        min_repeats_length = words.size
        puts "New min-repeats pangram: #{min_repeats_pangram.join ' '} with #{min_repeats} repeats"
        $stdout.flush
      end
      if size < min_size || size == min_size && repeats < min_size_repeats
        min_size = size
        min_size_repeats = repeats
        min_size_pangram = words
        puts "New min-size pangram: #{min_size_pangram.join ' '} with #{min_size} words"
        $stdout.flush
      end
    end
  end

  def test_backtracking_search
    puts "--Testing backtracking search--"
    p = Pangrams.from_file 'posix-words.txt'
    min_repeats_length = p.size
    min_repeats = 1000
    min_repeats_pangram = nil
    min_size = 160
    min_size_repeats = 1000
    min_size_pangram = nil
    p.search(50) do |words, hist|
      repeats = hist.repeats
      size = words.size
      if repeats < min_repeats || repeats == min_repeats && size < min_repeats_length
        min_repeats_pangram = words
        min_repeats = repeats
        min_repeats_length = words.size
        puts "New min-repeats pangram: #{min_repeats_pangram.join ' '} with #{min_repeats} repeats"
        $stdout.flush
      end
      if size < min_size || size == min_size && repeats < min_size_repeats
        min_size = size
        min_size_repeats = repeats
        min_size_pangram = words
        puts "New min-size pangram: #{min_size_pangram.join ' '} with #{min_size} words"
        $stdout.flush
      end
    end
  end
end
```

On Thu, 12 Oct 2006 17:06:26 +0900, Martin C. wrote:

I suspect that the other optimization problem here may also be NP-Hard. I’m not sure (in all of the 5 minutes that I’m writing this post) how to do a simple reduction to that problem though.

True, but in this case the set to cover was small enough to make it
computationally feasible without having to rewrite it in Fortran. Phew.

Martin

There’s also the matter of program structure. When dealing with an
NP-Hard problem, you know that any solution geared to do a heuristic
search quickly is not guaranteed to find the minimum.

And if you do naive searches of the entire solution space, the
number of POSIX utilities involved here is not all that trivial.
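One standard heuristic for covering problems like this (not one of the posted solutions) is the greedy set-cover rule: repeatedly take the utility that adds the most missing letters, breaking ties by the fewest letters that would become repeats. A minimal sketch, assuming the word list is already loaded into an array; like any quick heuristic it comes with no guarantee of finding the minimum.

```ruby
# Greedy cover: at each step pick the word contributing the most new letters,
# preferring (on ties) the word that adds the fewest already-covered letters.
# Returns nil if some letter appears in no word at all.
def greedy_pangram(words, alphabet = ("a".."z").to_a)
  chosen = []
  missing = alphabet.dup
  until missing.empty?
    best = words.max_by do |w|
      letters = w.chars
      gained  = (letters & missing).size # distinct new letters
      wasted  = letters.size - gained    # letters that become repeats
      [gained, -wasted]
    end
    gained = best.chars & missing
    return nil if gained.empty? # remaining letters appear in no word
    chosen << best
    missing -= gained
  end
  chosen
end

words = %w[awk chgrp df jobs lex mv tty uniq zcat stty cxref newgrp iconv cksum qhold join qsub]
p greedy_pangram(words)
```

The greedy choice is fast (one pass over the word list per chosen word), which is exactly the trade-off described above: quick, usually decent, but not provably optimal.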


On Oct 12, 2006, at 10:25 AM, Ken B. wrote:

Phew.

Martin

There’s also the matter of program structure. When dealing with an
NP-Hard problem, you know that any solution geared to do a heuristic
search quickly is not guaranteed to find the minimum.

relaxed approach to tough quiz problems. Find a good heuristic and
get close. That’s fine with me.

To me, Ruby Q. is all about keeping the brain sharp. I think
programmers are often susceptible to the “That’s not possible!”
mentality and we have to fight against that. MJD once gave a neat
speech about how when Alchemy was as old as our profession is now,
they were still actively trying to turn lead into gold but we seem
content to bury ourselves in rules about what we can’t do.

cases? No. Does that mean we shouldn’t do it? No way. The
solution I talked up in the summary finds a darn good pangram in
under a second! It’s within 7 characters of perfect. It may be
guesswork, but it’s damn good guesswork.

I have a great respect for those who can think outside the box like
that. I think we should cultivate that skill in ourselves.

Even at work, I’m the company’s resident language geek, so I often
see a problem and launch into a detailed plan of how we can solve it
to death. Luckily, my two far more practical coworkers usually just
politely point out the simple fix that is plenty good enough for our
purposes. If it wasn’t for them I would never accomplish a complete
anything.

I’m all about thinking outside the box. I think we should all learn
to love the guesswork!

There’s my two cents on this issue.

James Edward G. II
