Posix Pangrams (#97)

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this quiz
    until
    48 hours have passed from the time on this message.

  2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

  1. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps
everyone
on Ruby T. follow the discussion. Please reply to the original quiz
message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Martin DeMello

A pangram is a sentence containing every letter of the alphabet at least
once (a
famous example in English being “the quick brown fox jumps over the lazy
dog”).
For maximum style points a pangram should read smoothly, and have both
as few
repeated letters as possible (ideally zero), and as few words as
possible.

This quiz extends the idea to the posix utilities[1] - write a program
to find
pangrammatic collections of posix utilities that (1) use the fewest
utilities
and (2) have the minimum number of repeated letters. In either case,
break ties
on the other criterion; that is, your first solution should also have as
few
repeated letters as possible, and your second one should use as few
utilities as
possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list

On Fri, Oct 06, 2006 at 10:19:10PM +0900, Ruby Q. wrote:

on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

It’s really all about trying to make swearwords, isn’t it? Copper is
your friend.

On the unix front, I nominate ‘mingetty’.

Martin

Logan C. [email protected] writes:

Interesting. I had a high school chemistry teacher who had us play
similar games with symbols of the periodic table.

We use to play that because the lessons are so boring. :slight_smile:

Longest English words made up from chemical symbols:

THErMoPHOsPHOrEsCEnCe
NONRePReSEnTaTIONAlISm

Longest German words made up from chemical symbols:

INFLaTiONSINdIKAtOReN
AuSLaNdSINVEsTiTIONeN
KNOTeNReCHNErAPPLiKAtION

Martin C. wrote:

It’s really all about trying to make swearwords, isn’t it? Copper is
your friend.

On the unix front, I nominate ‘mingetty’.

My innocent eyes!

hurries to cover the pickle jar with small white round objects floating
in formaldehyde with a cloth

David V.

Christian N. wrote:

THErMoPHOsPHOrEsCEnCe
NONRePReSEnTaTIONAlISm

Longest German words made up from chemical symbols:

INFLaTiONSINdIKAtOReN
AuSLaNdSINVEsTiTIONeN
KNOTeNReCHNErAPPLiKAtION

I love that kind of wordplay. I appreciate your
sharing it.

Thanks,
Hydrogen Aluminum

Ruby Q. [email protected] writes:

possible.
Who has a better solution than these?

“awk chgrp df jobs lex mv tty uniq zcat”
“awk chgrp df join lex mv tty qsub zcat”

I think these are optimal for both (1) and (2).

Here is my solution. It’s not what I planned. I intended to submit an
algorithm that would find all the minimal Posix pangrams, and I
worked out a rather nice (IMO) recursive algorithm to do just that.
But Ruby doesn’t allocate enough stack space for that algorithm to
run to completion . So I’ve settled on an algorithm that finds
just one minimal pangram, but uses pseudo-random numbers to produce a
different pangram each time it is run.

My best pangram so far is

[
“fg”, “jobs”, “qhold”, “stty”, “umask”,
“unexpand”, “vi”, “write”, “zcat”
]

Regards, Morton

#! /usr/bin/env ruby -w # # Created by Morton G. on October 08, 2006. # # Ruby Q. 97 -- POSIX Pangrams # quiz_97.3.rb

class Array

def shuffle!
   n = size - 1
   while n > 0
      k = rand(n)
      self[k], self[n] = self[n], self[k]
      n -= 1
   end
   self
end

end

Removed c99, fort77, and m4 – the complication they add is IMHO

unedifying.

WORDS = %w[
admin alias ar asa at awk basename batch bc bg cal cat cd cflow
chgrp chmod chown cksum cmp comm command compress cp crontab csplit
ctags cut cxref date dd delta df diff dirname du echo ed env ex
expand
expr false fc fg file find fold fuser gencat get getconf getopts
grep hash head iconv id ipcrm ipcs jobs join kill lex link ln locale
localedef logger logname lp ls mailx make man mesg mkdir mkfifo more
mv newgrp nice nl nm nohup od paste patch pathchk pax pr printf
prs ps
pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig qstat qsub
read renice rm rmdel rmdir sact sccs sed sh sleep sort split strings
strip stty tabs tail talk tee test time touch tput tr true tsort tty
type ulimit umask unalias uname uncompress unexpand unget uniq
unlink
uucp uudecode uuencode uustat uux val vi wait wc what who write
xargs
yacc zcat
]

Return true if wds is a pangram.

def pangram?(wds)
wds.join.split(//).uniq.size == 26
end

Return array giving pangram statistics:

[, , ]

def stats(pan)
tmp = pan.join.split(//)
[pan.size, tmp.size, tmp.size - tmp.uniq.size]
end

Given a pangram, return list of pangrams, where each panaram in the

list

is derived from the given one by removing one word.

def diminish(pan)
result = pan.collect do |item|
rest = pan - [item]
rest if pangram?(rest)
end
result.compact.shuffle!
end

Given a list of pangrams return a minimal pangram that can be derived

from it.

def find_minimal(pans)
pan = pans.pop
reduced = diminish(pan)
return pan if reduced.empty?
find_minimal(reduced)
end

Find a minimal pangram.

pangram = find_minimal([WORDS])
p pangram # =>

[

“fg”, “jobs”, “qhold”, “stty”, “umask”,

“unexpand”, “vi”, “write”, “zcat”

]

p stats(pangram) # => [9, 39, 13]

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

uses one fewer utility

I woke up this morning with the realization that the code I posted
yesterday could be considerably simplified. Here is the simpler version.

Regards, Morton

#! /usr/bin/env ruby -w # # Created by Morton G. on October 09, 2006. # # Ruby Q. 97 -- POSIX Pangrams # quiz_97.4.rb

Removed c99, fort77, and m4 – the complication they add is IMHO

unedifying.

WORDS = %w[
admin alias ar asa at awk basename batch bc bg cal cat cd cflow
chgrp chmod chown cksum cmp comm command compress cp crontab csplit
ctags cut cxref date dd delta df diff dirname du echo ed env ex
expand
expr false fc fg file find fold fuser gencat get getconf getopts
grep hash head iconv id ipcrm ipcs jobs join kill lex link ln locale
localedef logger logname lp ls mailx make man mesg mkdir mkfifo more
mv newgrp nice nl nm nohup od paste patch pathchk pax pr printf
prs ps
pwd qalter qdel qhold qmove qmsg qrerun qrls qselect qsig qstat qsub
read renice rm rmdel rmdir sact sccs sed sh sleep sort split strings
strip stty tabs tail talk tee test time touch tput tr true tsort tty
type ulimit umask unalias uname uncompress unexpand unget uniq
unlink
uucp uudecode uuencode uustat uux val vi wait wc what who write
xargs
yacc zcat
]

Return true if wds is a pangram.

def pangram?(wds)
wds.join.split(//).uniq.size == 26
end

Return array giving pangram statistics:

[, , ]

def stats(pan)
tmp = pan.join.split(//)
[pan.size, tmp.size, tmp.size - tmp.uniq.size]
end

Given a pangram, return a pangram derived from it by removing one

word.
def remove_one(pan)
result = pan.collect do |item|
diff = pan - [item]
diff if pangram?(diff)
end
result.compact!
result[rand(result.size)] unless result.empty?
end

Given a pangram return a minimal pangram derived from it.

def find_minimal(pan)
nxt = remove_one(pan)
return pan unless nxt
find_minimal(nxt)
end

Find a minimal pangram.

pangram = find_minimal(WORDS)
p pangram # =>

[

“expr”, “getconf”, “jobs”, “mv”, “qdel”,

“type”, “unlink”, “what”, “zcat”

]

p stats(pangram) # => [9, 39, 13]

On 10/9/06, [email protected] [email protected] wrote:

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

Isn’t it supposed to use every letter?

Martin

On Oct 9, 2006, at 11:15 AM, Martin C. wrote:

On 10/9/06, [email protected] [email protected] wrote:

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

Isn’t it supposed to use every letter?

Which letters are missing?

$ ruby -e ‘puts((“a”…“z”).to_a - “zcat stty jobs cxref newgrp iconv
cksum qhold”.split(“”))’
$

James Edward G. II

Which letters are missing?

$ ruby -e ‘puts((“a”…“z”).to_a - “zcat stty jobs cxref newgrp iconv
cksum qhold”.split(""))’
$

Aha! So it does.

I’m just jealous that yours is smaller than mine.

Martin

I’m under the impression that it does? What letter am I missing?

-Cameron

[email protected] writes:

I haven’t found a solution with fewer repeated characters, but

“zcat stty jobs cxref newgrp iconv cksum qhold”

uses one fewer utility

That’s good to know, thank you.

On Fri, 06 Oct 2006 22:19:10 +0900, Ruby Q. wrote:

For maximum style points a pangram should read smoothly, and have both as few
repeated letters as possible (ideally zero), and as few words as possible.

This quiz extends the idea to the posix utilities[1] - write a program to find
pangrammatic collections of posix utilities that (1) use the fewest utilities
and (2) have the minimum number of repeated letters. In either case, break ties
on the other criterion; that is, your first solution should also have as few
repeated letters as possible, and your second one should use as few utilities as
possible.

[1] http://www.unix.org/version3/apis/cu.html has a complete list

Finding the pangram with the minimum number of words is NP-Hard. The
reduction is from Minimum Set Cover[1]:

MINIMUM SET COVER:
INSTANCE: Collection C of subsets of a finite set S.
SOLUTION: A set cover for S, i.e., a subset C’ of C such that
every element in S belongs to at least one member of C’.
MEASURE: Cardinality of the set cover, i.e., size(C’)

Reduction:
Each element in S becomes mapped to a unique letter in the alphabet used
by pangrams. (Note that in general, we can play Pangrams in any
langauage, the alphabet can be of length we want) Each subset K (which
is
an element of C) becomes a word, made up of the letters corresponding to
the elements of K. Finding a pangram with the minimum number of words in
this setup would give a solution to the set-cover instance.

I suspect that the other optimization problem here may also be NP-Hard.
I’m not sure (in all of the 5 minutes that I’m writing this post) how to
do
a simple reduction to that problem though.

–Ken B.

[1]
http://www.ensta.fr/~diam//ro/online/viggo_wwwcompendium/node146.html

After a few optimization passes, I’ve finally figured out how to get
the optimal POSIX pangrams in about 30 minutes on a fast, modern
machine. I am still relatively new to Ruby so this application is
surely not as concise as it could be. To run it, save the two files,
create a posix-utilities.txt with the utilities, one per line, and run
the unit test.

– pangrams.rb

A set of routines to find minimal pangrams from an input set of

words. A pangram is a set of

words that contain all the letters of the alphabet. Unless there are

some properties of

pangrams that I am unaware of, finding the minimal pangrammatic

subset of a set of words

is in a family of constraint-satisfaction problems that is

NP-complete, as in, there is

not a way to find the minimal subset that is guaranteed to run in

polynomial time with respect

to the size of the dataset.

The most common ways of finding excellent, but not provably perfect

solutions are to either

use a non-deterministic algorithm, or to try to prune the number of

subsets that you search

as much as possible, and backtrack as often as possible, quitting

after a pre-determined

quality threshold has been reached.

The Pangrams class below implements both of these algorithms,

providing methods to yield a

specified number of random pangrams, or to do a backtracking search

to yield successively

better pangrams.

In order not to waste time trying non-pangrams, the strategy for

building pangrams is to

build a list by finding the least common letter not already there,

and then searching for pangrams

using the utilities containing the least-common missing letter. If at

any point the string

is longer or has more repeated characters than the known minimums

then we stop searching and

backtrack. So we abort early and try to limit the number of decisions

to try at each step.

The search algorithm does not run in polynomial time, but for 160

elements, it can search the

problem space in about 30 minutes on a fast, modern machine

(dual-core xeon). For larger datasets

(like aspell) the non-deterministic approach would be more practical.

Two minimal POSIX pangrams are:

Minimal number of utilities: "zcat stty jobs cxref newgrp iconv

cksum qhold"

Minimal repeated letters: “zcat tty jobs lex awk mv uniq chgrp df”

(4 repeats)

(This is one of the ones that Christian found, in a different

order)

Takes a list of words and creates a histogram of letters. The

histogram can then

be queried for pangramness and repeated letter counts.

class LetterHistogram

Initializes with a set of words

def initialize(words=nil)

@hist = Hash.new(0)

@total_letters = 0

words.each {|word| add word} unless words.nil?

end

Adds a word to the pangram

def add(word)

word.scan(/./) {|l| @hist[l] = @hist[l] + 1 if ('a'..'z') === l}

@total_letters += word.size

end

Returns true if this list of words has a pangram

def pangram?

return @hist.size == 26

end

Returns the number of repeated letter

def repeats

@total_letters - @hist.size

end

Returns a list of missing letters. Used in conjunction with the

WordLetterMap below to

find a good word to add to the list to make a pangram.

def missing_letters

missing = Array.new

('a'..'z').each {|l| missing << l if @hist[l] == 0}

missing

end

end

Contans a map of Letters => Words containing that letter

class WordLetterMap

def initialize(words)

@map = Hash.new

words.each {|w| add w}

end

Adds a new word

def add(word)

word.scan(/./) do |l|

  if ('a'..'z') === l

    @map[l] = Array.new unless @map.has_key? l

    @map[l] = @map[l] << word

  end

end

end

Return a list of words containing the least common missing letter

Used to limit the number of choices as we search the minimal

pangram space

def least_common(words=nil, histogram=nil)

histogram = LetterHistogram.new words if histogram.nil?

min_words = nil

histogram.missing_letters.each do |l|

  new_words = @map[l] - words

  if min_words.nil? || min_words.size > words.size

    min_words = new_words

  end

end



return min_words

end

end

Holds a list of words and generates minimal pangrams, both

non-deterministically,

and by searching, backtracking to avoid non-minimal branches

class Pangrams

Exception to throw when we have returned the maximum number of

pangrams

class AllDone < Exception

end

Initialize with a word list

def initialize(words=nil)

if words.nil?

  @words = Array.new

else

  @words = words

end

end

Adds a word to the set of words to produce pangrams

def add(word)

@words[@words.size] = word

end

The number of words in the set of words

def size

@words.size

end

Loads a list of words from a file

def self.from_file(filename)

p = Pangrams.new

File.new(filename).each_line {|line| p.add line.strip}

return p

end

Yields randomly generated minimal pangrams to the passed block

def random(count,&block)

@word_letters = WordLetterMap.new @words

(0..count).each {|i| random_pangram([],&block)}

end

Searches for good minimal pangrams, yielding the ones it finds to

the block

def search(max_count=0,&block)

@min_size = size

@min_repeats = 1000

@max_count = max_count

@count = 0

@word_letters = WordLetterMap.new @words

begin

  pangram_search([],&block)

rescue AllDone

  # Quit gracefully

end

end

private

searches the pangram space by finding all words containing the

least common letter until

a pangram has been built, passing each found pangram to the block.

The algorithm tries to

be smart searching the word space by choosing the least common

letters first, making the

overall space smaller. It also tries make smart use of backtracking

to avoid searching

non-lucrative branches.

def pangram_search(words, &block)

# Bust out if we've found enough pangrams

raise AllDone.new if @max_count != 0 && @count > @max_count



h = LetterHistogram.new words



# If we already have more words or more repeats, then no need to

look any

# further, we should backtrack and try something else.

return if words.size >= @min_size && h.repeats >= @min_repeats



# This pangram is somehow minimal, so pass to the block

if h.pangram?

  @min_size = words.size if words.size < @min_size

  @min_repeats = h.repeats if h.repeats < @min_repeats

  @count += 1

  yield words,h

  return

end



# No pangram yet, find children and descend

new_words = @word_letters.least_common words,h

new_words.each {|w| pangram_search words + [w], &block}

end

Builds a pangram by finding the least common letter that is not

represented in the

passed-in array, and recursing until a pangram is built. This

algorithm should build

minimal pangrams almost all of the time (i.e. removing a word from

the set will make

the word list non-pangrammatic and will yield very good pangrams,

but probably not the

most optimal ones.

def random_pangram(words, &block)

# Do we already have a pangram? If so, pass to block and quit

h = LetterHistogram.new words

if h.pangram?

  yield words,h

  return

end



# Be non-deterministic, and descend on a random word

new_words = @word_letters.least_common words,h

new_word = new_words[rand(new_words.size)]

random_pangram words + [new_word], &block

end

end

– test_pangrams.rb

require ‘test/unit’

require ‘pangrams’

Unit test harness to test the histogram and the frequency map code,

and also to exercise the pangram

generation code. The tests assume a list of posix words, one per

line, in ‘posix-words.txt’

class TestPangrams < Test::Unit::TestCase

def test_histogram

l = LetterHistogram.new ['moon']

assert_equal false, l.pangram?

assert_equal 1, l.repeats



pangram = %w{the quick brown fox jumps over the lazy dog}

l = LetterHistogram.new pangram

assert_equal true, l.pangram?



l = LetterHistogram.new ['qwertyuiopasdfghjklzxcvbn'] # m is

missing

m = l.missing_letters

assert_equal 1, m.size

assert_equal 'm', m.first

end

def test_random_search

puts "-- Testing Random Pangram Generation --"



p = Pangrams.from_file 'posix-words.txt'

min_repeats_length = p.size

min_repeats = 1000

min_repeats_pangram = nil



min_size = 160

min_size_repeats = 1000

min_size_pangram = nil



count = 0



p.random(1000) do |p, hist|

  repeats = hist.repeats

  size = p.size

  if repeats < min_repeats || repeats == min_repeats && size <

min_repeats_length

    min_repeats_pangram = p

    min_repeats = repeats

    min_repeats_length = p.size

    puts "New min-repeats pangram: #{min_repeats_pangram.join ' '}

with #{min_repeats} repeats"

    $stdout.flush

  end

  if size < min_size || size == min_size && repeats <

min_size_repeats

    min_size = size

    min_size_repeats = repeats

    min_size_pangram = p

    puts "New min-size pangram: #{min_size_pangram.join ' '} with

#{min_size} words"

    $stdout.flush

  end

end

end

def test_backtracking_search

puts "--Testing backtracking search--"



p = Pangrams.from_file 'posix-words.txt'

min_repeats_length = p.size

min_repeats = 1000

min_repeats_pangram = nil



min_size = 160

min_size_repeats = 1000

min_size_pangram = nil



p.search(50) do |p, hist|

  repeats = hist.repeats

  size = p.size

  if repeats < min_repeats || repeats == min_repeats && size <

min_repeats_length

    min_repeats_pangram = p

    min_repeats = repeats

    min_repeats_length = p.size

    puts "New min-repeats pangram: #{min_repeats_pangram.join ' '}

with #{min_repeats} repeats"

    $stdout.flush

  end

  if size < min_size || size == min_size && repeats <

min_size_repeats

    min_size = size

    min_size_repeats = repeats

    min_size_pangram = p

    puts "New min-size pangram: #{min_size_pangram.join ' '} with

#{min_size} words"

    $stdout.flush

  end

end

end

end

On Thu, 12 Oct 2006 17:06:26 +0900, Martin C. wrote:

I suspect that the other optimization problem here may also be NP-
Hard.
I’m not sure (in all of the 5 minutes that I’m writing this post)
how to do
a simple reduction to that problem though.

True, but in this case the set to cover was small enough to make it
computationally feasible without having to rewrite it in Fortran. Phew.

Martin

There’s also the matter of program structure. When dealing with an
NP-Hard problem, you know that any solution geared to do a heuristic
search quickly is not guaranteed to find the minimum.

And if you do naive searches of the entire solution space, the
number of POSIX utilities involved here is not all that trivial.

I suspect that the other optimization problem here may also be NP-
Hard.
I’m not sure (in all of the 5 minutes that I’m writing this post)
how to do
a simple reduction to that problem though.

True, but in this case the set to cover was small enough to make it
computationally feasible without having to rewrite it in Fortran. Phew.

Martin

On Oct 12, 2006, at 10:25 AM, Ken B. wrote:

Phew.

Martin

There’s also the matter of program structure. When dealing with an
NP-Hard problem, you know that any solution geared to do a heuristic
search quickly is not guaranteed to find the minimum.

We haven’t talked a lot about this in the past, but I take a pretty
relaxed approach to tough quiz problems. Find a good heuristic and
get close. That’s fine with me.

To me, Ruby Q. is all about keeping the brain sharp. I think
programmers are often susceptible to the “That’s not possible!”
mentality and we have to fight against that. MJD once gave a neat
speech about how when Alchemy was as old as our profession is now,
they were still actively trying to turn lead into gold but we seem
content to bury ourselves in rules about what we can’t do.

Was this week’s quiz solvable beyond a shadow of a doubt in all
cases? No. Does that mean we shouldn’t do it? No way. The
solution I talked up in the summary finds a darn good pangram in
under a second! It’s within 7 characters of perfect. It maybe
guesswork, but it’s damn good guesswork.

I have a great respect for those who can think outside the box like
that. I think we should cultivate that skill in ourselves.

Even at work, I’m the company’s resident language geek, so I often
see a problem and launch into a detailed plan of how we can solve it
to death. Luckily, my two far more practical coworkers usually just
politely point out the simple fix that is plenty good enough for our
purposes. If it wasn’t for them I would never accomplish a complete
anything.

I’m all about thinking outside the box. I think we should all learn
to love the guesswork!

There’s my two cents on this issue.

James Edward G. II