-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- The three rules of Ruby Quiz: 1. Please do not post any solutions or spoiler discussion for this quiz until 48 hours have elapsed from the time this message was sent. 2. Support Ruby Quiz by submitting ideas and responses as often as you can! Visit: <http://rubyquiz.strd6.com/suggestions> 3. Enjoy! Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone on Ruby Talk follow the discussion. Please reply to the original quiz message, if you can. RSS Feed: http://rubyquiz.strd6.com/quizzes.rss -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ## Encyclopedia Construction (#205) This week's quiz comes from Tim Hunter. Thank you Tim for the great write up! When I was a kid we had an encyclopedia. It had thousands of articles in alphabetical order, divided into about 25 volumes. Sometimes there were so many articles with the same initial letter ("M", for example) that an entire volume was dedicated to articles with that initial letter. In some cases, though, there weren't enough articles with the same initial letter to devote an entire volume to them, so those articles were grouped in with alphabetically adjacent articles in the same volume. For example, all the articles starting with W, X, Y, and Z were in the same volume. Of course, all the articles that started with the same initial letter were always in the same volume. Each volume was labeled with the initial character of the articles it contained. For example, the "M" volume was simply labeled "M". If the articles were spread over more than one initial character, the label displayed the first and last characters in the range. The W, X, Y, and Z volume was labeled "W-Z". The quiz is to devise a way of grouping a list of encyclopedia articles into volumes. Since this is Ruby Quiz, we'll say that for "articles" we have a list of about 100 words composed of alphabetical characters, a-z. A "volume" is an array. Sort the list and then put the words into arrays using the following rules. 1. All the words with the same initial character must be in the same array. 2. If an array contains words with different initial characters, the words must be alphabetically adjacent to each other. That is, if "cat", "dog", and "elephant", are in the list of articles, and you put "cat" and "elephant" in the same array, then "dog" must also be in that array. 3. Without breaking rules 1 and 2, divide the words into about 10 arrays. 4. Without breaking rules 1 and 2, each of the arrays should contain about the same number of words. To display the encyclopedia, make a hash with one element for each array. The element value is the array. The element key is a single letter ("M") when the array only contains words with the same initial character, and a range ("W-Z") when the array contains words with more than one initial character. Print the hash with the "pp" command. Here's the list of words: ape apple apricot aquamarine architect artichoke azure baboon badger banana bat battleship beeswax beetles black blogger blue bricklayer Brigadoon card cat cherry chokeberry coconut coral crimson currant dam debate deliberations demagogue dog durian elephant empathy encyclopedia fig fox gazelle giraffe goldenrod gray hippopotamus huckleberry ibex imprint indigo ivory jab jackrabbit jake khaki kiwi kudzu lavender lemming leopard lime loganberry mango maroon memory mole monkey mouse muskox Nigeria Navajo olive opera panther party peach pear penguin persimmon plum poet politician privy programmer rabbit red research rhino Rumpelstiltskin salmon silver snow soldier songsmith squirrel starship steel strawberry student tan tapir theater thesaurus tree turquoise umbrella warthog waterloo wildebeest wolf woodchuck xylophone yahoo yellow zebra
on 15.05.2009 18:40
on 18.05.2009 17:41
Hi,
here's my solution. It's based on classic dynamic programming over the
number of words for each letter.
I used different objective functions to get good book sizes:
1. Maximize the minimal number of words in one book,
2. minimize the maximal number of words in one book,
3. minimize the absolute deviation of the number of words in one
book to the average number of words in one book ,
4. minimize the quadratic deviation of the number of words in one
book to the average number of words in one book.
For the given word list, objective function 4 seems to yield the best
results with good distributed word counts.
The program is called with three arguments:
1. The name of the word-list file (one word per line),
2. the number of books in which the words should be distributed,
3. a number 1,2,3 or 4 selecting the objective function.
The programm prints
1. the number of words per letter,
2. the words per book,
3. the number of words per book.
The current implementation is quite inelegant and could be improved
and be more rubyish.
---
require 'facets/enumerable'
require 'enumerator'
require 'pp'
word_file = ARGV.shift || "words.txt"
nbooks = ARGV.shift.to_i || 10
algorithm = ARGV.shift.to_i || 3
# ruby 1.8
class Integer
def ord; self; end
end
class Wordlist
attr_reader :books
def initialize( words )
# sort them
@words = words.sort {|w1, w2| w1.downcase <=> w2.downcase }
# get number of words per letter
@nletters = Array.new(26, 0)
@words.each do |w|
@nletters[w.downcase[0].ord - ?a.ord] += 1
end
end
# Returns the number of words per character
def letter_counts
(?A .. ?Z).mash{|c| [c.chr, @nletters[c.ord - ?A.ord]]}
end
# Returns number of words in each book
def counts
@books.mash{|range, words| [range, words.size]}
end
# divide into +nbooks+ books by minimizing the maximal number
# of words in one book
def min_maximum( nbooks )
dyn_min( nbooks ) { |*s| s.max }
end
# divide into +nbooks+ books by maximizing the minmal number
# of words in one book
def max_minimum( nbooks )
dyn_min( nbooks ) { |*s| -s.min }
end
# divide into +nbooks+ books by minimizing the deviation from the
# average number of words in one book
def min_deviat( nbooks )
mean = sum( 0...@nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |*s| s.inject(0){|sum,x| sum + (x -
mean).abs} }
end
# divide into +nbooks+ books by minimizing the quadratic deviation
# from the average number of words in one book
def min_deviat2( nbooks )
mean = sum( 0...@nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |*s| s.inject(0){|sum,x| sum + (x - mean) **
2 } }
end
private
# computes
# sum_{i\in range} @nletters[i]
def sum( range )
range.inject(0) {|s,i| @nletters[i] + s}
end
# A range of letters in the same book.
class Book
def initialize( range, n_words )
@range = range
@n_words = n_words
end
# The first letter in this book
def min; @range.min; end
# The last letter in this book
def max; @range.max; end
# Is letter x in this book?
def include?(x); @range.include?(x); end
# returns the number of all words in this book
attr_reader :n_words
end
# Computes a solution where
# max{ func(part_sum) }
# is minimized
def dyn_min( nbooks, &func )
mean = sum( 0...@nletters.size ).to_f / nbooks
books = Array.new( @nletters.size )
s = 0
(@nletters.size - 1).downto(0) do |i|
s += @nletters[i]
range = i ... @nletters.size
books[i] = [Book.new( range, sum(range) )]
end
nbooks.times do
new_books = Array.new( @nletters.size )
for s in 0 ... @nletters.size
best_value = nil
best_end = nil
sum = @nletters[s] # value of the current part
for e in (s + 1) ... @nletters.size
# stop if no further subdivisions are possible
break unless books[e]
value = func.call(sum, *books[e].map{|r| r.n_words})
if best_value.nil? || value < best_value
best_value = value
best_end = e
end
sum += @nletters[e] || 0
end
if best_value
new_books[s] = [Book.new(s ... best_end,
sum(s ... best_end))] +
books[best_end]
end
end
books = new_books
end
@books = Hash.new
books[0].each do |book|
words = @words.find_all{|w|
book.include?(w.downcase[0].ord - ?a.ord)
}
if book.min == book.max
key = (?A.ord + book.min).chr
else
key = "#{(?A.ord + book.min).chr}-#{(?A.ord + book.max).chr}"
end
@books[key] = words
end
@books
end
end
# read word list
words = []
File.open( word_file, "r" ) do |f|
f.each_line do |line|
line.strip!
words << line unless line.empty?
end
end
list = Wordlist.new( words )
p list.letter_counts
case algorithm
when 1
list.min_maximum( nbooks )
when 2
list.max_minimum( nbooks )
when 3
list.min_deviat( nbooks )
when 4
list.min_deviat2( nbooks )
else
raise "Unknown algorithm: #{algorithm} (should be in {0,1,2,3})"
end
pp list.books
p list.counts
---
Bye,
Frank
on 19.05.2009 06:53
Here's my solution: http://www.pastie.org/482422 Usage: e = Encyclopedia.new(articles) e.volumes(20) #<-- how many volumes should it be condensed to? pp e.hash puts e.dump #<-- this will print a nice little chart to help you visualize the distribution of articles. My strategy was to: -divide each group of words starting with the same letter into it's own array -if I had to reduce the number of volumes: -combine 2 adjacent letter arrays into a single array based on which 2 arrays have the smallest combined length -repeat this procedure until the desired number of volumes is achieved Notes: -This algorithm can result in poorly balanced distribution of volumes because it glues together adjacent volumes permanently. A more even distribution could be achieved if you could break a volume off after it had already been combined. -Certain number of volumes will result in really even distribution. 17 with the provided articles worked out well. -I think it would be interesting to split larger volumes as well. I think in some encyclopedias you'll find volumes with more common words split (eg. Ma - Me and Mi -Mz) My code could be much improved - quick and dirty. Well, maybe not so much quick, but dirty. Luke
on 20.05.2009 09:30
On 2009-05-18, Frank Fischer <frank-fischer@shadow-soft.de> wrote:
Here's a second version, this time returning the correct number of books
(the first version returned the requested number of books + 1). And it
should be a little cleaner and faster.
Frank
--
require 'facets/enumerable'
require 'enumerator'
require 'pp'
word_file = ARGV.shift || "words.txt"
nbooks = (ARGV.shift || 10).to_i
algorithm = (ARGV.shift || 4).to_i
# ruby 1.8
if RUBY_VERSION[0,4] == "1.8."
class Integer
def ord; self; end
end
end
# A book containing the words starting with a letter in first .. last
class Book
attr_reader :first, :last, :n_words, :words
# Create a book with n words whose first letter is in range.
def initialize( range, n_words )
@range = range.min.ord .. range.max.ord
@n_words = n_words
end
# Set the words in this book
def words=( words )
@words = words.find_all{|w| include?(w.downcase[0])}
@n_words = @words.size
end
# Returns true if the words starting with the given letter are in
# this book
def include?( letter )
@range.include?( letter.ord - ?a.ord )
end
# Returns the character-range.
def range
if @range.min == @range.max
(?A.ord + @range.min).chr
else
(?A.ord + @range.min).chr + "-" + (?A.ord + @range.max).chr
end
end
def to_s; "<Book: %3s, size: #@n_words>" % range; end
def inspect; to_s; end
end
class Wordlist
attr_reader :books
def initialize( words )
# sort them
@words = words.sort {|w1, w2| w1.downcase <=> w2.downcase }
# get number of words per letter
@nletters = Array.new( 26, 0 )
@words.each do |w|
@nletters[w.downcase[0].ord - ?a.ord] += 1
end
end
# Returns the number of words per character
def letter_counts
(0...@nletters.size).mash{|i| [(?a.ord + i).chr, @nletters[i]]}
end
# divide into +nbooks+ books by minimizing the maximal number
# of words in one book
def min_maximum( nbooks )
dyn_min( nbooks ) { |s, rest| [s, rest || 0].max }
end
# divide into +nbooks+ books by maximizing the minmal number
# of words in one book
def max_minimum( nbooks )
dyn_min( nbooks ) { |s, rest| [-s, (rest || -s)].max }
end
# divide into +nbooks+ books by minimizing the deviation from the
# average number of words in one book
def min_deviat( nbooks )
mean = sum( 0 ... @nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |s, rest| (rest || 0) + (s - mean).abs }
end
# divide into +nbooks+ books by minimizing the quadratic deviation
# from the average number of words in one book
def min_deviat2( nbooks )
mean = sum( 0 ... @nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |s, rest| (rest || 0) + (s - mean) ** 2 }
end
private
# computes
# sum_{i\in range} @nletters[i]
def sum( range )
range.inject(0) {|s,i| @nletters[i] + s}
end
# Computes a solution minimizing the objective function.
def dyn_min( nbooks, &func )
# create initial books (1 book)
books = Array.new( @nletters.size )
s = 0
(@nletters.size - 1).downto(0) do |i|
s += @nletters[i]
range = i ... @nletters.size
sum = sum(range)
books[i] = [[Book.new( range, sum )], func.call(sum, nil)]
end
(nbooks - 1).times do
books = (0 ... @nletters.size).map { |s|
best_value = nil
best_end = nil
sum = @nletters[s] # value of the current part
for e in s.next ... @nletters.size
if books[e]
value = func.call(sum, books[e][1])
if best_value.nil? || value < best_value
best_value = value
best_end = e
end
end
sum += @nletters[e] || 0
end
if best_value
[[Book.new(s...best_end, sum(s...best_end))] +
books[best_end][0],
best_value]
else
nil
end
}
end
@books = books[0][0]
# collect the words into the books
@books.each do |book|
book.words = @words
end
end
end
# read word list
words = []
File.open( word_file, "r" ) do |f|
f.each_line do |line|
line.strip!
words << line unless line.empty?
end
end
list = Wordlist.new( words )
p list.letter_counts
puts "Number of words: #{list.letter_counts.values.inject(0){|s,x| s +
x}}"
case algorithm
when 1
list.min_maximum( nbooks )
when 2
list.max_minimum( nbooks )
when 3
list.min_deviat( nbooks )
when 4
list.min_deviat2( nbooks )
else
raise "Unknown algorithm: #{algorithm} (should be in {1,2,3,4})"
end
pp list.books.mash{|book| [book.range, book.words]}
p list.books.mash{|book| [book.range, book.n_words]}
on 20.05.2009 17:28
I updated my solution too. No real improvements to the algorithm, but the code is more 'rubyish'. For good or bad, I found it useful to subclass Array. http://www.pastie.org/484220 Luke
on 24.05.2009 20:32
There were three submitted solutions to this week's quiz. Srijayanth
Sridhar provided the following:
#! /usr/bin/ruby
require 'pp'
words=File.readlines("../words.txt").map { |word|
word.chomp.capitalize }
dict = {}
words.each do |word|
dict[word[0..0]] ||= []
dict[word[0..0]] << word
end
count=0
sub_array = []
main_dict = {}
dict.keys.sort.each do |alphawords|
sub_array += dict[alphawords]
count += sub_array.flatten.size
if ( count >= 10 )
alpha=sub_array.flatten.map { |x| x[0..0] }.uniq
alpha.size > 1 ? key=Range.new(alpha.min,alpha.max) : alpha[0]
main_dict[key] = sub_array.flatten
count = 0
sub_array=[]
end
end
# Attach last remaining words
alpha=sub_array.flatten.map { |x| x[0..0] }.uniq
alpha.size > 1 ? key=Range.new(alpha.min,alpha.max) : alpha[0]
main_dict[key] = sub_array.flatten
count = 0
sub_array=[]
pp main_dict
Srijayanth's solution works by building a dictionary of all the words
grouped by first letter. Starting from the beginning the words in the
next letter are added until the total number of words in the current
volume is greater than or equal to `count` (10 by default).
The advantage of this solution is that it is simple to implement and
only needs to make one pass through the lists of words. One potential
weakness is that if the words in the second to last letter added and
it goes above the threshold, then the words beginning with the last
letter won't have any possibility to be merged in and could be left as
a very small volume.
Luke Cowell's solution repeatedly merges the smallest adjacent pairs
until the requested number of volumes is reached. This is a simpler
technique and produces pretty good results as long as the input data
is well behaved.
Frank Fischer's solution uses dynamic programming[1]. Dynamic
programming is a method of solving complex problems by breaking them
down into simpler steps. There are two key attributes that a problem
must have in order for dynamic programming to be applicable: optimal
substructure and overlapping subproblems. Optimal substructure means
that the solution to a given optimization problem can be obtained by
the combination of optimal solutions to its subproblems. Overlapping
subproblems means that the space of subproblems must be small, that
is, any recursive algorithm solving the problem should solve the same
subproblems over and over, rather than generating new subproblems.
Frank's solution provides different methods to optimize the
arrangement of the volumes.
# divide into +nbooks+ books by minimizing the maximal number
# of words in one book
def min_maximum( nbooks )
dyn_min( nbooks ) { |s, rest| [s, rest || 0].max }
end
#...
# divide into +nbooks+ books by minimizing the quadratic deviation
# from the average number of words in one book
def min_deviat2( nbooks )
mean = sum( 0 ... @nletters.size ).to_f / nbooks
dyn_min( nbooks ) { |s, rest| (rest || 0) + (s - mean) ** 2 }
end
Each of these methods passes a different block to the `dyn_min` method
depending on what objective you wish to optimize. The `dyn_min` method
iterates through the possible solutions and stores the best
combinations of volumes according to the metric passed in.
Thank you Srijayanth, Frank, and Luke for your contributions this week.
P.S. From now on the summaries will have an attached archive of the
quiz solutions.
[1]: http://en.wikipedia.org/wiki/Dynamic_programming