Found Senryu (#224)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this
    quiz until 48 hours have elapsed from the time this message was
    sent.

  2. Support Ruby Q. by submitting ideas and responses
    as often as you can.

  3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby T. follow the discussion. Please reply to
the original quiz message, if you can.


RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

Suggestions?: http://rubyquiz.strd6.com/suggestions

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


Ya’atay Rubyists,

This week’s quiz comes from Martin DeMello:

Scan a text for runs
of seventeen syllables
formed from complete words

Have Fun!

On Sat, Nov 14, 2009 at 12:45 AM, Daniel M. wrote:

> Scan a text for runs
> of seventeen syllables
> formed from complete words

Sorry, this was underspecified (I was hoping that the example would
suffice, and I needn’t break the purity of the quiz post :)). The
seventeen syllables should, for full correctness, also be able to be
broken into 5/7/5 in traditional “internet haiku” fashion.

martin

The quiz this week has solutions from Martin DeMello and wkm.

wkm’s solution uses the Lingua::EN package to count syllables.
Installing Lingua::EN was somewhat of a challenge: I was only able to
install it without the dictionary and run it using the guessing
library. Hopefully others will have better luck.

When the program runs, the first step is to read in the documents,
extracting the words and their syllable counts. Whenever a word’s
syllables are looked up, the result is cached to save time on common
words. Next, the list of words and syllable counts is scanned from
every possible starting word to find runs of 17 syllables. When such a
run is found, the words that comprise it are printed out.
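
The steps above can be sketched roughly as follows. This is not wkm’s
code: guess_syllables is a hypothetical stand-in for a real counter
such as Lingua::EN (it only counts vowel groups), and the names
SYLLABLE_CACHE and runs_of are made up for illustration.

```ruby
# Hypothetical stand-in for a real syllable counter: counts groups of
# consecutive vowels, which is only a rough guess (never less than 1).
def guess_syllables(word)
  [word.downcase.scan(/[aeiouy]+/).length, 1].max
end

# Cache: the block runs only the first time a given word is looked up.
SYLLABLE_CACHE = Hash.new { |cache, word| cache[word] = guess_syllables(word) }

# Return every run of consecutive words whose syllables total `target`,
# trying each word in turn as the start of a run.
def runs_of(words, target = 17)
  counts = words.map { |w| SYLLABLE_CACHE[w.downcase] }
  runs = []
  words.each_index do |start|
    total = 0
    (start...words.length).each do |stop|
      total += counts[stop]
      runs << words[start..stop] if total == target
      break if total >= target   # longer runs from this start only grow
    end
  end
  runs
end
```

Calling runs_of(text.split) on a whole text returns an array of word
arrays; as noted below, nothing here checks the 5-7-5 split.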

There is, however, one issue with wkm’s solution. It does not check
whether the 17 syllables split on word boundaries into 5-7-5 syllable
chunks. This causes the program to greatly over-estimate how many
valid 17-syllable runs there are in the text.

Martin’s solution uses the CMU Pronouncing Dictionary directly.
The entire dictionary is loaded so that words can be looked up easily.
The text is then iterated through, counting up the number of
syllables. When a run of words totals 17 syllables, it is checked to
see that the fifth and twelfth syllables each fall at the end of a
word.
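
As a rough illustration of the two ideas in that paragraph (a sketch,
not Martin’s code): syllables can be counted from a pronunciation by
counting stress digits, and the 5/7/5 check amounts to looking for
running totals of exactly 5 and 12. The names syllables_in and
split_575 are hypothetical, and the pronunciation in the usage note
only illustrates the format; it is not copied from the dictionary.

```ruby
# Each vowel phoneme carries a stress digit (0, 1 or 2), so the number
# of digits in a pronunciation is its number of syllables.
def syllables_in(pron)
  pron.count("0-9")
end

# Given the per-word syllable counts of a run (assumed to be positive),
# return the index ranges of the three lines, or nil if the run does
# not total 17 or cannot be split 5/7/5 on word boundaries.
def split_575(counts)
  return nil unless counts.sum == 17
  running = 0
  five = twelve = nil
  counts.each_with_index do |syl, ix|
    running += syl
    five   = ix if running == 5    # a word ends on the fifth syllable
    twelve = ix if running == 12   # a word ends on the twelfth syllable
  end
  return nil unless five && twelve
  [0..five, (five + 1)..twelve, (twelve + 1)..(counts.length - 1)]
end
```

For example, syllables_in("S EH1 V AH0 N T IY1 N") counts three
digits, and the counts for the quiz text itself,
[1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 2, 1], split into the ranges 0..4,
5..7 and 8..11.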

If a section reaches 17 syllables or more, the first word in the
section is removed and its syllables are subtracted from the count.
This allows checking of all possible runs of 17 syllables without
having to iterate through the text multiple times.

When Martin’s solution is run against Wodehouse’s “Right Ho, Jeeves” it
produces many good excerpts, but when run solely against those
excerpts it does not print them out. I haven’t been able to find the
cause of this issue, but as they say, let’s leave that as an exercise
for the reader.
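
For readers who want to experiment, here is a minimal sketch of the
single-pass window idea on its own. It is a variant rather than
Martin’s code: it keeps dropping words from the front until the total
is back at or under the target, instead of dropping one word per step.
The name each_window is hypothetical, and syllable counts are assumed
to be positive.

```ruby
# Slide a window over per-word syllable counts and yield the index
# range of every window that totals exactly `target` syllables.
def each_window(counts, target = 17)
  return enum_for(:each_window, counts, target) unless block_given?
  start = 0
  total = 0
  counts.each_with_index do |syl, stop|
    total += syl
    # Shrink from the left until the window fits again.
    while total > target
      total -= counts[start]
      start += 1
    end
    yield start..stop if total == target
  end
end
```

With a small target for illustration, each_window([2, 3, 2, 3], 5).to_a
gives [0..1, 1..2, 2..3]. Each range can then be tested for the 5/7/5
split and for sentence boundaries.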

Thank you Martin and wkm for your solutions to this week’s quiz!

Found Senryu (#224) - Solutions

Here’s my solution (code both inlined below and attached). It uses the
cmu pronouncing dictionary directly, since I didn’t know about
Lingua::EN. The code is inefficient in theory, but runs fast enough :)

Since there were way too many results (most of them uninteresting), I
added the refinement that the senryu should be a complete sentence
(the word just before it, and the last word, should both end with a
full stop). I used Wodehouse’s “Right Ho, Jeeves” (available as a
free ebook) to test it; these are three of
the more amusing ones it found:

You can’t get away
from the facts. Somebody stole
her from me at Cannes.

"How? Could Lloyd George do
it, could Winston do it, could
Baldwin do it? No.

“What?” I staggered, and
the left pedal came up and
caught me on the shin.


# get dictionary from The CMU Pronouncing Dictionary

text = IO.read(ARGV[0]).split(" ")
dict = IO.readlines('cmudict.0.7a')
$syllables = {}

# vowel phonemes carry a stress digit, so counting digits counts syllables
dict.each {|line|
  word, pron = line.split(/\s+/, 2)
  syl = pron.gsub(/[^\d]/, '').length
  $syllables[word] = syl
}

senryu = []   # current window of [word, syllables] pairs
count = 0     # syllables in the window
prev = ''     # word most recently dropped from the front of the window

text.each do |word|
  w = word.upcase.gsub(/\W/, '')
  syl = $syllables[w]
  if syl
    senryu.push([word, syl])
    count += syl
    if count == 17
      five = twelve = nil
      n = 0
      senryu.each_with_index {|(w, i), ix|
        n += i
        five = ix if (n == 5)
        twelve = ix if (n == 12)
      }
      # require 5/7/5 word boundaries and a complete sentence
      if five and twelve and senryu[-1][0] =~ /\.$/ and prev =~ /\.$/
        [0..five, (five+1)..twelve, (twelve+1)..-1].each {|i|
          puts senryu[i].map(&:first).join(" ")
        }
        puts
      end
    end
    if count >= 17
      prev, i = senryu.shift
      count -= i
    end
  else
    # word not in the dictionary: start over
    senryu = []
    count = 0
  end
end