Forum: Ruby Found Senryu (#224)

Daniel X Moore (yahivin)
on 2009-11-13 20:16
(Received via mailing list)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz:

1.  Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have elapsed from the time this message was
sent.

2.  Support Ruby Quiz by submitting ideas and responses
as often as you can.

3.  Enjoy!

Suggestion:  A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion.  Please reply to
the original quiz message, if you can.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

Suggestions?: http://rubyquiz.strd6.com/suggestions

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## Found Senryu (#224)

Ya'atay Rubyists,

This week's quiz comes from Martin DeMello:

Scan a text for runs
of seventeen syllables
formed from complete words

Have Fun!
Martin DeMello (Guest)
on 2009-11-17 12:16
(Received via mailing list)
On Sat, Nov 14, 2009 at 12:45 AM, Daniel Moore <yahivin@gmail.com>
wrote:
>
> Scan a text for runs
> of seventeen syllables
> formed from complete words

Sorry, this was underspecified (I was hoping that the example would
suffice, and I needn't break the purity of the quiz post :)). The
seventeen syllables should, for full correctness, also be able to be
broken into 5/7/5 in traditional "internet haiku" fashion.

martin
Martin DeMello (Guest)
on 2009-11-17 12:59
(Received via mailing list)
Attachment: senryu.rb (980 Bytes)
Here's my solution (code both inlined below and attached). It uses the
CMU Pronouncing Dictionary directly, since I didn't know about
Lingua::EN. The code is inefficient in theory, but runs fast enough :)

Since there were way too many results (most of them uninteresting), I
added the refinement that the senryu should be a complete sentence
(the word just before it, and the last word, should both end with a
full stop). I used Wodehouse's "Right Ho, Jeeves"
[http://www.gutenberg.org/etext/10554] to test it; these are three of
the more amusing ones it found:

You can't get away
from the facts. Somebody stole
her from me at Cannes.

"How? Could Lloyd George do
it, could Winston do it, could
Baldwin do it? No.

"What?" I staggered, and
the left pedal came up and
caught me on the shin.


# ---------------------------------------------------------------------------
# get dictionary from http://www.speech.cs.cmu.edu/cgi-bin/cmudict

text = IO.read(ARGV[0]).split(" ")
dict = IO.readlines('cmudict.0.7a')
$syllables = {}

dict.each {|line|
  word, pron = line.split(/\s+/, 2)
  syl = pron.gsub(/[^\d]/, '').length
  $syllables[word] = syl
}

senryu = []
count = 0
prev = ''

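text.each do |word|
  # Normalize to match the dictionary keys: upper case, punctuation stripped
  w = word.upcase.gsub(/\W/, '')
  syl = $syllables[w]
  if syl
    senryu.push([word, syl])
    count += syl
    # Drop words from the front until the window holds at most 17 syllables,
    # remembering the last dropped word as the one just before the window
    while count > 17
      prev, dropped = senryu.shift
      count -= dropped
    end
    if count == 17
      # Find the words on which the 5th and 12th syllables end, if any
      five = twelve = nil
      n = 0
      senryu.each_with_index {|(_, s), ix|
        n += s
        five = ix if (n == 5)
        twelve = ix if (n == 12)
      }
      # Only print complete sentences: the word before the run and the
      # last word of the run must both end with a full stop
      if five and twelve and senryu[-1][0] =~ /\.$/ and prev =~ /\.$/
        [0..five, (five+1)..twelve, (twelve+1)..-1].each {|i|
          puts senryu[i].map(&:first).join(" ")
        }
        puts
      end
    end
  else
    # Unknown word: no run can contain it, so start over after it
    senryu = []
    count = 0
    prev = word
  end
end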
Daniel X Moore (yahivin)
on 2010-03-10 05:41
(Received via mailing list)
The quiz this week has solutions from Martin DeMello and wkm.

wkm's solution uses the Lingua::EN package[1] to count syllables.
Installing Lingua::EN was somewhat of a challenge: I was only able to
install it without the dictionary and run it using the guessing library.
Hopefully others will have better luck.

When the program runs, the first step is to read in the documents,
extracting the words and their syllable counts. Whenever a word's
syllables are looked up, the result is cached to save time on common
words. Next, the list of words with their syllable counts is scanned
from every possible word offset to find runs of 17 syllables. When
such a run is found, the words that comprise it are printed out.
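
The steps described above could be sketched roughly like this. It is not
wkm's code: guess_syllables is a made-up vowel-group counter standing in for
Lingua::EN (whose API is not shown in this thread), and SYLLABLE_CACHE and
runs_of are names invented for the illustration.

```ruby
# Hypothetical stand-in for a real syllable counter: counts vowel groups.
# A rough guess only, not the Lingua::EN algorithm.
def guess_syllables(word)
  groups = word.downcase.scan(/[aeiouy]+/).length
  groups.zero? ? 1 : groups
end

# Cache each lookup so a repeated word is only counted once.
SYLLABLE_CACHE = Hash.new { |cache, word| cache[word] = guess_syllables(word) }

# Try every starting offset and collect each run of words that
# totals exactly `target` syllables.
def runs_of(words, target = 17)
  counts = words.map { |w| SYLLABLE_CACHE[w.downcase.gsub(/[^a-z']/, '')] }
  runs = []
  words.each_index do |start|
    total = 0
    (start...words.length).each do |stop|
      total += counts[stop]
      runs << words[start..stop] if total == target
      break if total >= target
    end
  end
  runs
end

# Small demonstration with a target of 3 syllables instead of 17.
words = "the cat sat on the mat".split
runs_of(words, 3).each { |run| puts run.join(" ") }
```

Restarting the inner loop at every offset is quadratic in the worst case,
but each scan stops as soon as the target is reached, so in practice it only
looks a handful of words ahead.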

There is, however, one issue with wkm's solution: it does not check
whether the 17 syllables split on word boundaries into 5-7-5 syllable
chunks. This causes the program to greatly overestimate how many
valid senryu there are in the text.

Martin's solution uses the CMU Pronouncing Dictionary directly.
The entire dictionary is loaded so that words can be looked up easily.
The text is then iterated through, counting up the number of
syllables. When a run of words totals 17 syllables it is checked to
see that the fifth and twelfth syllable boundaries are also on word
boundaries.
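
The boundary check can be looked at in isolation. The sketch below follows
the same idea as the posted code but is not taken from it: haiku_breaks is a
made-up helper name, the syllable counts in the example are assigned by hand
rather than read from the dictionary, and Array#sum needs Ruby 2.4 or later.

```ruby
# Given the syllable count of each word in a run, return the indices of the
# words that end line one and line two, or nil if the run is not 17
# syllables or cannot be split 5/7/5 on word boundaries.
def haiku_breaks(syllable_counts)
  return nil unless syllable_counts.sum == 17
  total = 0
  five = twelve = nil
  syllable_counts.each_with_index do |syl, ix|
    total += syl
    five = ix if total == 5     # a word ends exactly on the 5th syllable
    twelve = ix if total == 12  # a word ends exactly on the 12th syllable
  end
  five && twelve ? [five, twelve] : nil
end

# "You can't get away / from the facts. Somebody stole / her from me at Cannes."
p haiku_breaks([1, 1, 1, 2, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1])  # => [3, 8]
# Six words that total 17 syllables but never land on 5.
p haiku_breaks([3, 3, 3, 3, 3, 2])                          # => nil
```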

If the number of syllables for a section is greater than 17, the first
word in the section is removed and its syllables are subtracted from the
count. This allows checking of all possible runs of 17 syllables
without having to iterate through the text multiple times. When
run against Wodehouse's "Right Ho, Jeeves", Martin's solution
produces many good excerpts, but when run solely against those
excerpts, it does not print them out. I haven't been able to find the
cause of this issue, but as they say, let's leave that as an exercise
for the reader.
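
One likely explanation, judging only from the code as posted in this thread:
prev starts out as an empty string, so a senryu that begins at the very first
word of the input has no preceding word ending in a full stop and fails the
prev =~ /\.$/ test. Here is that condition on its own; complete_sentence? is
a made-up helper name and "Bertie." is an invented preceding word.

```ruby
# Mirrors the sentence test from the posted solution: both the word before
# the run and the run's last word must end with a full stop.
def complete_sentence?(prev, last_word)
  !!(prev =~ /\.$/ && last_word =~ /\.$/)
end

# In the middle of the novel, the preceding sentence supplies the full stop.
puts complete_sentence?("Bertie.", "Cannes.")  # true
# When the excerpt is the whole input, prev still holds its initial ''.
puts complete_sentence?("", "Cannes.")         # false
```

Treating the start of the input as a sentence boundary, for instance by
initialising prev to '.', would be one way to let a bare excerpt match. That
is a suggestion only and has not been tested against the original setup.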

Thank you Martin and wkm for your solutions to this week's quiz!

Found Senryu (#224) - Solutions[2]

[1]: http://www.pressure.to/ruby/
[2]: http://rubyquiz.strd6.com/quizzes/224.tar.gz