The three rules of Ruby Quiz: 1. Please do not post any solutions or spoiler discussion for this quiz until 48 hours have passed from the time on this message. 2. Support Ruby Quiz by submitting ideas as often as you can: http://www.rubyquiz.com/ 3. Enjoy! Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone on Ruby Talk follow the discussion. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= This week's Ruby Quiz is about text generation. That's right, we're going to teach your computer to weave tall tales. At its most basic level, a solution might be: >> (1..30).map { (("a".."z").to_a + [" "] * 10)[rand(36)] }.join => "fb mcr hhluesjbhtf swm eehokmi" However, let's make our goal to get as close to English looking sentences as possible. One way you might do this is using a technique called Markov Chains. To use Markov Chains, you read some text document(s), making note of which characters commonly follow which characters or which words commonly follow other words (it works for either scale). Then, when generating text, you just select a character or word to output, based on the characters or words that came before it. The number of previous characters or words considered is called the "order" and you can adjust that to try and find a natural feel. For example, here is some generated text using a second order word chain derived from the Sherlock Holmes novel "The Hound of the Baskervilles" by Arthur Conan Doyle: The stars shone cold and bright, while a crushing weight of responsibility from my shoulders. Suddenly my thoughts with sadness. Then on the lady's face. "What can I assist you?" If you need text's to prime your program with, I suggest searching Project Gutenberg: http://www.gutenberg.org/
on 07.04.2006 15:04
on 07.04.2006 15:56
I'm pumped! I've been reading this mailing list for the last 3 months and I finally feel ready to try out a quiz. This one sounds fun!
on 07.04.2006 16:21
On Apr 7, 2006, at 8:53 AM, Charlie Bowman wrote: > I'm pumped! I've been reading this mailing list for the last 3 months > and I finally feel ready to try out a quiz. This one sounds fun! I'm glad. These can be quite a bit of fun, depending on what text you prime them with... ;) James Edward Gray II
on 07.04.2006 17:07
On Fri, Apr 07, 2006 at 10:02:28PM +0900, Ruby Quiz wrote:
> possible. One way you might do this is using a technique called Markov Chains.
I think it would be really neat to apply this to music composition. I'm
just not sure how you would get a corpus for that (maybe parsing
lilypond files?).
--Aaron
on 07.04.2006 17:16
or ascii based tab files for us banjo pickers!
on 07.04.2006 17:43
On 4/7/06 10:14 AM, "Charlie Bowman" <charlie@castlebranch.com> wrote: > or ascii based tab files for us banjo pickers! > Banjo? I'm pretty sure he said "music", so I'm not sure why you are bringing up banjos. :-) What's the difference between a banjo and a(n)? Chain Saw: 1. a chain saw has a dynamic range. 2. you can turn a chain saw off. 3. South American Macaw: one is loud, obnoxious, and noisy; and the other is a bird. 4. Harley Davidson Motorcycle: you can tune a Harley. 5. Onion: no one cries when you cut up a banjo. 6. Trampoline: you take your shoes off to jump on a trampoline. 7. Uzi: an uzi only repeats forty times. Sorry - couldn't help myself on this one :-) Keith
on 07.04.2006 18:42
I would like Ruby support of serial ports for the following three platforms. #1 Windows #2 XP #3 Mac OS X Please respond offline if you can show me how to add any or all of these Gus rubySPS@oh-bear.com
on 07.04.2006 18:48
> I would like Ruby support of serial ports for the following > three platforms. > #1 Windows > #2 XP > #3 Mac OS X > Can you differentiate between/clarify #1 and #2? -M
on 07.04.2006 19:37
Banjo players spend half their lives tuning and the other half playing out of tune. Banjos are definitely an acquired taste. I think I've figured out why I love them so much. They are as close instrument to a digital signal as you can get (other than a drum). The sound is so staccato and you pluck so many notes in a second that it is very near to being digital :)
on 07.04.2006 20:11
On 4/7/06, James Edward Gray II <james@grayproductions.net> wrote: [snip] > These can be quite a bit of fun, depending on what text you prime > them with... ;) input_text := ruby_talk; (* hehe *)
on 07.04.2006 21:12
On 4/7/06, Simon Strandgaard <neoneye@gmail.com> wrote: > On 4/7/06, James Edward Gray II <james@grayproductions.net> wrote: > [snip] > > These can be quite a bit of fun, depending on what text you prime > > them with... ;) > > input_text := ruby_talk; (* hehe *) I've been considering collecting all the Ilias posts (filtering out other people's comments that are quoted, for purity) and seeing what it prints out. ;) Jacob Fugal
on 08.04.2006 00:18
Ruby Quiz wrote: > The stars shone cold and bright, while a crushing weight of responsibility > from my shoulders. Suddenly my thoughts with sadness. Then on the lady's > face. "What can I assist you?" > I can't tell if this is a bug, or success beyond my wildest dreams: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ She wore handily the woven craft of lips. Besotted! My anglican parrot wenches freely. Three dozen Iceland begin to unwind. Hope. Charity. My outrage cannot succumb to rainswept chance. Now chance he the broadly winding spar. This stock is on the move! You'll never fail to satisfy her again! Home loans from 0.025%! V t A G R A from $ 3.50 per item C 1 @ L I S from $3.75 per item V Ax L 1 U M from $1.20 per item Premier PARMACY - your c e r t i f i e d online parmacy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ I took usenet for my corpus, then used a genetic algorithm with SpamAssassin for scoring. And, uh, I wrote it 6 days ago.... ;-) -dB
on 08.04.2006 00:45
On Apr 7, 2006, at 10:18 AM, James Edward Gray II wrote:
> James Edward Gray II
Indeed, I just finished my solution (first Ruby Quiz, I've done, yay)
and it's all sorts of fun
on 08.04.2006 01:03
On Sat, 2006-04-08 at 03:07 +0900, Simon Strandgaard wrote: > On 4/7/06, James Edward Gray II <james@grayproductions.net> wrote: > [snip] > > These can be quite a bit of fun, depending on what text you prime > > them with... ;) > > > input_text := ruby_talk; (* hehe *) I had to try this out :) With 500 messages from the past couple of days, order of 3, I found lots of cryptic wisdom when using a small word limit, such as: === windows :\ i guess clarity, just isn't a interesting. thanks for making def say_stuff puts bereflected to the foo to design a gui by open source path. in summary, to get wrote: pointers, or care what you mean. example, c# guys (including even worse - perl. warning: pointer targets in boese wrote: use an i'll go digging +0900, james edward gray would help with reading v_mod/1 m = 'm' to figure this out? tashiro wrote: you need own license, possibly april fool's joke. v.- interface and doscommand which next implemented anobjective-c frontend set of programming === I've attached the quick and dirty script I used to get cleaned-up input text from ruby-talk archive. This is a seriously cool quiz :) Thanks all concerned.
on 08.04.2006 01:22
On Apr 7, 2006, at 7:02 PM, Ross Bamford wrote: > days, > in summary, to get wrote: pointers, or care what you mean. > anobjective-c frontend set of programming > <rubytalk2text.rb> Thanks for the script, I note this interesting result with my implementation: % ruby markov.rb rt.text 12 Alternate austin austin austin austin halostatue gmail com alternate austin austin austin
on 08.04.2006 04:51
I went and grabbed a bunch of Grimm's fairy tales from Gutenberg. I still need to tweak my algorithm some; the most annoying thing I find is punctuation, especially quoted strings (i.e. speech) so that the output text looks like there is some valid speech in there and not just randomly scattered quote marks. Anyway, here's a sample I produced with my first attempt: There was a man who kills seven at one blow? I leapt over the tree because the huntsmen are shooting down there in the closet on the porch.' The miller said: 'The Devil must go out,' and opened the door and opened it. As she went in, a little dwarf no bigger than my finger. And before her stood princes, and dukes, and earls: and the fisherman went up to her palace all of shining gold; and told her mother that she must be the handsomest lady in the land; and she went to the fire and growled contentedly and comfortably. It was not long in saying 'Yes' to all this; and as they were entering the village, the son followed the fox's counsel, and without looking about him went to the other anvil. The old man led him back into the water. The girls came just in time; they held him fast and tried to free his beard from the line, but all in vain; he heard or saw nothing of Jorinda. At last he thought to himself: 'How the old woman is snoring! I must just see if she wants anything.' and: An aged count once lived in Switzerland, who had an only son, but he was now grown too old to work; so the farmer would give him his only daughter to her bed-side, and said, 'Always be a good girl, and I will never by myself leave the path, to run into the wood, till at last they danced out at the gate together. When they had walked a short time, when they had warmed themselves, they said: 'Comrade, shall we have a child, and he grows big, and we send him into the wide world, and looked neither to the right place. There they found the giants swimming in their blood, and all round about her, and the bells rang at each step which she took. Then she was in a great hurry. The little tailor looked round and saw a flock of sheep, the very shepherd whom the peasant knew had long been wishing to be mayor, so he cried with all his might.
on 08.04.2006 18:42
Correction to earlier post AGS Calabrese 00000000000000000000000000000000000000000000000000 I would like Ruby support of serial ports for the following three platforms. #1 Windows #2 Linux #3 Mac OS X Tiger Please respond offline if you can show me how to add serial support to any or all of these Gus rubySPS@oh-bear.com my spam filter has been eating messages so I may miss your message. I am working on the problem 000000000000000000000000000000000000000000000000000 On 2006-Apr 07, at 10:46 AM, Mike wrote: > I would like Ruby support of serial ports for the following > three platforms. > #1 Windows > #2 XP > #3 Mac OS X > Can you differentiate between/clarify #1 and #2? -M
on 09.04.2006 15:47
Hi all- Here is my first attempt at a Ruby Quiz (attached file: 'markovtext.rb'). I'm new to Ruby, mostly paying the bills over the past years with Perl work. This quiz was fun, and I learned a lot from doing this. Coming from Perl, I often rely upon auto- vivification, so I needed to figure out how to work around this. Perhaps there are improved ways of going about this, and I'd appreciate any feedback on how to go about it better. My solution is fairly straight forward, using a second order word chain, which I imagine many others will be implementing in similar ways. As noted in the comments, I only start making my paragraph on the second sentence, in order to get greater diversity in how the paragraph starts. I'm also striping out some of the punctuation from the end result, as some texts give orphaned punctuation marks. I had the most fun from the quiz running my script against various texts, and seeing the results. My favorite has been getting results from "Huckleberry Finn", though the grammar gets a bit off at times, but that relates to the input text. For comparison, one can see much more typical sentence structure from "Journey to the Center of the Earth". Example results: == Huck Finn via 'markovtext.rb': And above all, don't you try to run in the dark he says: Yes, Mars Sid, A dog. Cur'us dog, too. Does you want it, you can go and write poetry after them out loud. One bill said, The celebrated Dr. Armand de Montalban, of Paris, would lecture on the floor and tied up the bank. I couldn't help it, and she snatched a kiss of him, and never think about it; and I went along up the smoothness and the judge for him, being a mystery, and he'd cipher out a speech, all full of it, and I would kill me, dey sk'yers me so. == Journey to the Center of the Earth via 'markovtext.rb': I gave way to the south-east. We have passed through the treatises of Cassanion, and all his time at Altona, staying with a translation? This IS the Icelandic Professor. At this rate we shall see. So says the Professor, no doubt, was pursuing his observations or taking notes, for in three days we must go back to the dull thuds of the country might be held up by the descent commenced. I can but try Spanish, French, Italian, Greek, or Hebrew. == Cheers, -albert
on 09.04.2006 18:51
Here is my solution. It was so much fun I couldn't help but keep making small tweaks. Here are a few of my favourite bits of output with varying word limits, from various sources (including rubytalk): ==== thou wast question'd my algorithm? -- posted delight hath. FasterCSV can also reason, having methods 2006 08:58 am, after lunch. April 1st joke. I'm subscribed to ruby-dev and the other hand, I've thought of it...hrrmmm... And now I fear, that the Mozilla Foundation relicensed everything as MPL and GNU GPL provides. It is unfortunate that it should be able to check against multiple types as well Hi. Ruby Quiz: 1. Please do not hesitate dive in to it, on the C:\ruby\lib \ruby\1.8\cgi\session.rb file, $350 convoluted I'm licenses David 11:26 install the 'ruby-db2-0.4' package and all that... I'll blow this police whistle from my shoulders. Suddenly and makes it incompatible. But I might have agreed tosuch without having to support meta-progarmming. Perhaps an example how to handle initialize_copy: The mixin manages some attributes and the other hand, I've thought of it...hrrmmm... ==== Before using it, you have to give it some knowledge - the -l option accepts a filename or URL to learn from (omit the filename for stdin). After chomping through it, the knowledge will get dumped out to a file ready to be used for generation. You can do this multiple times to incrementally teach it. If you want to use a large body of input I recommend storing each dump in it's own file, which you can do with the -f option. Then, specify one or more -f options when running generation to use the specified knowledge file(s). Once you've got some input, run markov.rb, specifying your -f options if you're not using the default 'chainstore' file, and optionally generation parameters such as -w N to limit to at most N words. There's a few options you can pass to control both learning and generation, there's a bit of doc available from the -h option. I posted a few chomped input files at: http://roscopeco.co.uk/code/ruby-quiz-entries/74/ http://roscopeco.co.uk/code/ruby-quiz-entries/74/README Thanks again for this quiz, I've gotten a fair few lol moments out of it :)
on 09.04.2006 19:19
Good quiz this one. Markov Chains are highly practical creations, we use them extensively when calulating life insurance and pensions schemes in Mercer. Wikipedia has a nice summary of other applications, I recommend it hihgly: http://en.wikipedia.org/wiki/Markov_chains#Scientific_applications At University, they taught us Markov Chains multiple times for various uses, but none of them used this linguistic approach. It's very intuitive, and so much more fun than the hard-core introductions we were given. So if I ever teach, I'll teach this quiz. Here we go: As primer for this writeup I've used Agatha Cristie's "The mysterious affair at Styles", a 57_000 words english novel starring Poirot, her famous detective. To establish the chain of words I use a Hash of Hashes, where the first is a simple lookup of words, and the second containt the words that folow. The second also counts how many times each word follows the leader. Example: Murder of Emily Agnes... Murder against Alfred Inglethorp ... Murder against some person ... Murder against him right ... @words["Murder"] >> {"of"=>1, "against"=>3} The counts are to weigh the randomness when selecting the next word. I wanted a selection process where rare followers in the real world are rear in this world as well. After 'Murder', 'of' will pop up every forth time. This is done here sum = candidates.inject(0) {|sum,kv| sum += kv[1]} random = rand(sum)+1 partial_sum = 0 next_word = candidates.find do |word, count| partial_sum += count partial_sum >= random end.first sum is the total number of followers (4 for 'Murder'). I pick a random number 0..3, add +1 for indexing and iterate my way up the list of followers, and keep track of how many words I have passed on my way. When we reach (or pass for the first time) the random limit, the word at hand is the word we want. Since the block finds the first ['word', count] pair that satifies our condition, (f.ex ['against', 3]), I use .first to get it. That's it. Now just build a sentence by starting with a word and move along it's path. My syntectic rules are very simple, a sentence is complete when it containts a period ".", thereby opening up for som silly grammar. But it doesn't really matter, grammar processing isn't the text of the day. The results are good enoughby far, and quite entertaining too. But alas, even Mr Poirot would be hard pressed to solve this case. Output # 1 Murder of sight. Mary Cavendish was vital to connect the charred paper. "A great risk, it was a book?" "Yes, often. I could certainly had already been tested!" I was before I asked. Output # 2 Murder against Alfred Inglethorp, I fancied that, mon ami, it is broken bell, and the matter, hoping that a note." Miss Howard had a day we came to warn him more, had kindly offered to make amends; which weighed with the Hall in the company. Miss Howard was that?" "She made it was the Hall--you'n a penknife, and went round Styles Court in some time I voiced the room?" "I was inclined to call for some friends one side. His words of horrors business to make a very fond of her--yes, I made a message to the slips of dismissal. === code === class MarkovChain def initialize(text) @words = Hash.new wordlist = text.split wordlist.each_with_index do |word, index| add(word, wordlist[index + 1]) if index <= wordlist.size - 2 end end def add(word, next_word) @words[word] = Hash.new(0) if !@words[word] @words[word][next_word] += 1 end def get(word) return "" if !@words[word] followers = @words[word] sum = followers.inject(0) {|sum,kv| sum += kv[1]} random = rand(sum)+1 partial_sum = 0 next_word = followers.find do |word, count| partial_sum += count partial_sum >= random end.first #puts "Selected #{next_word} from #{candidates.inspect}" next_word end end mc = MarkovChain.new(File.read("Agatha Christie - The Mysterious Affair at Styles.txt")) sentence = "" word = "Murder" until sentence.count(".") == 4 sentence << word << " " word = mc.get(word) end puts sentence << "\n\n" === code ===
on 09.04.2006 19:43
Great quiz. It will be interesting to see how others have solved this problem. Here is my submission. To use run the following: $ cat <some_text_file> | ./rand_text.rb Or you can give options $ cat <some_text_file> | ./rand_text.rb -o 2 -n 10 -o : the order. Which is the number of previous words to consider -n : the number of sentences to output I used an hash of arrays to keep track of the possible state transitions. The key is the current state, and the contents of the array is the possible next states. When generating the output I randomly select elements from this array. I always start with the first 'n' number of words in the original text, where 'n' is the order. There is a sample output where <some_text_file> is Moby Dick and using the default parameters of order = 2, and number of sentences = 10. == Call me Ishmael. Some years ago--never mind how long precisely --having little or no money in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every kind whatsoever. It is a damp, drizzly November in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the mummies of those creatures in their huge bake-houses the pyramids. No, when I go to sea, I go to sea as a Commodore, or a Captain, or a Cook. I abandon the glory and distinction of such offices to those who like them. For my part, I abominate all honorable respectable toils, trials, and tribulations of every kind whatsoever. It is quite as much as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the royal mast-head. True, they rather order me about--however they may thump and punch me about, I have of driving off the spleen, and regulating the circulation ==
on 09.04.2006 21:49
Here's my solutions. I wrote this 2.5 times. The first version
worked, but it was slow as molasses. The second one is much much
faster. The 2.5th inherits from the second one and produces more
readable, complete sentences
first, slow version:
% cat markov.rb
class WordTable
def initialize(src, separator = /\W+/, word_shape = /\w+/)
@src = src
@separator = separator
@word_shape = word_shape
end
def build_table
@table = Hash.new { |h, k| h[k] = [] }
pos = 0
while line = @src.gets
line = line.chomp.downcase
words = line.split(@separator).select { |potential_word|
potential_word.match(@word_shape)
}
words.each do |word|
@table[word] << pos
pos += 1
end
end
self
end
def words
@table.keys
end
def positions_for(word)
@table[word]
end
def followers_for(word)
positions = positions_for(word)
followers_positions = positions.map { |i| i + 1 }
following_words = self.words.select { |key_word|
followers_positions.any? { |pos| positions_for
(key_word).include?(pos) }
}
following_words
end
end
class ChainBuilder
attr_accessor :chain_length
def initialize(initial_word, word_table, chain_length = 10)
@initial_word = initial_word
@word_table = word_table
@chain_length = chain_length
end
def distributed_followers(word)
distd_followers = []
followers = @word_table.followers_for(word)
positions_of_word = @word_table.positions_for(word)
followers.each do |follower|
follower_positions = @word_table.positions_for(follower)
count = follower_positions.select { |pos|
prec = pos - 1
positions_of_word.include?(prec)
}.length
distd_followers += [follower] * count
end
distd_followers
end
def randomized_next_word(word)
choices = distributed_followers(word)
choices[ rand(choices.length) ]
end
def chain
res_chain = [@initial_word]
(self.chain_length - 1).times do
res_chain << randomized_next_word(res_chain.last)
end
res_chain
end
end
if $0 == __FILE__
if ARGV[0].nil?
STDERR.puts "Please provide a corpus."
STDERR.puts "#$0 usage: #$0 <corpus file name> [chain length]
[initial word]"
exit 1
end
chain_len = (ARGV[1] || 10).to_i
wt = nil
File.open(ARGV[0]) do |file|
#wt = WordTable.new(file, //, /./) # try by characters
wt = WordTable.new(file)
wt.build_table
end
init_word = (ARGV[2] || wt.words[ rand( wt.words.length ) ] )
chain_builder = ChainBuilder.new(init_word, wt, chain_len)
chain = chain_builder.chain
puts chain.join(' ').capitalize
end
Second, quicker version:
% cat markov2.rb
class MarkovChainBuilder
class Entry < Struct.new(:word_index, :successors)
end
def initialize(src, separator = /\W+/, word_shape = /\w+/,
downcase = true)
@src = src
@separator = separator
@word_shape = word_shape
@downcase = downcase
build_tables
end
def build_tables
@word_list = []
@successor_lists = {}
last_word = nil
@src.each do |line|
line = line.downcase if @downcase
line = line.chomp
words_for_line = line.split(@separator).select{ |w| w.match
(@word_shape)}
words_for_line.each do |word|
unless @successor_lists.has_key? word
entry = @successor_lists[word] = Entry.new
@word_list << word
entry.word_index = @word_list.length - 1
entry.successors = []
end
unless last_word.nil?
@successor_lists[last_word].successors << @successor_lists
[word].word_index
end
last_word = word
end
end
end
def distributed_successors_for(word)
@successor_lists[word].successors.map { |index| @word_list[index] }
end
def randomized_next_for(word)
succs = distributed_successors_for(word)
succs[ rand(succs.length) ]
end
def markov_chain(initial_word, len = 10)
res = [initial_word]
(len - 1).times do
res << randomized_next_for(res.last)
end
res
end
def words
@word_list
end
private :build_tables
end
if $0 == __FILE__
if ARGV[0].nil?
STDERR.puts "Please provide a corpus."
STDERR.puts "#$0 usage: #$0 <corpus file name> [chain length]
[initial word]"
exit 1
end
chain_len = (ARGV[1] || 10).to_i
mc = nil
File.open(ARGV[0]) do |file|
mc = MarkovChainBuilder.new(file)
end
init_word = (ARGV[2] || mc.words[ rand( mc.words.length ) ] )
chain = mc.markov_chain(init_word, chain_len)
puts chain.join(' ').capitalize
end
And now the third version which is basically the second version, but
looks for sentences:
% cat smarter_markov.rb
require 'markov2'
class SmarterMarkovChainBuilder < MarkovChainBuilder
def initialize(src)
super(src, /\s+/, /\S+/, false)
end
def markov_chain(initial_word, len = 10)
initial_chain = super
index_of_last_word = nil
initial_chain.each_index do |idx|
if initial_chain[idx] =~ /\.|\?|!/
index_of_last_word = idx
break
end
end
if index_of_last_word
initial_chain[0..index_of_last_word]
else
initial_chain
end
end
end
if $0 == __FILE__
mc = nil
File.open(ARGV[0]) do |file|
mc = SmarterMarkovChainBuilder.new(file)
end
start_words = mc.words.select { |w| w =~ /^[A-Z]/ }
init_word = start_words[ rand(start_words.length) ]
puts mc.markov_chain(init_word, 200).join(' ').gsub(/"/,'')
end
on 09.04.2006 22:05
Hi all,
here is my solution. Its a rather stupid approach, but as many simple
and short solutions its better than i had expected.
It uses some gsubs to simplify the input text and surround each word
and punctuation by exactly one space character.
Scan is used to find occurrences of the last <order> words and
store the next words respectively. (even multiple times)
From this array one is choose randomly (which preserves the original
frequency of occurrence because the words are stored multiple times
in the array, increasing the chance of being picked)
Thats it. It works well for files of several MB and gets even faster
if the order is higher.
usage: quiz74.rb <order> [input files]
--------------------------------------------------------------------
before, line, order = ['.'], '', ARGV.shift.to_i
txt = ARGF.read.tr('"/*\\()[]', ' ').downcase
txt = txt.gsub(/[^'\w]/, ' \0 ').gsub(/\s+/, ' ')
500.times do
ary = txt.scan(/ #{before.join(' ')} (\S+)/).transpose.first
before << ary[rand(ary.size)]
before.shift if before.length > order
(puts line; line = '') if line.length > 50 && /\w/ =~ before.last
line << ( /\w/ =~ before.last && !line.empty? ? ' ' : '') <<
before.last
end
--------------------------------------------------------------------
cheers
Simon
on 10.04.2006 02:18
My approach isn't terribly complex, but it was fun to write. It requires that you have a plain text file "book.txt" in the directory of execution. I used http://www.gutenberg.org/dirs/etext91/alice30.txt for all of my tests and my favorite phrase that it came up with was: Beautiful soup beauootiful soooop soooop of evidence said the queen turning to the dormouse. How is works is loads each word into a hash stripping punctuation and capitalization associating each word with an array of common words that it is followed by. One word is picked by random to start us off then it picks randomly commonly chained words from there till it ends up with a word that was at the end of sentence in the original document. Kind Regards,
on 10.04.2006 02:36
I think someone needs to take all the explanations from this thread and run their markov chainer on that. :D aniel -- Daniel Baird http://danielbaird.com (TiddlyW;nks! :: Whiteboard Koala :: Blog :: Things That Suck) [[My webhost uptime is ~ 92%.. if no answer pls call again later!]]
on 10.04.2006 04:37
Here is my first ruby quiz solution submission. Thanks for posting
this one, it provided a fun break from my school projects.
I tried to make the code as general as possible, with variable order
of both words and letters. It uses a hash of arrays for storage. I was
planning on unifying the MarkovWords and MarkovLetters classes into
one, but decided against it, mostly to keep them simpler. These
examples use an english translation of The Odyssey to derive the text.
Usage...
[brian@spica] [~/rubyquiz]
$ ruby markov.rb -h
Usage: ruby markov.rb [options] <filename>
-h, --help show this usage message
-o, --order set markov chain order
-s, --sentences set number of sentences to print
-w, --words set number of words to print
-l, --letters use letters as the basic unit
Print 5 sentences of order 2, using words as the base unit...
[brian@spica] [~/rubyquiz]
$ ruby markov.rb -s 5 -o 2 ../etext/dyssy10.txt
My men came out of your wits? If Apollo and the whole world, neither
rich nor poor, who is handsome and clean built, whereas I am so lost
in a beautiful golden ewer, and poured it over with a lie; 'Neptune,'
said I, 'of escaping Charybdis, and at the base of her maids brought
the heifer down with my tears during the darkness of death itself. A
poor unfortunate tramp has come to pass." And Penelope answered,
"Stranger, you must have gone off to bring the contest to an end."
Melanthius lit the fire for her, but she wishes to hear the enchanting
sweetness of their grey hair.
Print 1 sentence of order 1, using words as the base unit...
[brian@spica] [~/rubyquiz]
$ ruby markov.rb -s 1 -o 1 ../etext/dyssy10.txt
Proserpine had got into a mission to be scandalised at last of them,
and lay a hard task, no more fond of wind blew a man talked and will
leave their sport known that stalks about his wife, and the store for
he stretched out their own house, which I shall be willing to the
house to you, Telemachus.
Print 50 words of order 3, using letters as the base unit...
[brian@spica] [~/rubyquiz]
$ ruby markov.rb -w 50 -o 3 -l ../etext/dyssy10.txt
the of it into beautiful nose kill as did ther people the wood and him
all the meanthrountry stilltrese ffere peak the the the eleide oly
with blooked ent back the himself over s his hey glarge tood welcomind
where and mannot been spoke aloud valitterince one of his
Print 50 words of order 2, using letters as the base unit, with text
from The Aeneid (latin version)...
[brian@spica] [~/rubyquiz]
$ ruby markov.rb -w 50 -o 2 -l ../etext/anidl10.txt
quora desi aras ta acerummarmallenstos es mihinus vade imurbethinc
plia caum turnit que re et sum fuit ade re restaeteri invicaviam
ceuctitur hos aectalisubsillo parmeis suntereffunc meo que
clachilliaeque mul rut moropula sonitotuspiumque terrequebraherautras
nos ad la ciem atque part dubstra repononibus comet ad num undo cisque
retrangeneferat simas obla
[brian@spica] [~/rubyquiz]
$ cat markov.rb
#!/usr/bin/ruby
class MarkovWords
def initialize(filename, order)
@order = order
@words = Hash.new
@state = Array.new(@order)
previous = Array.new(@order)
File.foreach(filename) do |line|
line.split(/\s+/).each do |word|
unless previous.include?(nil)
p = previous.join(' ')
unless @words.has_key?(p)
@words[p] = Array.new
end
@words[p] << word
end
previous.shift
previous.push(word)
end
end
end
def print_words(n = 50)
word = next_word(true)
1.upto(n) do |i|
print word, i == n ? "\n" : " "
word = next_word()
end
print "\n"
end
# sentences start with a capital or quoted capital and end with
# punctuation or quoted punctuation
def print_sentences(n = 5)
sentences = 0
word = next_word(true)
while word !~ /^['"`]?[A-Z]/
word = next_word()
end
begin
print word
if word =~ /[?!.]['"`]?$/
sentences += 1
if sentences == n
print "\n"
else
print " "
end
else
print " "
end
word = next_word()
end until sentences == n
end
def next_word(restart = false)
if restart or @state.include?(nil)
key = @words.keys[rand(@words.length)]
@state = key.split(/\s+/)
end
key ||= @state.join(' ')
# restart if we hit a dead end, rare unless text is small
if @words[key].nil?
next_word(true)
else
word = @words[key][rand(@words[key].length)]
@state.shift
@state.push(word)
word
end
end
end
class MarkovLetters
def initialize(filename, order)
@order = order
@letters = Hash.new
@state = Array.new(@order)
previous = Array.new(@order)
File.foreach(filename) do |line|
line.strip!
line << ' ' unless line.length == 0
line.gsub!(/\s+/, ' ')
line.gsub!(/[^a-z ]/, '')
line.split(//).each do |letter|
unless previous.include?(nil)
p = previous.join('')
unless @letters.has_key?(p)
@letters[p] = Array.new
end
@letters[p] << letter
end
previous.shift
previous.push(letter)
end
end
end
# words begin after a space and end before a space
def print_words(n = 50)
letter = next_letter(true)
while letter != ' '
letter = next_letter()
end
letter = next_letter()
words = 0
while words < n
words += 1 if letter == ' '
print letter
letter = next_letter()
end
print "\n"
end
def next_letter(restart = false)
if restart or @state.include?(nil)
key = @letters.keys[rand(@letters.length)]
@state = key.split(//)
end
key ||= @state.join('')
# restart if we hit a dead end, rare unless text is small
if @letters[key].nil?
next_letter(true)
else
word = @letters[key][rand(@letters[key].length)]
@state.shift
@state.push(word)
word
end
end
end
if $0 == __FILE__
require 'getoptlong'
def usage()
$stderr.puts "Usage: ruby #{$0} [options] <filename>",
" -h, --help show this usage message",
" -o, --order set markov chain order",
" -s, --sentences set number of sentences to
print",
" -w, --words set number of words to print",
" -l, --letters use letters as the basic unit"
end
order = 2
sentences = 5
words = nil
letters = false
opts = GetoptLong.new(["--help", "-h", GetoptLong::NO_ARGUMENT],
["--order", "-o",
GetoptLong::REQUIRED_ARGUMENT],
["--sentences", "-s",
GetoptLong::REQUIRED_ARGUMENT],
["--words", "-w",
GetoptLong::REQUIRED_ARGUMENT],
["--letters", "-l", GetoptLong::NO_ARGUMENT])
opts.each do |opt, arg|
case opt
when "--help"
usage
exit 0
when "--order"
order = arg.to_i
when "--sentences"
sentences = arg.to_i
words = nil
when "--words"
words = arg.to_i
sentences = nil
when "--letters"
letters = true
words = 50 if words.nil?
end
end
if ARGV.length < 1
usage
exit 1
end
ARGV.each do |arg|
begin
if letters
m = MarkovLetters.new(arg, order)
m.print_words(words)
else
m = MarkovWords.new(arg, order)
m.print_words(words) unless words.nil?
m.print_sentences(sentences) unless sentences.nil?
end
rescue
$stderr.puts $!
end
end
end
on 10.04.2006 06:11
On Apr 9, 2006, at 5:35 PM, Daniel Baird wrote: > I think someone needs to take all the explanations from this thread > and run > their markov chainer on that. Or everything why's ever said. --Steve
on 10.04.2006 13:09
On Mon, 2006-04-10 at 13:08 +0900, Stephen Waits wrote: > On Apr 9, 2006, at 5:35 PM, Daniel Baird wrote: > > > I think someone needs to take all the explanations from this thread > > and run > > their markov chainer on that. > > Or everything why's ever said. I got a few interesting bits using just this year's redhanded posts. Thanks to the small input body and a low variance setting there's almost as much between the lines as in the original posts :) ===== Ten-Sided is an alias for html. xhtml_strict does the same, but with root beer and cream soda lip balms instead of Roofies or Vitamin K. Actually, red.trust_sphere.each patches up batsmanâ??s user by then reflect ACTIVERECORD. Service Overrides Adding stuff like sessioning and authentication demands hooks on the web. Skeptical? Iâ??d say you invented tumblelogging way back in the above equation start out equal. Try seeding the rankings. Maybe you want trust from Matz to be full projects. And, of course, he really can be seen as a not entirely bad reason for actually providing a time of 1.815 seconds, compared with St. Valentineâ??s YARV which will format the date according to a classâ?? page. I think Marcel has fixed this in Edge Rails. We get on these little kicks. As weâ??re all snooping around. Little protocols or obscure version control systems. Or domino games or something. Trust metric, man. It just strikes me all the time. We have, right now, an over-night offsite meeting at a Chupei hackathon to get Perlâ??s YAML::Syck module on its legs and with a gem, but you never know. Radicals often portray future animal executions in the above code and also a follow-up with darcs and switchtower. Install FuseFS. Run rubyfs.rb. Weâ??ve been through this day before we talk about this at CampingSessions on the wiki. Good thing gabriele is out there, because Iâ??ve been playing with a timezone setting, right? These are tough issues, with a rewrite of miniature file-sharing and I really appreciate him. The textarea technique is very hard to get even with net auctions. Donâ??t get me wrong. It does not mean the book is bad. =====
on 10.04.2006 14:11
Hi, this is the first time i participate in a ruby-quiz. It was quite funny programming this, and testing it out. I know that my solution is far from perfect, but since I invested some time in it (I'm quite new to ruby, so had to look up a lot of stuff) I'm submitting it. It does nothing fancy, and it misses some error-checking. The main data structure is a Hash, it's explained in the code. If I did something completely stupid it would be nice if someone tells me :-) Thanks, Tom
on 10.04.2006 16:08
On Apr 9, 2006, at 8:45 AM, Albert Vernon Smith wrote: > Coming from Perl, I often rely upon auto-vivification, so I needed > to figure out how to work around this. Perhaps there are improved > ways of going about this, and I'd appreciate any feedback on how to > go about it better. Your code looks very clean to me! A minor thing I noticed is that you could simplify: .gsub(/[\"\(\)]/,"") to: .delete('"()') Can you show an example using Perl's auto-vivification that feels funny in Ruby? Perhaps we would have better ideas after seeing it... Thanks for the submission! James Edward Gray II
on 10.04.2006 16:21
On Apr 10, 2006, at 7:08 AM, Tom Rauchenwald wrote: > If I did something completely stupid it would be nice if someone > tells me :-) Nothing stupid no. Here are a few extremely minor suggestions. File.open(filename) do |io| io.each do |line| ... end end That pattern can be simplified to: File.foreach(filename) do |line| ... end And... if @frequencies[tupel].include?(word) @frequencies[tupel][word]+=1 else @frequencies[tupel][word]=1 end If you set the Hash to default to a value of 0, you could simplify the above to just: @frequencies[tupel][word]+=1 All in all, you're code is quite nice. Thanks for the submission. James Edward Gray II
on 10.04.2006 19:16
Here is my solution to this very interesting quiz.
My MarkovChainer can work with any order and it is sentence oriented: I
split the input text with /([.!?])/ and then scan each sentence with
/[\w']+/, so all punctuation (except ".", "!" and "?") is ignored.
It also stores the first <order> words of each sentence, to use them as
start words when generating sentences.
The script accepts command line arguments for the order (-o) and the
number of sentences to generate (-n), the defaults are -o2 -n10. Then it
uses ARGF.read as input text and prints n generated sentences.
Now lets talk a bit about the data structure:
My first implementation used trees of hashes and because symbols are
faster as hash keys than strings, I stored the words as symbols:
For the text "q s a w s. a s w q s. q w s a a w. s q a w s.", that
looked
like:
{:q=>{:s=>{:"."=>1, :a=>1}, :a=>{:w=>1}, :w=>{:s=>1}},
:s=>{:q=>{:a=>1}, :w=>{:q=>1}, :a=>{:a=>1, :w=>1}},
:w=>{:q=>{:s=>1}, :s=>{:"."=>2, :a=>1}},
:a=>{:s=>{:w=>1}, :a=>{:w=>1}, :w=>{:"."=>1, :s=>2}}}
Then I thought it might be faster to put all the last words into one
array
instead of building a frequency hash, because I built that array anyway,
when I was looking for the next word: That looked like:
{:q=>{:s=>[:a, :"."], :a=>[:w], :w=>[:s]},
:s=>{:q=>[:a], :a=>[:w, :a], :w=>[:q]},
:a=>{:s=>[:w], :a=>[:w], :w=>[:s, :".", :s]},
:w=>{:q=>[:s], :s=>[:".", :a, :"."]}}
And it also was faster. Then I wanted to know if symbols really are
faster
then strings:
{"w"=>{"q"=>["s"], "s"=>[".", "a", "."]},
"a"=>{"a"=>["w"], "w"=>["s", ".", "s"], "s"=>["w"]},
"q"=>{"a"=>["w"], "w"=>["s"], "s"=>["a", "."]},
"s"=>{"w"=>["q"], "a"=>["w", "a"], "q"=>["a"]}}
It turned out that strings are faster. I think that is because
converting
a symbol to a string (when generating the sentences) needs one hash
lookup
and generates a new String object. So the faster hash lookup of symbols
doesn't pay off.
And finally I wanted to see if at least the hash tree is worth it. I
tried
the following:
{["q", "a"]=>["w"],
["q", "w"]=>["s"],
["s", "q"]=>["a"],
["w", "q"]=>["s"],
["s", "w"]=>["q"],
["a", "s"]=>["w"],
["w", "s"]=>[".", "a", "."],
["s", "a"]=>["w", "a"],
["q", "s"]=>["a", "."],
["a", "a"]=>["w"],
["a", "w"]=>["s", ".", "s"]}
Well, the hash tree also turned out to be slower than this simple flat
hash (multiple hash lookups are slower than computing the hash of a
multi
element array and one lookup).
(I also tried joined strings as keys (e.g. "q a"=>["w"]), but they are
slower than the array keys, because more transformations are needed)
So, the morale of all this:
- don't use symbols if they have to be converted to a string often
- hash lookups might be slower than you think
- premature optimization...
Dominik
class MarkovChainer
attr_reader :order
def initialize(order)
@order = order
@beginnings = []
@freq = {}
end
def add_text(text)
# make sure each paragraph ends with some sentence terminator
text.gsub!(/\n\s*\n/m, ".")
text << "."
seps = /([.!?])/
sentence = ""
text.split(seps).each { |p|
if seps =~ p
add_sentence(sentence, p)
sentence = ""
else
sentence = p
end
}
end
def generate_sentence
res = @beginnings[rand(@beginnings.size)]
loop {
unless nw = next_word_for(res[-order, order])
return res[0..-2].join(" ") + res.last
end
res << nw
}
end
private
def add_sentence(str, terminator)
words = str.scan(/[\w']+/)
return unless words.size > order # ignore short sentences
words << terminator
buf = []
words.each { |w|
buf << w
if buf.size == order + 1
(@freq[buf[0..-2]] ||= []) << buf[-1]
buf.shift
end
}
@beginnings << words[0, order]
end
def next_word_for(words)
arr = @freq[words]
arr && arr[rand(arr.size)]
end
end
if $0 == __FILE__
order = 2
n = 10
while ARGV[0] =~ /\A-([on])([1-9]\d*)\z/
if $1 == "o"
order = Integer($2)
else
n = Integer($2)
end
ARGV.shift
end
mc = MarkovChainer.new(order)
mc.add_text(ARGF.read)
n.times {
puts mc.generate_sentence
}
end
on 10.04.2006 20:54
class WordHash
attr_accessor :words
def initialize(words)
@words = {}
words.each_with_index{|x,i|self.add(x,words[i-1],words[i+1])}
end
def add(word,prev,nex)
@words[word] ||= {}
@words[word]['prev'] ||= Hash.new{|k,v|k[v]=0}
@words[word]['prev'][prev] += 1
@words[word]['next'] ||= Hash.new{|k,v|k[v]=0}
@words[word]['next'][nex] += 1
end
end
f = ARGF.read.join.tr('"/*\\()[]\'', ' ').downcase
words = f.gsub(/[^'\w]/, ' \0 ').gsub(/\s+/, ' ').split(/\W/)
words = words.map{|x|x.strip}
w = WordHash.new(words)
@k = (a=w.words).keys[rand(a.size)]
wi = [@k.capitalize]
100.times do
c=(a=w.words[@k]['next']).keys[rand(a.keys.size)]
if c != ' '
wi << c
@k = w.words.keys.find{|y|y==c}||rand(a.keys.size)
else
next
end
end
puts wi.join(" ").gsub(/\s+/,' ')+'.'
on 10.04.2006 21:19
On 10.4.2006, at 14:05, James Edward Gray II wrote: > On Apr 9, 2006, at 8:45 AM, Albert Vernon Smith wrote: > >> Coming from Perl, I often rely upon auto-vivification, so I needed >> to figure out how to work around this. Perhaps there are improved >> ways of going about this, and I'd appreciate any feedback on how >> to go about it better. > > > Can you show an example using Perl's auto-vivification that feels > funny in Ruby? Perhaps we would have better ideas after seeing it... Didn't feel funny, I just wasn't use to having to declare objects (hashes and arrays) as I added them as branches on my initial hash. I had to use a little more discipline, which probably is a good thing coming from Perl-land. -a
on 10.04.2006 21:22
On Tue, 2006-04-11 at 02:15 +0900, Dominik Bathon wrote: > So, the morale of all this: > - don't use symbols if they have to be converted to a string often > - hash lookups might be slower than you think > - premature optimization... I used symbols in my solution, but primarily because I was trying to make the hash itself require as little memory as possible to allow for vast bodies of input (which I admit I've not really tried yet) and I figured a hash on symbols would be smaller than a hash on strings. I guess this means that GC is pushed by lots of string objects created and destroyed during the generation run, but I tended to aim for memory efficiency over speed for this one, as long as it was running 'fast enough'.
on 10.04.2006 21:47
Hi, From: "Albert Vernon Smith" <smithav@cshl.edu> > > On 10.4.2006, at 14:05, James Edward Gray II wrote: > >> Can you show an example using Perl's auto-vivification that feels >> funny in Ruby? Perhaps we would have better ideas after seeing it... > > Didn't feel funny, I just wasn't use to having to declare objects > (hashes and arrays) as I added them as branches on my initial hash. > I had to use a little more discipline, which probably is a good thing > coming from Perl-land. Incidentally, if you ever need an auto-vivifying hash-of-hashes (as opposed to mixture of hashes and arrays that is possible because of Perl syntax), you can do the hash-of-hashes auto-vivify in Ruby: HashFactory = lambda { Hash.new {|h,k| h[k] = HashFactory.call} } irb(main):181:0> x = HashFactory.call => {} irb(main):182:0> x['abc']['def']['ghi'] = 123 => 123 irb(main):183:0> x => {"abc"=>{"def"=>{"ghi"=>123}}} Regards, Bill
on 11.04.2006 00:02
On Mon, 10 Apr 2006 21:20:07 +0200, Ross Bamford <rossrt@roscopeco.co.uk> wrote: > figured a hash on symbols would be smaller than a hash on strings. Tha hash itself will have the same size, because it only stores VALUEs (which are just longs). But each of the string VALUEs will "point" to another "object" on the heap, while the symbol VALUEs don't have an extra "object" attached. > I guess this means that GC is pushed by lots of string objects created > and destroyed during the generation run, but I tended to aim for memory > efficiency over speed for this one, as long as it was running 'fast > enough'. Yes, memory efficiency was another reason for me to first try the "hash tree of symbols". But on the other side: the string objects are created anyway (before they are converted to symbols). They can be collected after conversion to symbol, but when you generate the sentence you are again creating many new string objects (Symbol#to_str generates a new string object every time, ruby stores only c-strings for the symbols), which wouldn't be necessary, if you had kept the strings in the first place. So, yes, symbols are more memory efficient for storing big frequency hashes, but they are slower for generating, so it's a tradeoff. By the way here are some numbers: order first final 2 7.380s 1.973s 4 6.279s 2.002s 6 8.031s 1.972s Those runs are for a 700K text file and 1000 sentences generated. I didn't measure the memory. Dominik