Markov Chains (#74)

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this quiz
    until
    48 hours have passed from the time on this message.

  2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

  1. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps
everyone
on Ruby T. follow the discussion.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

This week’s Ruby Q. is about text generation. That’s right, we’re
going to
teach your computer to weave tall tales.

At its most basic level, a solution might be:

>> (1..30).map { (("a".."z").to_a + [" "] * 10)[rand(36)] }.join
=> "fb mcr hhluesjbhtf swm eehokmi"

However, let’s make our goal to get as close to English looking
sentences as
possible. One way you might do this is using a technique called Markov
Chains.

To use Markov Chains, you read some text document(s), making note of
which
characters commonly follow which characters or which words commonly
follow other
words (it works for either scale). Then, when generating text, you just
select
a character or word to output, based on the characters or words that
came before
it.

The number of previous characters or words considered is called the
“order” and
you can adjust that to try and find a natural feel. For example, here
is some
generated text using a second order word chain derived from the Sherlock
Holmes
novel “The Hound of the Baskervilles” by Arthur Conan Doyle:

The stars shone cold and bright, while a crushing weight of 

responsibility
from my shoulders. Suddenly my thoughts with sadness. Then on the
lady’s
face. “What can I assist you?”

If you need text’s to prime your program with, I suggest searching
Project
Gutenberg:

http://www.gutenberg.org/

I’m pumped! I’ve been reading this mailing list for the last 3 months
and I finally feel ready to try out a quiz. This one sounds fun!

On Apr 7, 2006, at 8:53 AM, Charlie B. wrote:

I’m pumped! I’ve been reading this mailing list for the last 3 months
and I finally feel ready to try out a quiz. This one sounds fun!

I’m glad.

These can be quite a bit of fun, depending on what text you prime
them with… :wink:

James Edward G. II

or ascii based tab files for us banjo pickers!

On Fri, Apr 07, 2006 at 10:02:28PM +0900, Ruby Q. wrote:

possible. One way you might do this is using a technique called Markov Chains.
I think it would be really neat to apply this to music composition. I’m
just not sure how you would get a corpus for that (maybe parsing
lilypond files?).

–Aaron

On 4/7/06 10:14 AM, “Charlie B.” [email protected] wrote:

or ascii based tab files for us banjo pickers!

Banjo? I’m pretty sure he said “music”, so I’m not sure why you are
bringing
up banjos. :slight_smile:

What’s the difference between a banjo and a(n)?

Chain Saw:

  1. a chain saw has a dynamic range.

  2. you can turn a chain saw off.

  3. South American Macaw: one is loud, obnoxious, and noisy; and the
    other
    is a bird.

  4. Harley Davidson Motorcycle: you can tune a Harley.

  5. Onion: no one cries when you cut up a banjo.

  6. Trampoline: you take your shoes off to jump on a trampoline.

  7. Uzi: an uzi only repeats forty times.

Sorry - couldn’t help myself on this one :slight_smile:

Keith

I would like Ruby support of serial ports for the following
three platforms.
#1 Windows
#2 XP
#3 Mac OS X

Can you differentiate between/clarify #1 and #2?

-M

I would like Ruby support of serial ports
for the following three platforms.
#1 Windows
#2 XP
#3 Mac OS X

Please respond offline if you can
show me how to add any or all of these

Gus

[email protected]

Banjo players spend half their lives tuning and the other half playing
out of tune.

Banjos are definitely an acquired taste. I think I’ve figured out why I
love them so much. They are as close instrument to a digital signal as
you can get (other than a drum). The sound is so staccato and you pluck
so many notes in a second that it is very near to being digital :slight_smile:

On 4/7/06, James Edward G. II [email protected] wrote:
[snip]

These can be quite a bit of fun, depending on what text you prime
them with… :wink:

input_text := ruby_talk; (* hehe *)

Ruby Q. wrote:

The stars shone cold and bright, while a crushing weight of responsibility
from my shoulders. Suddenly my thoughts with sadness. Then on the lady’s
face. “What can I assist you?”

I can’t tell if this is a bug, or success beyond my wildest dreams:

She wore handily the woven craft of lips.  Besotted!  My anglican parrot
wenches freely.  Three dozen Iceland begin to unwind.  Hope.  Charity.
My outrage cannot succumb to rainswept chance.  Now chance he the
broadly winding spar.

This stock is on the move!  You'll never fail to satisfy her again!
Home loans from 0.025%!

V t A G R A from $ 3.50 per item
C 1 @ L I S from $3.75 per item
V Ax L 1 U M from $1.20 per item

Premier PARMACY - your   c e r t i f i e d   online parmacy

I took usenet for my corpus, then used a genetic algorithm with
SpamAssassin for scoring.

And, uh, I wrote it 6 days ago… :wink:

-dB

On Apr 7, 2006, at 10:18 AM, James Edward G. II wrote:

James Edward G. II
Indeed, I just finished my solution (first Ruby Q., I’ve done, yay)
and it’s all sorts of fun

On 4/7/06, Simon S. [email protected] wrote:

On 4/7/06, James Edward G. II [email protected] wrote:
[snip]

These can be quite a bit of fun, depending on what text you prime
them with… :wink:

input_text := ruby_talk; (* hehe *)

I’ve been considering collecting all the Ilias posts (filtering out
other people’s comments that are quoted, for purity) and seeing what
it prints out. :wink:

Jacob F.

On Apr 7, 2006, at 7:02 PM, Ross B. wrote:

days,
in summary, to get wrote: pointers, or care what you mean.
anobjective-c frontend set of programming
<rubytalk2text.rb>
Thanks for the script, I note this interesting result with my
implementation:
% ruby markov.rb rt.text 12
Alternate austin austin austin austin halostatue gmail com alternate
austin austin austin

On Sat, 2006-04-08 at 03:07 +0900, Simon S. wrote:

On 4/7/06, James Edward G. II [email protected] wrote:
[snip]

These can be quite a bit of fun, depending on what text you prime
them with… :wink:

input_text := ruby_talk; (* hehe *)

I had to try this out :slight_smile: With 500 messages from the past couple of days,
order of 3, I found lots of cryptic wisdom when using a small word
limit, such as:

===
windows :\ i guess clarity, just isn’t a interesting.

thanks for making def say_stuff puts

bereflected to the foo to design a gui by open source path.

in summary, to get wrote: pointers, or care what you mean.

example, c# guys (including even worse - perl.

warning: pointer targets in boese wrote: use an i’ll go digging

+0900, james edward gray would help with reading v_mod/1 m = ‘m’

to figure this out? tashiro wrote: you need own license, possibly

april fool’s joke. v.- interface and doscommand which next implemented
anobjective-c frontend set of programming

I’ve attached the quick and dirty script I used to get cleaned-up input
text from ruby-talk archive.

This is a seriously cool quiz :slight_smile: Thanks all concerned.

I went and grabbed a bunch of Grimm’s fairy tales from Gutenberg. I
still need to tweak my algorithm some; the most annoying thing I find
is punctuation, especially quoted strings (i.e. speech) so that the
output text looks like there is some valid speech in there and not
just randomly scattered quote marks.

Anyway, here’s a sample I produced with my first attempt:

There was a man who kills seven at one blow? I leapt over the tree
because the huntsmen are shooting down there in the closet on the
porch.’ The miller said: ‘The Devil must go out,’ and opened the door
and opened it. As she went in, a little dwarf no bigger than my
finger. And before her stood princes, and dukes, and earls: and the
fisherman went up to her palace all of shining gold; and told her
mother that she must be the handsomest lady in the land; and she went
to the fire and growled contentedly and comfortably. It was not long
in saying ‘Yes’ to all this; and as they were entering the village,
the son followed the fox’s counsel, and without looking about him went
to the other anvil. The old man led him back into the water. The girls
came just in time; they held him fast and tried to free his beard from
the line, but all in vain; he heard or saw nothing of Jorinda. At last
he thought to himself: ‘How the old woman is snoring! I must just see
if she wants anything.’

and:

An aged count once lived in Switzerland, who had an only son, but he
was now grown too old to work; so the farmer would give him his only
daughter to her bed-side, and said, 'Always be a good girl, and I will
never by myself leave the path, to run into the wood, till at last
they danced out at the gate together. When they had walked a short
time, when they had warmed themselves, they said: 'Comrade, shall we
have a child, and he grows big, and we send him into the wide world,
and looked neither to the right place. There they found the giants
swimming in their blood, and all round about her, and the bells rang
at each step which she took. Then she was in a great hurry. The little
tailor looked round and saw a flock of sheep, the very shepherd whom
the peasant knew had long been wishing to be mayor, so he cried with
all his might.

Correction to earlier post AGS Calabrese
00000000000000000000000000000000000000000000000000
I would like Ruby support of serial ports
for the following three platforms.
#1 Windows
#2 Linux
#3 Mac OS X Tiger

Please respond offline if you can
show me how to add serial support to any or all of these

Gus

[email protected]

my spam filter has been eating messages so I may miss your message.
I am working on the problem
000000000000000000000000000000000000000000000000000

On 2006-Apr 07, at 10:46 AM, Mike wrote:

I would like Ruby support of serial ports for the following
three platforms.
#1 Windows
#2 XP
#3 Mac OS X

Can you differentiate between/clarify #1 and #2?

-M

Here is my solution. It was so much fun I couldn’t help but keep making
small tweaks. Here are a few of my favourite bits of output with varying
word limits, from various sources (including rubytalk):

====
thou wast question’d my algorithm? – posted delight hath.

FasterCSV can also reason, having methods 2006 08:58 am, after lunch.

April 1st joke. I’m subscribed to ruby-dev and the other hand, I’ve
thought of it…hrrmmm… And now I fear, that the Mozilla Foundation
relicensed everything as MPL and GNU GPL provides. It is unfortunate
that it should be able to check against multiple types as well Hi.

Ruby Q.: 1. Please do not hesitate dive in to it, on the C:\ruby\lib
\ruby\1.8\cgi\session.rb file, $350 convoluted I’m licenses David 11:26
install the ‘ruby-db2-0.4’ package and all that…

I’ll blow this police whistle from my shoulders. Suddenly and makes it
incompatible. But I might have agreed tosuch without having to support
meta-progarmming. Perhaps an example how to handle initialize_copy: The
mixin manages some attributes and the other hand, I’ve thought of
it…hrrmmm…

Before using it, you have to give it some knowledge - the -l option
accepts a filename or URL to learn from (omit the filename for stdin).
After chomping through it, the knowledge will get dumped out to a file
ready to be used for generation. You can do this multiple times to
incrementally teach it.

If you want to use a large body of input I recommend storing each dump
in it’s own file, which you can do with the -f option. Then, specify one
or more -f options when running generation to use the specified
knowledge file(s).

Once you’ve got some input, run markov.rb, specifying your -f options if
you’re not using the default ‘chainstore’ file, and optionally
generation parameters such as -w N to limit to at most N words.

There’s a few options you can pass to control both learning and
generation, there’s a bit of doc available from the -h option.

I posted a few chomped input files at:
http://roscopeco.co.uk/code/ruby-quiz-entries/74/
http://roscopeco.co.uk/code/ruby-quiz-entries/74/README

Thanks again for this quiz, I’ve gotten a fair few lol moments out of
it :slight_smile:

Hi all-

Here is my first attempt at a Ruby Q. (attached file:
‘markovtext.rb’). I’m new to Ruby, mostly paying the bills over the
past years with Perl work. This quiz was fun, and I learned a lot
from doing this. Coming from Perl, I often rely upon auto-
vivification, so I needed to figure out how to work around this.
Perhaps there are improved ways of going about this, and I’d
appreciate any feedback on how to go about it better.

My solution is fairly straight forward, using a second order word
chain, which I imagine many others will be implementing in similar
ways. As noted in the comments, I only start making my paragraph on
the second sentence, in order to get greater diversity in how the
paragraph starts. I’m also striping out some of the punctuation from
the end result, as some texts give orphaned punctuation marks.

I had the most fun from the quiz running my script against various
texts, and seeing the results. My favorite has been getting results
from “Huckleberry Finn”, though the grammar gets a bit off at times,
but that relates to the input text. For comparison, one can see much
more typical sentence structure from “Journey to the Center of the
Earth”.

Example results:

==

Huck Finn via ‘markovtext.rb’:

And above all, don’t you try to run in the dark he says: Yes, Mars
Sid, A dog. Cur’us dog, too. Does you want it, you can go and write
poetry after them out loud. One bill said, The celebrated Dr. Armand
de Montalban, of Paris, would lecture on the floor and tied up the
bank. I couldn’t help it, and she snatched a kiss of him, and never
think about it; and I went along up the smoothness and the judge for
him, being a mystery, and he’d cipher out a speech, all full of it,
and I would kill me, dey sk’yers me so.

==

Journey to the Center of the Earth via ‘markovtext.rb’:

I gave way to the south-east. We have passed through the treatises of
Cassanion, and all his time at Altona, staying with a translation?
This IS the Icelandic Professor. At this rate we shall see. So says
the Professor, no doubt, was pursuing his observations or taking
notes, for in three days we must go back to the dull thuds of the
country might be held up by the descent commenced. I can but try
Spanish, French, Italian, Greek, or Hebrew.

==

Cheers,
-albert

Good quiz this one.

Markov Chains are highly practical creations, we use them extensively
when
calulating life insurance and pensions schemes in Mercer.

Wikipedia has a nice summary of other applications, I recommend it
hihgly:

At University, they taught us Markov Chains multiple times for various
uses, but none of them used this linguistic approach. It’s very
intuitive,
and so much more fun than the hard-core introductions we were given. So
if
I ever teach, I’ll teach this quiz.

Here we go:

As primer for this writeup I’ve used Agatha Cristie’s “The mysterious
affair at Styles”, a 57_000 words english novel starring Poirot, her
famous detective.

To establish the chain of words I use a Hash of Hashes, where the first
is
a simple lookup of words, and the second containt the words that folow.
The second also counts how many times each word follows the leader.

Example:

Murder of Emily Agnes…
Murder against Alfred Inglethorp …
Murder against some person …
Murder against him right …

@words[“Murder”] >> {“of”=>1, “against”=>3}

The counts are to weigh the randomness when selecting the next word. I
wanted a selection process where rare followers in the real world are
rear
in this world as well. After ‘Murder’, ‘of’ will pop up every forth
time.
This is done here

sum = candidates.inject(0) {|sum,kv| sum += kv[1]}
random = rand(sum)+1
partial_sum = 0
next_word = candidates.find do |word, count|
  partial_sum += count
  partial_sum >= random
end.first

sum is the total number of followers (4 for ‘Murder’). I pick a random
number 0…3, add +1 for indexing and iterate my way up the list of
followers, and keep track of how many words I have passed on my way.
When
we reach (or pass for the first time) the random limit, the word at hand
is the word we want. Since the block finds the first [‘word’, count]
pair
that satifies our condition, (f.ex [‘against’, 3]), I use .first to get
it.

That’s it. Now just build a sentence by starting with a word and move
along it’s path. My syntectic rules are very simple, a sentence is
complete when it containts a period “.”, thereby opening up for som
silly
grammar. But it doesn’t really matter, grammar processing isn’t the text
of the day.

The results are good enoughby far, and quite entertaining too. But alas,
even Mr Poirot would be hard pressed to solve this case.

Output # 1

Murder of sight. Mary Cavendish was vital to connect the charred paper.
“A
great risk, it was a book?” “Yes, often. I could certainly had already
been tested!” I was before I asked.

Output # 2

Murder against Alfred Inglethorp, I fancied that, mon ami, it is broken
bell, and the matter, hoping that a note." Miss Howard had a day we came
to warn him more, had kindly offered to make amends; which weighed with
the Hall in the company. Miss Howard was that?" “She made it was the
Hall–you’n a penknife, and went round Styles Court in some time I
voiced
the room?” "I was inclined to call for some friends one side. His words
of
horrors business to make a very fond of her–yes, I made a message to
the
slips of dismissal.

=== code ===

class MarkovChain
def initialize(text)
@words = Hash.new
wordlist = text.split
wordlist.each_with_index do |word, index|
add(word, wordlist[index + 1]) if index <= wordlist.size - 2
end
end

def add(word, next_word)
@words[word] = Hash.new(0) if !@words[word]
@words[word][next_word] += 1
end

def get(word)
return “” if !@words[word]
followers = @words[word]
sum = followers.inject(0) {|sum,kv| sum += kv[1]}
random = rand(sum)+1
partial_sum = 0
next_word = followers.find do |word, count|
partial_sum += count
partial_sum >= random
end.first
#puts “Selected #{next_word} from #{candidates.inspect}”
next_word
end
end

mc = MarkovChain.new(File.read(“Agatha Christie - The Mysterious Affair
at
Styles.txt”))

sentence = “”
word = “Murder”
until sentence.count(“.”) == 4
sentence << word << " "
word = mc.get(word)
end
puts sentence << “\n\n”

=== code ===