[SOLUTION] Markov Chains (#74)


#1

Hi all,

Here’s my solution for the quiz No 74. It generates text with a first
order Markov Chain. Because the matrix of a Markov chain for native
language tests is sparse, the chain is stored in a hash of hashes for
better memory usage.

I attached a tar archive for all the files and the main classes for easy
reading.

There are some open issues i think, but i did not had the time to
implement
them

  • for usage of higher order markov chains one should add a list of
    previous elements in the method MarkovChain#add_elems(). But this breaks
    other parts of the code and these parts have to be adapted to this.
  • better performance of the MarkovChain#rand() method
  • and of course there are a lot of open issues with respect to natural
    language processing, the code does not produce grammatical valid
    sentences, no quotes, no comma, etc.

I programmed it to learn Ruby and to improve my coding skills, because i
have less than 2,5 weeks of experience with Ruby. I bought the
Programming Ruby" book by Dave T. et. al. and started to read it
two weeks and two days ago and i used it a lot the last 48 hours. I
think there are some parts of the code where there are better solutions
or could be better expressed in a more “rubyish” way.

I used Java and Perl a lot in the last years and i will use Ruby now
instead of Perl. I can remember my experiences with hashes of hashes in
Perl. This was awkward and often a pain. And on friday as i started
coding for this quiz i suddenly realized that i had none of this issues
in Ruby. Wow ! Because i know the functional language Haskell i am used
to iterators, map’s, lambda’s etc. and i tried to use iterators and
block in the program, but i think i have to get more experience here in
Ruby.

If you have any comments, i will be glad to receive and answer them.

Extract the archive

$ tar xzf joern_dinkla_quiz_74.tar.gz

Call the program and print the command line syntax.

$ cd net.dinkla.ruby.quiz.74/lib
$ ruby main-markov-chain.rb --help

Best regards,

Joern D.


#2

On 4/9/06, Joern D. removed_email_address@domain.invalid wrote:
[snip]

I programmed it to learn Ruby and to improve my coding skills, because i
have less than 2,5 weeks of experience with Ruby. I bought the
[snip]

You are doing good.

sometimes you use parentesis and other times you don’t.
def test() is the same as def test
if(cond) is the same as if cond
.length() is the same as .length

File.open(filename) do | file |
file.each_line() do |line|

is the same as

IO.readlines(filename).each do |line|

def initialize(mc = nil)
if mc.nil?
@mc = MarkovChain.new()
else
@mc = mc
end
end

is the same as

def initialize(mc = nil)
@mc = mc || MarkovChain.new
end


#3

“Simon S.” removed_email_address@domain.invalid writes:

IO.readlines(filename).each do |line|

 IO.foreach(filename) do |line|

#4

On Apr 9, 2006, at 2:41 PM, Simon S. wrote:

sometimes you use parentesis and other times you don’t.

IO.readlines(filename).each do |line|

Let’s use foreach() there, so we don’t slurp a file only to process
it line by line:

File.foreach(filename) do |line| … end

def initialize(mc = nil)
@mc = mc || MarkovChain.new
end

Why not just set the default correctly?

def initialize( mc = MarkovChain.new )
@mc = mc
end

James Edward G. II


#5

On 4/9/06, Christian N. removed_email_address@domain.invalid wrote:

“Simon S.” removed_email_address@domain.invalid writes:

IO.readlines(filename).each do |line|

 IO.foreach(filename) do |line|

aha, thanks


#6

On Apr 9, 2006, at 8:59 AM, Joern D. wrote:

I programmed it to learn Ruby and to improve my coding skills,
because i have less than 2,5 weeks of experience with Ruby. I
bought the Programming Ruby" book by Dave T. et. al. and
started to read it two weeks and two days ago and i used it a lot
the last 48 hours. I think there are some parts of the code where
there are better solutions or could be better expressed in a more
“rubyish” way.

For debugging statements like:

if ( $VERBOSE )
puts “Calculating probabilities …”
STDOUT.flush()
end

I like to use the $DEBUG variable. You can toggle it on and off with
Ruby’s -d command-line switch.

The call to flush is also not needed:

puts “Calculating probabilities …” if $DEBUG

Hope that helps.

James Edward G. II


#7

Hi Simon,

Simon S. wrote:

On 4/9/06, Joern D. removed_email_address@domain.invalid wrote:
[snip]

I programmed it to learn Ruby and to improve my coding skills, because i
have less than 2,5 weeks of experience with Ruby. I bought the

[snip]

You are doing good.

Thank you very much.

sometimes you use parentesis and other times you don’t.
def test() is the same as def test
if(cond) is the same as if cond
.length() is the same as .length

Yes, sometimes i forgot them. For me the code is easier to read when i
add parenthesis to functions (in definitions and calls). I think this is
because i am still a newbie. But if i do not put the parenthesis at the
end i do not know if its an accessor to an attribute or a function.

But for me there is the exception if a block follows a function call.

list.each() do |x|
...
end

simply does not look as good as

list.each do |x|
...
end

File.open(filename) do | file |
file.each_line() do |line|

is the same as

IO.readlines(filename).each do |line|

I will use the File.foreach method as suggested by Christian N…

def initialize(mc = nil)
@mc = mc || MarkovChain.new
end

Yes, that looks better.


Simon S.

Best regards,

Joern


#8

James Edward G. II wrote:

For debugging statements like:

if ( $VERBOSE )
puts “Calculating probabilities …”
STDOUT.flush()
end

I like to use the $DEBUG variable. You can toggle it on and off with
Ruby’s -d command-line switch.

Thanks for the hint. But i do want a difference between debug output and
verbose output. Debug output is for the developer and the verbose output
for developers and users, so that the user can observe what is going on.

I did not know about the -d switch. Thanks.

The call to flush is also not needed:

puts “Calculating probabilities …” if $DEBUG

It’s not strictly needed, yes, but when i did the tests i wanted to get
the output as fast as possible, because i wanted to observe the
performance and if i did not flush the buffer the output printed out at
once at the end of the program (after it finished). Maybe this is
because i used the Ruby Development Tools for Eclipse. I have to check
this if the behaviour is the same on the command line.

Hope that helps.

James Edward G. II

Best regards,

Joern


#9

Hi again,

I wrote:

Hi all,

Here’s my solution for the quiz No 74. It generates text with a first
order Markov Chain. Because the matrix of a Markov chain for native
language tests is sparse, the chain is stored in a hash of hashes for
better memory usage.

Now i implemented higher order Markov Chains as well. I could not resist
:slight_smile:

A first order chain, for example the following one

 @mc3 = MarkovChain.new()
 @mc3.add(1,2)
 @mc3.add(3,4)

is stored as the following hash of hashes

{1=>{2=>1}, 3=>{4=>1}}

A higher order chain, for example the following

 @mc7 = MarkovChain.new(order = 2)
 @mc7.add_elems([20,21,20,22,21,22,20,23])

is also stored as a hash of hashes, but the keys are arrays

{[22, 21]=>{22=>1}, [22, 20]=>{23=>1}, [20, 21]=>{20=>1}, [21,
22]=>{20=>1}, [20,22]=>{21=>1}, [21,20]=>{22=>1}}

Because i am new to Ruby, i had to learn a few things about Arrays,
Objects and dup(). I still do not understand this fully (perhaps it is
to late here 00:46 CET), but in the method MarkovChain.add() i need to
dup the arrays, otherwise some keys will get overwritten.

In class MarkovChain

Add an edge from node a to node b.

def add(a, b)
# this dup is needed, otherwise elements will be overwritten
a = a.dup() if a.instance_of?(Array)
@absolute[a] ||= {}
@absolute[a][b] ||= 0
@absolute[a][b] += 1
@sum_edges[a] ||= 0
@sum_edges[a] += 1
@dirty[a] = true
end

I will try to find it out tommorow. By the way, below is an example text
i generated for order 3 after i fed 4 books of Chesterton into the
chain. Beginning with order 3 i think the text looks like a patchwork or
collage of some phrases and sentences of the original texts.

Best regards,

Joern

. Up to a certain point it was a wordless clamour startlingly close, and
loud enough to be distinct if each word had not killed the other. No,
replied Fisher. The Renaissance nobles of the Tudor time were like that.
It is awful, I think it goes far enough! said Flambeau; but my
information is fragmentary, and only twelve members of the Central
Anarchist Council. The gentleman who has for some time past played, with
propriety and general applause, the difficult part of Thursday, has died
quite suddenly. Consequently, he was willing to go into Parliament as a
yeoman or a gentleman or a Jacobite or an Ancient Briton, I should say
that, Crane went on. This also tempted him, that in a more partial and
also a more premature fashion, for his voice was colourless and sad.
What I did next does matter: I gave him a rather wild stare, March had
yet another unexpected emotion, for his guide hailed the man as Hoggs
and introduced him as Sir Howard Horne, then introducing his so-called
Socialist budget, and prepared to expound it in an interview with so
promising a penman. Harold March was to have the gold of Glengyle. So
far the crime seemed clear enough; and while Boyle was looking into it
he heard a thud behind him, and then a sudden stillness, as of a stone
statue walking. He might have been firebrands. The mutinies simmered
down; the men who must have known that particular house to be so
accurately inaccurate. But what makes you think it a specimen. Put the
same feather with a ribbon and an artificial flower and everyone will
think it a wildcat, though it may have been a clever fellow, answered
the priest. As they raced along to the gate out of which poured and
rolled, not Roman, but very late, and that this, my successful
masquerade, might be of considerable value to the public, placed it here
with his own pale blue ones, and said, Do you want me to tell you all
you want to guess what he doing, keep behind him. About his life and
fortune on the table, and went below the surface in a way that at once
puzzled and reassured him. On the oranges was the equally clear and
exact description, Finest Brazil nuts, 4d. a lb. M. Brun had a dark,
handsome lady, of no little majesty, and rather like a mosaic palace,
rent with earthquakes; or like a Dutch tulip garden blown to the stars.
The other did not answer; he was free in free London, and drinking