Having difficulty processing text files - some techniques?

I am trying to extract information from a poker hand history text file
as an exercise in learning regular expressions (Regexp) and Ruby's file
manipulation classes, IO and File.

The hand history text file is laid out like this:

Hand No. xxxx - Date xxxx Time xxxx
Game type: xxxx Blinds level: xxxx
Table type: xxx
Player 1: xxxx (chip count)
Player 2: xxxx (chip count)
Player 3: xxxx (chip count)
Player 4: xxxx (chip count)
Hole Cards Phase
Player 1 calls
Player 2 raises
Player 3 folds
Flop Phase
Player 1 raises
Player 2 calls
Turn Phase
etc
etc
River Phase
etc
etc
Player 2 wins

In my testing I have successfully extracted the very simple statistic
VP$IP, which is the percentage of times a player Voluntarily Puts $ In
Pot pre-flop. So this is:

Number of times called or raised pre-flop / Number of hands at the table
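As a quick check of the arithmetic, here is a minimal sketch that turns the two counts into a percentage. The method name `vpip_percentage` is hypothetical, and the inputs 225 and 534 are simply the "All" and "HandsAtTables" values from the first row of the sample report further down, assuming "All" counts the hands with money voluntarily put in.

```ruby
# Hypothetical helper: VP$IP as a percentage of the hands dealt at the table.
def vpip_percentage(voluntary_hands, hands_at_table)
  return 0.0 if hands_at_table.zero? # avoid dividing by zero
  100.0 * voluntary_hands / hands_at_table
end

puts vpip_percentage(225, 534).round(1) # prints 42.1
```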

My code (I used what I think is a Finite State Machine?):

def count_vpip(name)
  count = 0
  state = nil

  File.open("test_read.txt") do |f|
    while (line = f.gets)
      case state
      when nil
        # Look for the words "Hole Cards" and if found turn on text processing
        state = :parsing if line =~ /Hole Cards/
      when :parsing
        # Look for the word "Flop" or "wins" and if found stop processing text
        if line =~ /Flop/ || line =~ /wins/
          state = nil
        elsif line =~ /#{Regexp.escape(name)} (calls|raises)/
          # Count a call or raise made right after the player's name
          count += 1
        end
      end
    end
  end

  count
end

This code processes text only inside the Hole Cards phase (the pre-flop
phase), that is, between a line containing the words “Hole Cards” (start
processing text) and a line containing “Flop” or “wins” (stop processing
text). It increments a counter whenever it finds “calls” or “raises”
right after the player’s name. It ran through the whole file, possibly
hundreds of hands, and extracted the data.

The problem I am having is that I need the extracted data to be
associated with hand data, such as the date the hand was played, the
number of players in the hand and the blind level. This means I need to
isolate each hand, extract the hand data and put it in a hash, for
example, and then extract stats like VP$IP for each player (a hand-by-hand
approach rather than a file-wide approach). The output might then go into
a text file specific to each player that would look like:

Player 1
VP$IP         2P    3P    4P   All   HandsAtTables
2012-07-23     3   201    21   225             534
2012-07-24    45    10     3    58            1001
2012-07-25     5     5     5    15             420
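Producing a report like this is mostly a formatting job once the counts exist. A minimal sketch, assuming the counts are kept as a hypothetical hash of date => { players at table => count } plus a second hash of date => hands at the table, and that "All" is the sum of the 2P, 3P and 4P columns:

```ruby
# Builds the lines of one player's report. Input shapes are assumptions:
# vpip_by_date:  date => { players_at_table => count }
# hands_by_date: date => hands dealt at the table
def report_lines(player, vpip_by_date, hands_by_date)
  lines = [player]
  lines << format("%-12s%5s%5s%5s%6s%16s", "VP$IP", "2P", "3P", "4P", "All", "HandsAtTables")
  vpip_by_date.keys.sort.each do |date|
    counts = [2, 3, 4].map { |n| vpip_by_date[date].fetch(n, 0) }
    # "All" is assumed to be the sum of the per-table-size columns
    lines << format("%-12s%5d%5d%5d%6d%16d", date, *counts, counts.sum, hands_by_date.fetch(date, 0))
  end
  lines
end

vpip = {
  "2012-07-23" => { 2 => 3, 3 => 201, 4 => 21 },
  "2012-07-24" => { 2 => 45, 3 => 10, 4 => 3 },
}
hands = { "2012-07-23" => 534, "2012-07-24" => 1001 }

puts report_lines("Player 1", vpip, hands)
```

To get one file per player, the same lines could be joined with newlines and written with `File.write`.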

Can I do this without storing data in arrays or objects? I’ve been
thinking I could use another finite state machine that only looks for the
words “Hand” (start of a hand) and “wins” (end of a hand). The problem is
that I need some of the data on the line containing “Hand”, which I can’t
get with my code above. I also need to switch from one hand to the next
even though there are no lines between the line with “wins” and the next
line, which contains “Hand” and marks the start of the next hand.
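Both problems can be handled by one regular expression: matching the “Hand” line with capture groups gives you its data, and a successful match is also the signal that the previous hand has ended, so no separator line is needed. A minimal sketch, assuming the header looks like `Hand No. <number> - Date <date> Time <time>` as in the layout above; the constant, method name and sample values are made up:

```ruby
# Assumed header format, based on the layout shown in the question.
HAND_HEADER = /\AHand No\. (?<number>\d+) - Date (?<date>\S+) Time (?<time>\S+)/

# Returns the header fields, or nil when the line does not start a new hand.
def parse_hand_header(line)
  m = HAND_HEADER.match(line)
  return nil unless m
  { number: m[:number].to_i, date: m[:date], time: m[:time] }
end

p parse_hand_header("Hand No. 1234 - Date 2012-07-23 Time 21:15:07")
p parse_hand_header("Player 1 calls")
```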

I have been looking at Gregory B.'s Ruby Best Practices chapter 4
(http://majesticseacreature.com/rbp-book/pdfs/ch04.pdf) on text
processing methods but cannot get my head around his code.

Thanks if you can help.

On Fri, Aug 3, 2012 at 10:39 AM, b1_ __ wrote:

> Can I do this without storing data in arrays or objects? I’ve been
> thinking I can use another Finite State Machine, only look for the words
> “Hand” (start of hand) and “wins” (end of hand), the problem is I need
> some of the data on the line that has the word “Hand” on it, something I
> can’t do with my code above. I also need to switch between hands when
> there are no lines between the word “wins” and the next line which will
> have the word “Hands” on it indicating the start of the next hand.

I don’t understand the part about not using arrays or objects. If you
need to accumulate data while processing, that’s the only way.
It seems that you want to keep separate counts by date and number of
players. So I would use a hash whose key is a composite of both values
(a struct with both fields for example), or maybe a hash of hashes.
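Both options fit in a few lines. A minimal sketch with made-up dates and counts; `Key` mirrors the struct used in the code below. Struct instances with equal members compare `eql?` and share a hash value, which is what makes them usable as composite hash keys.

```ruby
Key = Struct.new(:date, :number_of_players)

# Option 1: one hash with a composite struct key; missing keys read as 0.
by_key = Hash.new(0)
by_key[Key.new("2012-07-23", 3)] += 1
by_key[Key.new("2012-07-23", 3)] += 1 # same members, so same key
by_key[Key.new("2012-07-24", 2)] += 1

# Option 2: a hash of hashes, date => { number_of_players => count }.
nested = Hash.new { |h, date| h[date] = Hash.new(0) }
nested["2012-07-23"][3] += 1
nested["2012-07-23"][3] += 1
nested["2012-07-24"][2] += 1

p by_key[Key.new("2012-07-23", 3)] # prints 2
p nested["2012-07-23"][3]          # prints 2
```

The hash of hashes is handy when the report is grouped by date, as in the per-player table; the struct key keeps a single flat hash.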

When you find the “Hand” line, you parse the date, and then the
following lines to find out the number of players. Together these give
you the key to the hash; the value will be the count. Then you keep your
current logic of detecting the pre-flop section and counting. When the
pre-flop section ends, you add the count to the hash under the current
key, and the next “Hand” line starts a new key based on its date and
number of players. Something like:

Key = Struct.new :date, :number_of_players

name = "Player 1"     # hypothetical: the player whose calls/raises are counted
state = nil
date = nil
number_of_players = 0
count = 0
results = Hash.new(0) # keys will return a value of 0 when they don't exist

File.open("test_read.txt") do |f|
  while (line = f.gets)
    case state
    when nil
      if line =~ /^Hand No\..*Date (\S+)/
        # Remember the date; the players are counted on the following lines
        date = $1
        number_of_players = 0
        count = 0
        state = :hand_found
      end
    when :hand_found
      if line =~ /^Player \d+:/
        number_of_players += 1
      elsif line =~ /Hole Cards/
        # Found the words "Hole Cards": turn on text processing
        state = :parsing
      end
    when :parsing
      # Look for the word "Flop" or "wins"; if found, stop processing text
      # and store the count under the key for this hand
      if line =~ /Flop/ || line =~ /wins/
        current_key = Key.new(date, number_of_players)
        results[current_key] += count
        state = nil
      elsif line =~ /#{Regexp.escape(name)} (calls|raises)/
        count += 1
      end
    end
  end
end

p results

This is untested, but might give you an idea.

Jesus.

On Fri, Aug 3, 2012 at 10:39 AM, b1_ __ wrote:

> Can I do this without storing data in arrays or objects?

Unlikely. You will probably need a Hash for storing per-hand data, and
possibly more, depending on your evaluation requirements. Note that you
can create classes for holding data pretty easily by using Struct:

HandData = Struct.new :date_played, :no_players, :blind_level
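For example, a hand's record can be created when its “Hand” line is seen and filled in field by field as the header lines arrive. A minimal sketch; the sample lines (including the blind level `3`) are made up, and the regexes assume the layout from the question:

```ruby
HandData = Struct.new :date_played, :no_players, :blind_level

# Made-up header lines following the layout from the question.
lines = [
  "Hand No. 1234 - Date 2012-07-23 Time 21:15:07",
  "Game type: xxxx Blinds level: 3",
  "Player 1: xxxx (1500)",
  "Player 2: xxxx (1500)",
  "Hole Cards Phase",
]

hand = nil
lines.each do |line|
  case line
  when /\AHand No\..*Date (\S+)/
    hand = HandData.new($1, 0, nil) # start a new record for this hand
  when /Blinds level: (\S+)/
    hand.blind_level = $1
  when /\APlayer \d+:/
    hand.no_players += 1            # one seat line per player
  end
end

p hand
```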

> I’ve been
> thinking I can use another Finite State Machine, only look for the words
> “Hand” (start of hand) and “wins” (end of hand),

No, a second FSM for parsing only makes things unnecessarily complicated.

> the problem is I need
> some of the data on the line that has the word “Hand” on it, something I
> can’t do with my code above. I also need to switch between hands when
> there are no lines between the word “wins” and the next line which will
> have the word “Hands” on it indicating the start of the next hand.

I haven’t thought through all the details, but chances are that you
just need more states. Basically you need one state per logical
section that you are parsing. You can then use nested case
statements, either state first and then line parsing, or the other way
round:

case state
when :foo
  case line
  when /Hole Cards/
    # ...
    state = :bar
  end
when :baz
  case line
  when /keyword/
    # ...
  end
end
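To make the "one state per section" idea concrete, here is a minimal sketch applied to the layout from the original question, with the state checked first and the line second. The method name `parse_hands`, the per-hand hash layout and the sample lines are hypothetical, and the regexes assume lines look exactly like the sample (`Hand No. ... Date ...`, `Player N: ...`, `Player N calls`). Unlike the original counter, it records each player at most once per hand, so a player who calls and later raises pre-flop counts once; that is an assumption about how you want VP$IP counted.

```ruby
def parse_hands(lines)
  hands = []
  hand = nil
  state = :between_hands

  lines.each do |line|
    # A "Hand No." line always starts a new hand, whatever the current
    # state, so no separator is needed between "wins" and the next hand.
    if line =~ /\AHand No\..*Date (\S+)/
      hand = { date: $1, players: [], vpip: [] }
      hands << hand
      state = :header
      next
    end

    case state
    when :header
      case line
      when /\A(Player \d+):/ then hand[:players] << $1
      when /Hole Cards/      then state = :preflop
      end
    when :preflop
      case line
      when /Flop/ then state = :postflop
      when /wins/ then state = :between_hands
      when /\A(.+) (?:calls|raises)\b/
        # Count each player at most once per hand
        hand[:vpip] << $1 unless hand[:vpip].include?($1)
      end
    when :postflop
      state = :between_hands if line =~ /wins/
    end
  end

  hands
end

sample = <<~TEXT
  Hand No. 1 - Date 2012-07-23 Time 21:00:00
  Game type: xxxx Blinds level: xxxx
  Table type: xxx
  Player 1: xxxx (1500)
  Player 2: xxxx (1500)
  Player 3: xxxx (1500)
  Hole Cards Phase
  Player 1 calls
  Player 2 raises
  Player 3 folds
  Player 1 calls
  Flop Phase
  Player 1 raises
  Player 2 wins
  Hand No. 2 - Date 2012-07-24 Time 10:00:00
  Player 1: xxxx (1500)
  Player 2: xxxx (1500)
  Hole Cards Phase
  Player 1 folds
  Player 2 wins
TEXT

parse_hands(sample.lines).each { |hand| p hand }
```

For a real file, `parse_hands(File.foreach("test_read.txt"))` should behave the same, since `File.foreach` returns an Enumerator when called without a block.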

Btw. you can save one level of indentation by using File.foreach:

File.open name do |f|
  f.each_line do |line|
    # ...
  end
end

becomes

File.foreach name do |line|
  # ...
end
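`File.foreach` without a block returns an Enumerator, which opens another way to isolate hands: `Enumerable#slice_before` starts a new group at every line matching a pattern. This is a different technique from the state machine discussed in the thread, shown only as a sketch; it assumes every hand begins with a line starting `Hand No.`, and the method name `each_hand` and the sample lines are hypothetical. As far as I know, `slice_before` hands over each group as soon as the next header is seen, so only the current hand's lines are held in memory.

```ruby
# Yields one array of lines per hand. `lines` can be any Enumerable of
# strings, e.g. File.foreach("test_read.txt") called without a block.
def each_hand(lines)
  lines.slice_before(/\AHand No\./).each do |hand_lines|
    # Skip anything that appears before the first "Hand No." line
    yield hand_lines if hand_lines.first.start_with?("Hand No.")
  end
end

sample = [
  "Hand No. 1 - Date 2012-07-23 Time 21:00:00\n",
  "Hole Cards Phase\n",
  "Player 2 wins\n",
  "Hand No. 2 - Date 2012-07-23 Time 21:05:00\n",
  "Player 1 wins\n",
]

each_hand(sample) { |hand_lines| puts "#{hand_lines.first.chomp}: #{hand_lines.size} lines" }
```

With a real file you would pass `File.foreach("test_read.txt")` instead of the array.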
Kind regards

robert

On Fri, Aug 3, 2012 at 11:02 AM, Jesús Gabriel y Galán wrote:

>     key = Key.new date,number_of_players

There’s an error. That should be current_key = …

Jesus.

On Fri, Aug 3, 2012 at 12:07 PM, b1_ __ wrote:

Your step-by-step plan is exactly what I proposed, with two exceptions:

The first is that the hand data is not cleared after each hand, so that
the hash accumulates the data for all hands by date, number of players, etc.
The second is that the file should be written only once, at the end,
because it’s easier to accumulate in a hash than in a file.
Your point 3 is more complex than you think.
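A minimal sketch of the "accumulate first, write once" approach. The per-hand summaries below are hypothetical stand-ins for whatever your parser produces, every listed hand is assumed to be one the player was dealt into, and the output file name and row format are made up:

```ruby
# Hypothetical per-hand summaries, e.g. produced while parsing the file.
hand_summaries = [
  { date: "2012-07-23", players: 3, vpip: ["Player 1"] },
  { date: "2012-07-23", players: 3, vpip: [] },
  { date: "2012-07-24", players: 2, vpip: ["Player 1", "Player 2"] },
]

player = "Player 1"
totals = Hash.new { |h, date| h[date] = Hash.new(0) } # date => { players => count }
hands_at_table = Hash.new(0)                           # date => hands dealt

# Accumulate everything in hashes first...
hand_summaries.each do |hand|
  hands_at_table[hand[:date]] += 1
  totals[hand[:date]][hand[:players]] += 1 if hand[:vpip].include?(player)
end

report = hands_at_table.keys.sort.map do |date|
  per_size = totals[date].sort.map { |n, count| "#{n}P=#{count}" }.join(" ")
  "#{date} #{per_size} hands=#{hands_at_table[date]}"
end

# ...and touch the file only once, at the end.
File.write("#{player}.txt", ([player] + report).join("\n") + "\n")
```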

> I’m not a fast reader of code so still digesting what you’ve posted…

Hope you get some ideas,

Jesus.

“Jesús Gabriel y Galán” wrote in post #1071144:

> I don’t understand the part about not using arrays or objects.
> Jesus.

This is how I imagined the program would work:

  1. Step into first-hand-processing mode.
  2. Create hand-data hash with date, player no., blind level.
  3. Extract raw numbers and save them to the player's raw-data text file,
    organising the raw numbers using the hand-data hash made in the previous
    step.
  4. Clear hand-data hash.
  5. Exit first-hand-processing mode.
  6. Step into second-hand-processing mode.
  7. Create hand-data hash with date, player no., blind level.
  8. etc.
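For comparison, the step list above can be sketched as a single loop that keeps one hand-data hash, writes a raw row when the next “Hand” line (or the end of input) is reached, and then clears the hash. The method name, the tab-separated row format (date, players, blind level, 1 if the player called or raised pre-flop) and the sample lines are all assumptions; `out` can be anything that supports `<<`, such as a file opened in append mode.

```ruby
def stream_raw_rows(lines, player, out)
  hand_data = {}
  state = :idle

  # Steps 3 and 4: write one raw row for the finished hand, then clear the hash.
  flush = lambda do
    unless hand_data.empty?
      row = [hand_data[:date], hand_data[:players], hand_data[:blinds],
             hand_data[:vpip] ? 1 : 0]
      out << row.join("\t") + "\n"
    end
    hand_data.clear
  end

  lines.each do |line|
    case line
    when /\AHand No\..*Date (\S+)/
      # Steps 1-2 (and 6-7): a new hand starts, so finish the previous one first
      date = $1
      flush.call
      hand_data.merge!(date: date, players: 0, blinds: nil, vpip: false)
      state = :header
    when /Blinds level: (\S+)/
      hand_data[:blinds] = $1 if state == :header
    when /\APlayer \d+:/
      hand_data[:players] += 1 if state == :header
    when /Hole Cards/
      state = :preflop
    when /Flop/, /wins/
      state = :idle
    else
      if state == :preflop && line.start_with?("#{player} calls", "#{player} raises")
        hand_data[:vpip] = true
      end
    end
  end
  flush.call # the last hand is not followed by another "Hand" line
end

sample = [
  "Hand No. 1 - Date 2012-07-23 Time 21:00:00\n",
  "Game type: xxxx Blinds level: 3\n",
  "Player 1: xxxx (1500)\n",
  "Player 2: xxxx (1500)\n",
  "Hole Cards Phase\n",
  "Player 1 calls\n",
  "Player 2 wins\n",
  "Hand No. 2 - Date 2012-07-24 Time 10:00:00\n",
  "Game type: xxxx Blinds level: 4\n",
  "Player 1: xxxx (1500)\n",
  "Player 2: xxxx (1500)\n",
  "Player 3: xxxx (1500)\n",
  "Hole Cards Phase\n",
  "Player 1 folds\n",
  "Player 3 wins\n",
]

stream_raw_rows(sample, "Player 1", $stdout)
```

This keeps only one hand in memory, but the rows still have to be totalled somewhere afterwards to build the per-date report.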

So I would not extract all the lines of a hand into a container object
such as an array and work on that. The only container object created is
the hand-data hash, which is cleared when moving to the next hand. I
thought this might speed up the program, especially if I needed to
process thousands of hands.

I’m not really fussed, though. It’s more important that I understand the
code, that there is some elegance to it and perhaps some extensibility at
this stage. If nested state machines are too confusing, I’m happy to use
whatever is suggested.

I’m not a fast reader of code so still digesting what you’ve posted…