Dice Roller (#61)

mansfiem · January 12, 2006, 8:07am

My reason for choosing a dice roller is somewhat selfish: I was
interested
to see how people would solve a problem that required parsing a
mini-language. I’ve written lexers and parsers before, years ago, but I
wanted to see what methods the Ruby gurus would employ.

I was not unsurprised. While there were some “traditional” solutions,
there
were also a few that made me remember something I realized in past Ruby
Quizzes: it’s all about pattern matching. One of the solutions to past
quiz
#24 (Texas Hold’Em) showed how much power can be gained by a careful
examination of the patterns in the problem; with a few carefully built
regular expressions, some gsub calls and a bit of magic, you can turn
what
looks like a complex problem into something much similar. I should have
remembered that (or, at the least, realized that someone else would).

Discussion on the list about the quiz was rather active, most of the
time
getting into the nitty-gritty details of what d10 and 5d6d7 meant, and
occassionally joking about munchkins and their stats of all 18 (with a
strength of 18/00, of course). As solutions came in, it was nice to see
about four or five people making their first attempt. With them, this
quiz
gets the bronze medal for submission count, behind the LCD Numbers quiz
(#14) and the Numeric Maze quiz (#60).

I found unique bits in most every solution; even those that took almost
identical approaches would often, at the least, have different regular
expressions. If you are afraid of the mighty regexp, now would be a good
time to study some samples, since the syntax for the dice expressions is
reasonably simple, and many people documented their regular expressions.

Most solutions fell into one of a few types. I’ll cover each a bit and
point
out anything in particular that attracted my attention.

The first class of solutions massaged the input expression a bit, then
used
Ruby’s eval method to do the work. This simple solution eluded me, since
I
was so fixed on seeing parsing code. Apparently a bunch of you realized
that
(as Paul N. so nicely put) “we don’t need no steenking parsers.” A
few
substitutions and a helper function or two was enough to do the trick,
since
aside from the ‘d’ operator, the expression were valid Ruby already.
Let’s
take a look at Christian N.'s solution:

class Integer
  def d(n)
    (1..self).inject(0) { |a, e| a + rand(n) + 1 }
  end
end

First off we have the random number generation; most solutions had this
or a
very similar variant. So the call 3.d(6) will roll and accumulate three
six-sided dice, and the expression 3.d(6) is almost a valid dice
expression.
It is in Christian’s initialization method that dice expressions are
turned
into Ruby expressions:

def initialize(dice)
   @src = dice.gsub(/d(%|00)(\D|$)/, 'd100\2').
               gsub(/d(\d+)/, 'd(\1)').
               gsub(/(\d+|\))d/, '\1.d').
               gsub(/\d+/) { $&.gsub(/^0+/, '') }

   @dice = eval "lambda { #@src }"
end

(I’ve left out a bit of error checking; see the full solution to see
everything.) The first substitution fixes up percentage and double-zero
values as 100. The second turns ‘d’ as a binary operator into ‘d’ as a
method call. The third substitution provides the default dice count of 1
if it wasn’t specified. And the last substitution removes leading zeroes
of
integers; this last substitution prevents the Ruby from interpreting
values
as octal.

The morphed express is saved away in a lambda, which allows Christian to
reevaluate the expression repeatedly without performing those
substitutions
every time roll is called.

There were several variations of the eval method solution, mostly in the
regular expression substitutions. A couple variations rewrote existing
operators (rather than adding new methods to Integer or Fixnum). Rather
than
change the ‘d’ operator into a method call, he made a slightly different
twist and rolled the dice in method_missing. Perhaps Bill didn’t know
that
classes could be extended, or maybe he was hacking around for fun.
Dennis
Ranke had a eval-looking solution with no eval to be found, because the
gsub
calls actually substituted the results during parsing. And Stefan W.
wins
for the shortest solution: three lines of hard to read Ruby code.

The second class of solutions involved using a parsing library or parser
generator tool. Three of these showed up: Pierre Barbier de Reuille used
racc, Austin Z. used syntax.rb, and David T. pulled out ANTLR to
generate his parser. Each of these will let you describe your language
in
BNF format, or something quite similar to it, and it will generate a
Ruby
language parser for you. These can be quite powerful solutions, although
readability of the language specification and the support code varies
widely. But I do recommend looking into these tools if you have need to
do
something more powerful that dice expressions; generating a complex
parser
with one of these would save much time and effort over a handcoded
solution.

And now we’re at the third class of solutions: the handcrafted parsers.
About nine people handcrafted parsers, mostly recursive descent which is
rather easy to code. Because these generally involve more code, I won’t
show
much, but I suggest taking a look at some of them.

Dominik B.'s solution, while it prefers to use the eval method as it
goes along rather than build a parse tree, is a easy to read solution
and
makes good use of StringScanner. Morus W.'s solution is decent to
follow
and understand and builds up an expression tree with his Expr class,
which
gets evaluated at the end with a single call to roll. My own solution is
similar, but creates a tree of proc objects. Andrew McGuinness and
Christer
Nilsson also have recursive descent parsers that can not only roll dice,
but
calculation frequency distributions and histograms, and cheat.

I want to look now at a couple of standouts. First, let’s look at
Dennis
Ranke’s second submission. It’s a recursive descent parser, contained in
his
class RDParser which is included with the submission but a little too
long
to summarize here. However, what RDParser has done is allow Dennis to
write
this solution:

parser = RDParser.new do
  token(/\s+/)
  token(/\d+/) {|m| m.to_i }
  token(/./) {|m| m }

  start :expr do
    match(:expr, '+', :term) {|a, _, b| a + b }
    match(:expr, '-', :term) {|a, _, b| a - b }
    match(:term)
  end

  rule :term do
    match(:term, '*', :dice) {|a, _, b| a * b }
    match(:term, '/', :dice) {|a, _, b| a / b }
    match(:dice)
  end

  def roll(times, sides)
    (1..times).inject(0) {|a, b| a + rand(sides) + 1 }
  end

  rule :dice do
    match(:atom, 'd', :sides) {|a, _, b| roll(a, b) }
    match('d', :sides) {|_, b| roll(1, b) }
    match(:atom)
  end

  rule :sides do
    match('%') { 100 }
    match(:atom)
  end

  rule :atom do
    match(Integer)
    match('(', :expr, ')') {|_, a, _| a }
  end
end

Do I need to say how beautiful that is? It’s short, clean, easy to read
and
understand… and all Ruby. Package that RDParser into a module and
ship
it! I think I’ve found my next parsing tool…

Let’s look at the guts of one more solution, by Pablo Hoch. Pablo’s
solution
didn’t exactly fit into the other classes of solution: it’s somewhere
between a parser and an eval-uator. He decided to take the infix dice
expression and turn it into a RPN (Reverse Polish Notation, that is,
postfix) expression:

def to_rpn(infix)
   stack, rpn, last = [], [], nil
   infix.scan(/\d+|[-+*\/()d%]/) do |token|
     case token
       when /\d+/
         rpn << token
       when '%'
         rpn << "100"
       when /[-+*\/d]/
         while stack.any? && stronger(stack.last, token)
           rpn << stack.pop
         end
         rpn << "1" unless last =~ /\d+|\)|%/
         stack << token
       when '('
         stack << token
       when ')'
         while (op = stack.pop) && (op != '(')
           rpn << op
         end
     end
     last = token
   end
   while op = stack.pop
     rpn << op
   end
   rpn
end

I love to see code that I can understand immediately, and also recognize
how
it could be useful for other applications. This is definitely one of
those
because postfix expressions (or prefix, to be complete) are a breeze for
a
computer to evaluate:

def roll
   stack = []
   @expr.each do |token|
     case token
       when /\d+/
         stack << token.to_i
       when /[-+*\/d]/
         b = stack.pop
         a = stack.pop
         stack << a.send(token.to_sym, b)
     end
   end
   stack.pop
end

Thanks to everyone for the great submissions and lively discussion on
the
mailing list (and the correction to the quiz that I missed!). I just
hope
when we all sit down to play that I don’t see any of you with all
characters
stats of 18.

mansfiem · January 12, 2006, 4:02pm

On Jan 12, 2006, at 1:04 AM, Matthew M. wrote:

Thanks to everyone for the great submissions and lively discussion
on the
mailing list (and the correction to the quiz that I missed!).

I want to thank Matthew M. for a wonderful problem and a
sensational summary! I also happen to know that Matthew already has
another quiz in the queue for a week from tomorrow. How cool is that?

Matthew, you rock.

James Edward G. II

mansfiem · January 12, 2006, 4:23pm

Matthew M. [email protected] writes:

               gsub(/\d+/) { $&.gsub(/^0+/, '') }
as octal.
The third substitution doesn’t provide a default count (Dice#d does
that),
instead, it converts 2d(6) to 2.d(6).

The morphed express is saved away in a lambda, which allows Christian to
reevaluate the expression repeatedly without performing those substitutions
every time roll is called.

I could have evaluated @src each time without substituting anew, but a
lambda only makes it need to parse once.

Thanks for the nice writeup,

mansfiem · January 12, 2006, 4:57pm

On Thu, Jan 12, 2006 at 04:04:58PM +0900, Matthew M. wrote:
[…]
} Most solutions fell into one of a few types. I’ll cover each a bit and
point
} out anything in particular that attracted my attention.
}
} The first class of solutions massaged the input expression a bit, then
used
} Ruby’s eval method to do the work.
[…]
} The second class of solutions involved using a parsing library or
parser
} generator tool.
[…]
} And now we’re at the third class of solutions: the handcrafted
parsers.
} About nine people handcrafted parsers, mostly recursive descent which
is
} rather easy to code.
[…]

I want to go through my handcrafted parser solution, which may have
gotten
missed in the shuffle. This is especially likely since I also submitted
one
using the eval method. In fact, my handcrafted parser used eval, but
only
for the +, -, /, and * operators.

The important bit is that all of the nodes in my syntax tree respond to
to_i with the computed result of its subexpression. This was largely to
make numbers behave the same way that my syntactic elements did, and
vice
versa. My parsing method looks like this (with a fail check to make sure
nothing went wrong after each line, mainly for debugging purposes) to
create the syntax tree:

token_array = tokenize(expression)
uses String#scan and checks for garbage
token_array = validate_and_cook(token_array)
replaces some tokens with symbols, deals with dX => 1dX, d00 =>
d100, d% => d100, makes sure open parens have close parens and
the operators and operands are in grammatically appropriate places
parse_tokens(token_array)
a) token_array = parse_parens(token_array)
finds parenthesized expressions and runs the subarray through
parse_tokens; nested parens are handled recursively. Note that
there is no need for error checking, since the expression has
already been validated.
b) token_array = parse_ops(token_array, /^d$/) if token_array.length

1
finds all d operators and creates a DieOperator object with its
left and right operands. Since all the operators in this grammar
are left-associative, parse_ops is left-associative. The regex is
used to determine if the token is the operator(s) being handled.
c) token_array = parse_ops(token_array, /^[*/]$/) if
token_array.length > 1
same as b, but for * and / operators. In this case, ArithOperator
objects are created. The parse_ops method uses an OpsClass hash
that matches the operator (d, *, /, +, or -) to the appropriate
class to call new on.
d) token_array = parse_ops(token_array, /^[-+]$/) if
token_array.length > 1
same as c, but for + and - operators. Still using ArithOperator
objects.

The ArithOperator class is what uses eval. It is given the operator as a
string, and its to_i method calls eval
“#{@left.to_i}#{@op}#{@right.to_i}”
to produce its value.

The points I’m trying to make are:

the grammar was so simple that there was no need for a proper
recursive-descent parser
duck typing is way cool and let me treat numbers, DieOperators, and
ArithOperators identically
a little eval let me use one class instead of four for ArithOperator
doing all that validation ahead of time let me replace my syntax tree
with using eval for the whole expression without giving up my nice,
clear failure messages

} Thanks to everyone for the great submissions and lively discussion on
the
} mailing list (and the correction to the quiz that I missed!). I just
hope
} when we all sit down to play that I don’t see any of you with all
characters
} stats of 18.

This one was loads of fun, if not as challenging as the Numeric Maze
(which
was my first). It’s a pity I’ll be out of town this weekend and won’t be
able to do the next one.

–Greg

mansfiem · January 12, 2006, 11:04pm

And I can see I already that I need to read my regexps a little more
carefully. The third regexp in Christian N.'s solution doesn’t
provide the default 1 to the ‘d’ operator; it actually turns the ‘d’
operator into a method call on Integer. And the second regexp sets up
parenthesis around the number following the ‘d’ as arguments for that
call.
Silly me…

mansfiem · January 12, 2006, 11:17pm

From: “Matthew M.” [email protected]

Rather than change the ‘d’ operator into a method call, he made a slightly
different twist and rolled the dice in method_missing. Perhaps Bill didn’t
know that classes could be extended, or maybe he was hacking around for fun.

Yes I could probably dominate the “cheesiest approach” category.
I did extend a class in my “solution”… String#die_to_meth

Thanks for a fun quiz and an excellent summary.

Regards,

Bill

mansfiem · January 12, 2006, 11:29pm

Ya, you did extend a class.

Man, I must really be tired. Or forgot how to read. Or had a few brain
cells
short out. Or something. Or be really tired.

mansfiem · January 13, 2006, 8:58pm

On Jan 12, 2006, at 10:22 AM, Christian N. wrote:

I could have evaluated @src each time without substituting anew, but a
lambda only makes it need to parse once.

Hey Christian,

I didn’t have time to write my own solution to quiz #61, so I’m going
over yours, which is similar to what I was kicking around in my
head. Unfortunately, I confused by your eval statement, which is
wrapped over a new Proc object. Instead, why not wrap the lambda
around the eval like so?

@dice = lambda { eval @src }

The both seem to produce the same output, yet conceptually, my
approach makes more sense to me. Perhaps I’m missing something
crucial in my statement like…

Does eval parse @src for each call() message? Or does the parsing
happen on initialization of the Proc (like I would expect)? I
suppose I could trace the execution using the debugger, but I’m
unaware how to step inside a Kernel method like eval or lambda with it.

~ ryan ~

mansfiem · January 13, 2006, 9:05pm

On Jan 13, 2006, at 1:57 PM, J. Ryan S. wrote:

every time roll is called.
is wrapped over a new Proc object. Instead, why not wrap the
lambda around the eval like so?

@dice = lambda { eval @src }

The both seem to produce the same output, yet conceptually, my
approach makes more sense to me. Perhaps I’m missing something
crucial in my statement like…

Does eval parse @src for each call() message?

Yes. You pass lambda() a block of code that is executed each time
the code it called.

James Edward G. II

mansfiem · January 13, 2006, 9:11pm

On Jan 13, 2006, at 2:57 PM, J. Ryan S. wrote:

every time roll is called.
is wrapped over a new Proc object. Instead, why not wrap the
suppose I could trace the execution using the debugger, but I’m
unaware how to step inside a Kernel method like eval or lambda with
it.

~ ryan ~

@dice.call has eval parse @src everytime (@src may have changed). By
doing something like eval “lambda { #{@src} }” you parse @src once
and get back a lambda.

mansfiem · January 13, 2006, 9:23pm

On Jan 13, 2006, at 2:15 PM, J. Ryan S. wrote:

On Jan 13, 2006, at 3:03 PM, James Edward G. II wrote:

Does eval parse @src for each call() message?

Yes. You pass lambda() a block of code that is executed each time
the code it called.

Executed yes. But when is it parsed / checked for valid grammar,
syntax, etc.? Did you mean to say that the block of code is parsed
AND executed for each call() it receives?

lambda() doesn’t imply that, but eval() does. It must be reparsed
with each call.

James Edward G. II

mansfiem · January 13, 2006, 9:17pm

On Jan 13, 2006, at 3:03 PM, James Edward G. II wrote:

Does eval parse @src for each call() message?

Yes. You pass lambda() a block of code that is executed each time
the code it called.

Executed yes. But when is it parsed / checked for valid grammar,
syntax, etc.? Did you mean to say that the block of code is parsed
AND executed for each call() it receives?

~ ryan ~

mansfiem · January 13, 2006, 9:32pm

On Jan 13, 2006, at 3:15 PM, J. Ryan S. wrote:

~ ryan ~

obj = lambda { eval “#{2**10}” }

(1…1_000_000).each do
obj.call
end

$ time ruby test.rb

real 0m26.046s
user 0m24.377s
sys 0m0.358s

obj = eval “lambda { #{2**10} }”

(1…1_000_000).each do
obj.call
end

$ time ruby test.rb

real 0m3.132s
user 0m2.895s
sys 0m0.047s

What a difference! I’ll agree that parsing of eval’s string happens
on every call() message in the first example.

~ ryan ~