Grammars (mini-scripting languages)


#1

I’m working on implementing mini-scripting languages for two different
projects, so I’m building a framework that could handle the task
generically.

Does this seem like a good way to approach it?
1. Store each command’s matching regular expression and Ruby code
   in the database (sample fixture below).
2. For each line in the script:
   a. Test the line against each command’s regular expression.
   b. If one matches, execute that command’s Ruby code using
      instance_eval.

My thoughts are:
1. Storing executable code in the database is a security problem
2. instance_eval is slow
3. The alternative (a big if/elsif tree) would span many pages and
be unwieldy.

Have a better suggestion?

Sample code:
def compile
  syntax.each do |line|
    # Find the first command whose regular expression matches this line.
    command = commands.find { |c| c.match? line }
    raise "Command not found that can process '#{line}'" if command.nil?
    instance_eval command.ruby
  end
end

Sample commands fixture:

label:
  id: 1
  name: label
  regexp: ^Q (.*)$
  ruby: 'puts "#{$1}\n"'

single-punch:
  id: 2
  name: single-punch
  regexp: ^X-(\d+) (.*)$
  ruby: 'puts " o #{$2}\n"'

multiple-punch:
  id: 3
  name: multiple-punch
  regexp: ^M-(\d+) (.*)$
  ruby: 'puts " [ ] #{$2}\n"'

blank-line:
  id: 4
  name: blank-line
  regexp: ^\s*$
  ruby: '# Do nothing'


#2

On Thu, 2006-02-02 at 09:57 +0900, Michael J. wrote:

1. Storing executable code in the database is a security problem
2. instance_eval is slow
3. The alternative (a big if/elsif tree) would span many pages and
be unwieldy.

Have a better suggestion?

Not necessarily better, but how about something like:

class Commands
  def Q(args)
    puts args
  end

  def X(args)
    if args =~ /^(\d+) (.*)$/
      puts "  o #{$2}"
    end
  end

  def M(args)
    if args =~ /^(\d+) (.*)$/
      puts "  [ ] #{$2}"
    end
  end

  def dispatch(line)
    if line =~ /([QXM])-?(.*)/
      send($1.intern, $2)
    else
      raise "Invalid input: #{line}"
    end
  end
end

s = <<EOS
Q Just a label
X-23 Single punched
M-11 Multi punched
J-12 Bad input
EOS

cmds = Commands.new
s.each_line { |c| cmds.dispatch(c) }

(Obviously I guessed a bit with the input format).
Output:

 Just a label
  o Single punched
  [ ] Multi punched
-:22:in `dispatch': Invalid input: J-12 Bad input (RuntimeError)
        from -:35
        from -:35

#3

Ross B. wrote:

On Thu, 2006-02-02 at 09:57 +0900, Michael J. wrote:

1. Storing executable code in the database is a security problem
2. instance_eval is slow
3. The alternative (a big if/elsif tree) would span many pages and
be unwieldy.

Have a better suggestion?

Not necessarily better, but how about something like:

class Commands
  [snip]
  def dispatch(line)
    if line =~ /([QXM])-?(.*)/
      send($1.intern, $2)
    else
      raise "Invalid input: #{line}"
    end
  end
end

That’s neat, Ross. I wasn’t familiar with the send command. Looks like
the consequence is that the grammar has to fit an easy regular
expression or you’d be duplicating it in the match and again in the
method definition… Well, that’s not necessarily a bad thing.
Consistency is good too.

I need to think about this.

What do programmers normally do when they have a case statement that’s
30 or more items long? Previously I’ve just left it as a case statement
and spent the life of the project ticked at it.


#4

On Thu, 2006-02-02 at 13:17 +0900, Michael J. wrote:

That’s neat, Ross. I wasn’t familiar with the send command. Looks like
the consequence is that the grammar has to fit an easy regular
expression or you’d be duplicating it in the match and again in the
method definition… Well, that’s not necessarily a bad thing.
Consistency is good too.

Agreed, but it did bug me a bit, too :) Depending on the actual input
format you could optimise that away though, I think, e.g.

class Commands
  def M(args)
    # Notice that the regexp here is now responsible
    # for handling the '-' after the initial letter.
    if args =~ /^-(\d+) (.*)$/
      puts "  [ ] #{$2}"
    end
  end

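  def dispatch(line)
    begin
      # Use non-destructive slices so the error message below
      # still shows the whole line, including the command letter.
      send(line[0, 1], line[1..-1])
    rescue NoMethodError
      raise "Invalid input: #{line}"
    end
  end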
end

That way you’re doing only one match per dispatch, and validating
implicitly (Ruby will raise NoMethodError if the command is bad). Since
we’re forcing only a single letter, it shouldn’t be possible for people
to input e.g. ‘exit-666 1’ or something to breach security.

A win with this approach I think is that it keeps everything where it
should be, i.e. the commands themselves are responsible for processing
their arguments, however they see fit. Also, you can easily add new
commands at runtime, simply by defining a new method. There’s no
‘command registry’ anywhere.
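
As a minimal sketch of the "add new commands at runtime" point: the N command and its output format below are hypothetical, and the command method returns a string instead of printing so the result is easy to check. dispatch follows the single-letter approach described above, assuming the line is left unmodified.

```ruby
class Commands
  # Dispatch on the first character; the rest of the line is the argument.
  def dispatch(line)
    send(line[0, 1], line[1..-1])
  rescue NoMethodError
    raise "Invalid input: #{line}"
  end
end

cmds = Commands.new

# Before 'N' exists, dispatching it fails.
begin
  cmds.dispatch("N-5 New kind of line")
rescue RuntimeError => e
  e.message   # "Invalid input: N-5 New kind of line"
end

# Define N at runtime by reopening the class; there is no registry to update.
# (N is an illustrative command, not part of the original format.)
class Commands
  def N(args)
    "  * #{$2}" if args =~ /^-(\d+) (.*)$/
  end
end

# The instance created earlier now responds to the new command.
cmds.dispatch("N-5 New kind of line")
```

One side effect of rescuing NoMethodError around send is that a NoMethodError raised inside a command body would also be reported as invalid input.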

One other change I’d make to my previous post would be to make the
command methods private.
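
A rough sketch of what private command methods could look like. The class name SafeCommands, the ALLOWED list, and the way Q trims its argument are assumptions for illustration, and the methods return strings rather than printing. The explicit whitelist is an addition that is not in the post above: send also reaches private methods, so without it a single letter such as p would resolve to Kernel#p.

```ruby
class SafeCommands
  # Hypothetical whitelist of command letters. Anything else is rejected
  # before send is called.
  ALLOWED = %w[Q X M]

  def dispatch(line)
    letter = line[0, 1]
    unless ALLOWED.include?(letter)
      raise ArgumentError, "Invalid input: #{line}"
    end
    send(letter, line[1..-1])   # send bypasses method visibility
  end

  private

  # Command methods are private, so callers must go through dispatch.
  def Q(args)
    args.strip
  end

  def X(args)
    "  o #{$2}" if args =~ /^-(\d+) (.*)$/
  end

  def M(args)
    "  [ ] #{$2}" if args =~ /^-(\d+) (.*)$/
  end
end

cmds = SafeCommands.new
cmds.dispatch("X-23 Single punched")   # handled by the private X method
cmds.respond_to?(:X)                   # false: X is not publicly callable
```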

What do programmers normally do when they have a case statement that’s
30 or more items long? Previously I’ve just left it as a case statement
and spent the life of the project ticked at it.

Ordinarily I think I’d consider it a code smell (or maybe a “design
smell”?). Maybe I’d fix it, maybe not, but like you I’d at least want
to :).


#5

Michael Judge writes:

What do programmers normally do when they have a case statement that’s
30 or more items long? Previously I’ve just left it as a case statement
and spent the life of the project ticked at it.

My inclination is generally to drive the execution by table lookup.
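
One way table lookup might look for the format discussed in this thread, as a sketch only. COMMAND_TABLE and render_line are hypothetical names, the patterns are copied from the sample fixture in the first post, and each handler returns a string (or nil for a blank line) rather than printing.

```ruby
# Hypothetical command table: each row pairs a pattern with a handler.
# The handler receives the MatchData for the line.
COMMAND_TABLE = [
  [/^Q (.*)$/,       lambda { |m| m[1] }],
  [/^X-(\d+) (.*)$/, lambda { |m| "  o #{m[2]}" }],
  [/^M-(\d+) (.*)$/, lambda { |m| "  [ ] #{m[2]}" }],
  [/^\s*$/,          lambda { |m| nil }]   # blank line: nothing to render
]

# Return the rendered text for one script line, or raise if no row matches.
def render_line(line)
  COMMAND_TABLE.each do |pattern, handler|
    match = pattern.match(line)
    return handler.call(match) if match
  end
  raise ArgumentError, "Invalid input: #{line}"
end

render_line("M-11 Multi punched")
```

Adding a command then means adding a row to the table. The handlers stay in ordinary source code, so nothing executable has to be stored in the database.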


#6

What do programmers normally do when they have a case statement that’s
30 or more items long? Previously I’ve just left it as a case statement
and spent the life of the project ticked at it.

I don’t let them into the code in the first place. A situation that would
need a case/switch that’s more than about 5 +/- 2 items long gets
redesigned during initial implementation.

When I take over a code-base that has something like that in it, it gets
redesigned the first time I have to touch that case statement.

Don’t live with broken windows.

Regards,
Harley Pebley