Reg - Ruby Extended Grammar 0.4.6



I am pleased to announce version 0.4.6 of Reg, the Ruby Extended
Grammar, a tool for matching patterns in ruby object graphs.

Reg is a mini-language embedded directly in ruby; it does all its
tricks with operator overloading. Reg attempts to emulate both the
obscurity and the compact power of regular expressions.

Reg is a library for matching patterns in all the important core data
structures as well as tools for extending the match criteria to
include just about any ruby data type you could want. Reg provides
matchers for Strings (via Regexps), Symbols, Hashes, and several
alternatives for matching Objects, but the main feature is the ability
to match Arrays of arbitrary ruby data using vaguely Regexp-like
syntax.

This version of Reg brings many exciting new features:
- item_that, a mini-library for constructing method-based object queries
- a greatly expanded user guide (regguide.txt), about half-finished now
- an ordered hash matcher
- a method-signature matcher, like respond_to?
- a working object matcher (which I had thought was working before, but
  it turns out, to my chagrin, was not)

But by far the most important new feature is one that’s mostly not
user-visible: the revamped backtracking engine, now named
Reg::Progress. (Actually, a lot more classes participate in
backtracking… but never mind that.) Reg::Progress is a more
object-oriented interface to what was a nest of arrays before. Like the
arrays, it keeps track of match progress through the data, and it will
also enable some new (but badly needed) features like
backreferences/variable bindings, substitutions, positions, and
lookahead/lookback.
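To make "keeping track of match progress" concrete, here is a minimal backtracking matcher for arrays in plain ruby. It is a sketch under stated assumptions, not Reg::Progress: the names Rep, INF, and match_array are hypothetical, each pattern element is assumed to be a matcher tested with === and repeated between min and max times, and the "progress" is just a pair of indices (position in the pattern, position in the data).

```ruby
# Hypothetical sketch of backtracking array matching (NOT Reg's engine).

# One pattern element: a matcher (tested with ===) repeated min..max times.
Rep = Struct.new(:matcher, :min, :max)
INF = Float::INFINITY

# True if ALL of +data+ matches +pattern+ (an Array of Rep).
# pi = position in the pattern, di = position in the data.
def match_array(pattern, data, pi = 0, di = 0)
  return di == data.size if pi == pattern.size

  rep = pattern[pi]
  # Count consecutive items from di that match, up to rep.max.
  run = 0
  run += 1 while run < rep.max && di + run < data.size && rep.matcher === data[di + run]
  return false if run < rep.min

  # Greedy: take the longest run first, then give items back one at a time.
  run.downto(rep.min) do |n|
    return true if match_array(pattern, data, pi + 1, di + n)
  end
  false
end

obs  = Rep.new(Object, 0, INF)  # like OBS: any run of items
sym3 = Rep.new(Symbol, 3, INF)  # like Symbol+3
match_array([obs, sym3, obs], [1, :a, :b, :c, "x"])  # => true
match_array([obs, sym3, obs], [1, :a, :b, "x", :c])  # => false
```

The downto loop is what makes this backtracking: if the rest of the pattern fails after taking n items, the matcher gives one back and retries. Without memoization this can take exponential time on patterns with several open-ended repetitions.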

0.4.6 examples:

Matches a single item whose method ‘length’ returns a Fixnum:
item_that.length.is_a? Fixnum
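One plausible way a query like this can work is a proxy object that records method calls and replays them when matched with ===. The sketch below assumes that design; QueryProxy and this item_that are hypothetical stand-ins, not Reg's implementation, and Integer replaces Fixnum because Fixnum was merged into Integer in Ruby 2.4.

```ruby
# Hypothetical item_that-style query builder (not Reg's code).
# QueryProxy records each method call made on it; === later replays the
# recorded calls on a candidate object and treats a truthy result as a match.
class QueryProxy < BasicObject
  def initialize(calls = [])
    @calls = calls
  end

  # Replay the recorded calls on +obj+. A call the object can't handle
  # (NoMethodError) counts as "no match" in this sketch.
  def ===(obj)
    result = @calls.reduce(obj) { |recv, (name, args)| recv.__send__(name, *args) }
    result ? true : false
  rescue ::NoMethodError
    false
  end

  # Any other call is recorded and returns a new, longer query.
  # (Methods BasicObject already defines, such as == and !, are not recorded.)
  def method_missing(name, *args)
    ::QueryProxy.new(@calls + [[name, args]])
  end
end

# Hypothetical entry point mirroring item_that.
def item_that
  QueryProxy.new
end

length_is_int = item_that.length.is_a?(Integer)
length_is_int === "abc"   # => true
length_is_int === [1, 2]  # => true
length_is_int === 42      # => false (Integer has no #length)
```

Subclassing BasicObject keeps the proxy nearly method-free, so almost every call falls through to method_missing and gets recorded; that is also why constants inside it are written as ::QueryProxy and ::NoMethodError.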

There’s a new way to match hashes; it looks more-or-less like the old
way and behaves a little differently. The old type of hash matcher
(now called an unordered hash matcher) looked like:

+{/fo+/=>8, /ba+r/=>9}

The new syntax uses +[] instead of +{} and ** instead of =>. It’s
called an ordered hash matcher. The order of filter pairs given in an
ordered matcher is the order comparisons are done in. The same is not
true within unordered matchers, where order is inferred from the
nature of the key matchers. The ordered equivalent of the last example
is:

+[/fo+/**8, /ba+r/**9]

Both match hashes whose keys either match /fo+/ with a value of 8 or
match /ba+r/ with a value of 9 (and nothing else). But if the data
looks like {"foobar"=>8}, it is guaranteed to match the second
(ordered) matcher, because /fo+/ is always given a chance first, but
it might or might not match the first (unordered) one, because there
the order is unspecified. ("foobar" also matches /ba+r/, and if that
pair is tried first, the value 8 fails the test for 9.)
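The difference can be checked in plain ruby. This sketch models only the ordered semantics, under the assumption that each key is claimed by the first filter pair whose key matcher accepts it, and that a hash matches only if every pair is claimed and its value passes; ordered_hash_match? is a hypothetical helper. Reversing the filters stands in for an unordered matcher that happens to try /ba+r/ first.

```ruby
# Hypothetical sketch of ordered hash matching (not Reg's code).
# +filters+ is an Array of [key_matcher, value_matcher] pairs, tried in order.
# Each key is claimed by the FIRST pair whose key matcher accepts it; the hash
# matches only if every key is claimed and its value passes that pair's test.
def ordered_hash_match?(filters, hash)
  hash.all? do |key, value|
    pair = filters.find { |key_matcher, _| key_matcher === key }
    !pair.nil? && pair[1] === value
  end
end

filters = [[/fo+/, 8], [/ba+r/, 9]]
ordered_hash_match?(filters, { "foobar" => 8 })          # => true
ordered_hash_match?(filters.reverse, { "foobar" => 8 })  # => false
```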

Here’s an example of a Reg::Knows matcher, which matches objects that
have the #slice method:
-:slice
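A Reg::Knows matcher behaves like a respond_to? test. The class below is a hypothetical stand-in showing that idea through ===, which also makes it usable in case/when; it is not how Reg defines -:slice.

```ruby
# Hypothetical "knows" matcher: matches any object that responds to
# every listed method name.
class KnowsMatcher
  def initialize(*names)
    @names = names
  end

  def ===(obj)
    @names.all? { |name| obj.respond_to?(name) }
  end
end

knows_slice = KnowsMatcher.new(:slice)
knows_slice === "text"  # => true  (String#slice)
knows_slice === [1, 2]  # => true  (Array#slice)
knows_slice === 42      # => false
```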

0.4.5 examples:

Matches array containing exactly 2 elements; 1st is another array, 2nd
is integer:
+[Array,Integer]

Like above, but 1st is an array of arrays of symbols:
+[+[+[Symbol+0]+0],Integer]
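For readers new to the notation, the two patterns above can be restated as plain ruby predicates. These helpers are illustrative only and assume, as in case/when, that a class matches an item via === (so Integer matches any integer) and that +0 means "zero or more".

```ruby
# Hypothetical plain-ruby readings of two array patterns.

# +[Array,Integer]: exactly 2 elements, an Array then an Integer.
def array_then_integer?(data)
  data.is_a?(Array) && data.size == 2 &&
    Array === data[0] && Integer === data[1]
end

# +[+[+[Symbol+0]+0],Integer]: as above, but the first element is an
# array whose elements are all arrays of symbols (any length, even zero).
def symbol_arrays_then_integer?(data)
  array_then_integer?(data) &&
    data[0].all? { |inner| inner.is_a?(Array) && inner.all?(Symbol) }
end

array_then_integer?([[1], 2])                    # => true
symbol_arrays_then_integer?([[[:a, :b], []], 7]) # => true
symbol_arrays_then_integer?([[[:a, 1]], 7])      # => false
```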

Matches array of at least 3 consecutive symbols and nothing else:
+[Symbol+3]

Matches array with at least 3 consecutive symbols in it somewhere:
+[OBS, Symbol+3, OBS]
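Likewise for the two Symbol+3 patterns. The hypothetical helpers below spell out the difference between "nothing else" and "somewhere in it": the first requires every item to be a Symbol, while the OBS version only needs some window of 3 consecutive symbols (any longer run contains such a window).

```ruby
# +[Symbol+3]: at least 3 items, all of them symbols, nothing else.
def only_symbols_at_least_3?(data)
  data.size >= 3 && data.all?(Symbol)
end

# +[OBS, Symbol+3, OBS]: a run of at least 3 consecutive symbols anywhere.
def has_symbol_run_of_3?(data)
  data.each_cons(3).any? { |window| window.all?(Symbol) }
end

only_symbols_at_least_3?([:a, :b, :c, :d])   # => true
only_symbols_at_least_3?([:a, :b, :c, 1])    # => false
has_symbol_run_of_3?([1, :a, :b, :c, "x"])   # => true
has_symbol_run_of_3?([:a, :b, 1, :c, :d])    # => false
```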

Matches array of at most 6 strings starting with ‘g’:
+[/^g/-6] #no .reg necessary for regexp
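A plain ruby reading of this pattern follows. One subtlety: Regexp#=== also accepts Symbols (/^g/ === :go is true), so a literal translation via === would let :go through. The hypothetical helper below checks for String explicitly to follow the prose; whether Reg does the same is not stated here.

```ruby
# +[/^g/-6]: zero to 6 items, each a String matching /^g/.
def at_most_6_g_strings?(data)
  data.size <= 6 && data.all? { |x| x.is_a?(String) && /^g/.match?(x) }
end

at_most_6_g_strings?(%w[go get])         # => true
at_most_6_g_strings?([])                 # => true ("at most" includes zero)
at_most_6_g_strings?(%w[g g g g g g g])  # => false (7 items)
at_most_6_g_strings?([:go])              # => false (a Symbol, not a String)
```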

Matches array of between 5 and 9 hashes containing a key :k pointing
to something non-nil:
+[ +{:k=>~nil.reg}*(5...9) ]
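The hash-count pattern can be sketched the same way. This hypothetical helper reads "between 5 and 9" as inclusive and follows this example's wording ("containing a key :k"), so other keys are allowed. Both are assumptions: the hash matchers described earlier accept "nothing else", so Reg may in fact reject extra keys.

```ruby
# Hypothetical plain-ruby reading of the hash-count pattern above.

# A Hash that has the key :k, pointing at something other than nil.
def k_non_nil?(h)
  h.is_a?(Hash) && h.key?(:k) && !h[:k].nil?
end

# The whole array is such hashes, and their number lies in +counts+.
def k_hashes?(data, counts = 5..9)
  counts.cover?(data.size) && data.all? { |h| k_non_nil?(h) }
end

five = Array.new(5) { { k: 1 } }
k_hashes?(five)                 # => true
k_hashes?(five.first(4))        # => false (too few)
k_hashes?(five + [{ k: nil }])  # => false (:k points to nil)
```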

Matches an object with Integer instance variable @k and property (i.e.,
method) foobar that returns a string with ‘baz’ somewhere in it:
-{:@k=>Integer, :foobar=>/baz/}
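What the object matcher checks can be spelled out with ruby's reflection methods. The sketch assumes that symbols beginning with @ name instance variables and other symbols name public methods whose return value is tested; object_match? and Widget are hypothetical, not Reg's object matcher.

```ruby
# Hypothetical plain-ruby reading of -{:@k=>Integer, :foobar=>/baz/}.
def object_match?(obj, filters)
  filters.all? do |name, matcher|
    if name.to_s.start_with?("@")
      # Instance variable: it must exist and its value must match.
      obj.instance_variable_defined?(name) && matcher === obj.instance_variable_get(name)
    else
      # Property: call the method and test its return value.
      obj.respond_to?(name) && matcher === obj.public_send(name)
    end
  end
end

class Widget
  def initialize(k)
    @k = k
  end

  def foobar
    "foo baz bar"
  end
end

filters = { :@k => Integer, :foobar => /baz/ }
object_match?(Widget.new(3), filters)    # => true
object_match?(Widget.new("3"), filters)  # => false (@k is not an Integer)
```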

Matches array of 6 hashes with 6 as the value of every key, followed by
18 objects with an attribute @s which is a String:
+[ +{OB=>6}*6, -{:@s=>String}*18 ]
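Because both repetitions in this pattern have fixed counts, no backtracking is needed: the data can simply be split at position 6. The hypothetical helpers below express that in plain ruby, assuming +{OB=>6} means every value in the hash is 6 (keys are unrestricted, since OB matches any object).

```ruby
# +{OB=>6}: a Hash in which every value is 6 (any keys).
def all_values_six?(h)
  h.is_a?(Hash) && h.values.all?(6)
end

# -{:@s=>String}: an object whose instance variable @s holds a String.
def string_at_s?(obj)
  obj.instance_variable_defined?(:@s) && String === obj.instance_variable_get(:@s)
end

# Fixed counts mean a fixed split: 6 hashes, then 18 objects, nothing else.
def six_hashes_then_18_objects?(data)
  data.size == 24 &&
    data.first(6).all? { |h| all_values_six?(h) } &&
    data.drop(6).all? { |o| string_at_s?(o) }
end

tagged = Array.new(18) { Object.new.tap { |o| o.instance_variable_set(:@s, "x") } }
sixes  = Array.new(6) { { a: 6, b: 6 } }
six_hashes_then_18_objects?(sixes + tagged)            # => true
six_hashes_then_18_objects?(sixes + tagged.first(17))  # => false
```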

Many other new features are sketched out now, but not fully
implemented or tested (much less documented!), so don’t expect them to
work. (A lot of these depend on a further refactoring of the engine’s
internal interface.)

Internally, Reg::Progress uses Eric M.'s Cursor, since it
provides a unified interface over arrays, strings, and files. Right
now, only arrays are supported by Reg. This means that
Reg::Array can still match only arrays, but in the future it will be
able to match strings and files as well. Due to the use of Cursor,
this version of Reg is substantially slower than the last, especially
when there’s lots of backtracking.
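The benefit of a unified cursor is that matching code needs only a small read interface, whatever the underlying sequence. The SeqCursor class below is a hypothetical illustration of that idea over Arrays and Strings (strings are read one character at a time); it is not the API of Eric M.'s Cursor, and it leaves out files.

```ruby
# Hypothetical SeqCursor: one read interface over an Array or a String.
# NOT the API of Eric M.'s Cursor; files are not handled here.
class SeqCursor
  # Current position. A matcher backtracks by saving and restoring it.
  attr_accessor :pos

  def initialize(seq)
    @seq = seq
    @pos = 0
  end

  def eof?
    @pos >= @seq.size
  end

  # Next item without consuming it: an Array element, or a one-character
  # String when the underlying sequence is a String.
  def peek
    @seq[@pos] unless eof?
  end

  # Consume and return the next item (nil at end of input).
  def read1
    item = peek
    @pos += 1 unless eof?
    item
  end
end

# The same matching code serves both kinds of sequence: count the
# leading run of items matching +matcher+, then rewind the cursor.
def leading_run(cursor, matcher)
  start = cursor.pos
  cursor.read1 while !cursor.eof? && matcher === cursor.peek
  count = cursor.pos - start
  cursor.pos = start  # rewind, as a backtracking matcher would
  count
end

leading_run(SeqCursor.new([:a, :b, 1]), Symbol)  # => 2
leading_run(SeqCursor.new("aab"), /a/)           # => 2
```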

Using Reg for traditional lexing and parsing tasks is still
unsupported. (However, Reg is much closer now to a solution for both
of these tasks than in previous releases. Matching Strings and Files
against Reg::Array amounts to lexing. Parsing is even closer: all
that’s needed is substitution, which is my next major goal.)

Reg is a rubyforge project. You can find the main page here:
http://www.rubyforge.org/projects/reg

For the first time, Reg is now available as a ruby gem. Those with
rubygems can install Reg via this command: ‘gem install reg’. The .gem
file can also be found here:
http://rubyforge.org/frs/download.php/7198/reg-0.4.6.gem
The latest tarball can be downloaded here:
http://rubyforge.org/frs/download.php/7199/reg-0.4.6.tar.gz