Regex Searching on Arbitrary Sequences

Good day Rubyists,

I’ve just finished a write-up on an interesting problem: using Ruby’s
Regexp engine to search arbitrary sequences of potentially heterogeneous
objects. It’s based on the more specific instance used in Ripper in 1.9.
I’ve packaged it into a gem, though it is a bit rough around the edges.

The post can be found here:
http://carboni.ca/blog/p/Regex-Search-on-Arbitrary-Sequences

And the gem can be found here:
https://github.com/michaeledgar/object_regex

The gem requires Ruby 1.9+.

Cheers,
Mike Edgar
http://carboni.ca/

Michael E. wrote:

I’ve just finished a write-up on an interesting problem: using Ruby’s
Regexp engine to search arbitrary sequences of potentially heterogeneous
objects. It’s based on the more specific instance used in Ripper in 1.9.
I’ve packaged it into a gem, though it is a bit rough around the edges.

The post can be found here:
http://carboni.ca/blog/p/Regex-Search-on-Arbitrary-Sequences

And the gem can be found here: https://github.com/michaeledgar/object_regex

This is pretty cool. I never understood why pretty much every language
except Erlang artificially restricts Regexps to text. (Erlang also
allows regular-expression-like pattern matching on bit strings.)

Functional languages and increasingly also modern OO languages (e.g.
Newspeak) have structural pattern matching over arbitrary types, but
without the parsing feature of Regexps (alternation, repetition, …).
Scripting languages have Regexps but only over text strings, not
arbitrary types.

What I really would like to see is the union of pattern matching and
Regexps, ranging over arbitrary types. Unfortunately, I don’t have the
slightest idea what that would look like.

jwm

In talking it over with the co-writer of the Regex-Searching writeup, we
think that with a bit of massaging of the existing code, defining
meaningful #reg_desc methods on Array, Class, Hash, and Object could get
a good part of the way there.

class Class
  alias_method :reg_desc, :name
end
=> Class

class Object
  def reg_desc
    # `class` alone is a keyword here, so the receiver has to be explicit
    self.class.reg_desc
  end
end
=> nil

ObjectRegex.new('Fixnum String+ Regexp?').all_matches(
  [1, 'hi', 2, 3, 4, 'world', 'there', /abc/])
=> [[1, "hi"], [4, "world", "there", /abc/]]
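
For anyone wondering how a pattern like that can be handed to the ordinary
Regexp engine, here is a rough sketch of the general idea. It is not the
gem's actual code: SketchObjectRegex, its TOKEN regex and the choice of
private-use characters are all made up for illustration. Each token name
in the pattern and each object in the sequence is mapped to a single
character, the engine runs on the resulting string, and the match offsets
are read back as array indices.

```ruby
# Sketch only: SketchObjectRegex is a made-up name, not the gem's class.
class SketchObjectRegex
  # A token in the pattern is a capitalized identifier such as String.
  TOKEN = /[A-Z][A-Za-z0-9_]*/

  # The optional block turns an object into its token name; by default
  # that is the object's class name (an assumption of this sketch).
  def initialize(pattern, &describe)
    @describe = describe || lambda { |obj| obj.class.name }
    @codes = {}  # token name => one private-use character
    source = pattern.gsub(TOKEN) { |name| code_for(name) }
    @regexp = Regexp.new(source.delete(' '))
  end

  # Returns every non-overlapping run of objects that matches the pattern.
  def all_matches(objects)
    # One character per object, so string offsets equal array indices.
    encoded = objects.map { |obj| code_for(@describe.call(obj)) }.join
    matches = []
    pos = 0
    while pos <= encoded.size && (m = @regexp.match(encoded, pos))
      start, stop = m.begin(0), m.end(0)
      matches << objects[start...stop] if stop > start
      pos = stop > start ? stop : start + 1  # step past empty matches
    end
    matches
  end

  private

  # Hands out characters from the Unicode private-use area, which have no
  # special meaning to the Regexp engine.
  def code_for(name)
    @codes[name] ||= (0xE000 + @codes.size).chr(Encoding::UTF_8)
  end
end

int = 1.class.name  # "Fixnum" on 1.9, "Integer" on current Rubies
re = SketchObjectRegex.new("#{int} String+ Regexp?")
found = re.all_matches([1, 'hi', 2, 3, 4, 'world', 'there', /abc/])
p found  # prints [[1, "hi"], [4, "world", "there", /abc/]]
```

Because every object becomes exactly one character, the repetition and
optional operators behave as they would on text, and nothing about the
objects themselves has to be serialized.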

The syntax is a bit restrictive in the current version of object_regex,
but I came up with this quickly for tuple searching:

class Array
  def reg_desc
    'Array_' + map(&:reg_desc).uniq.join('_') + '_'
  end
end
=> nil

ObjectRegex.new('Array_String_Fixnum_+').match(
  [['string', /regex/], ['string2', 1], ['string3', 3], ['string4']])
=> [["string2", 1], ["string3", 3]]
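
To see why that pattern picks out only the two middle tuples, it helps to
print the descriptor each element ends up with. The snippet below is just
an illustration: descriptor_for is a hypothetical stand-alone helper, and
it assumes the descriptor is 'Array_' followed by the unique element class
names joined with underscores, plus a trailing underscore.

```ruby
# Hypothetical helper: same idea as reg_desc, without reopening core classes.
def descriptor_for(obj)
  if obj.is_a?(Array)
    'Array_' + obj.map { |e| descriptor_for(e) }.uniq.join('_') + '_'
  else
    obj.class.name
  end
end

tuples = [['string', /regex/], ['string2', 1], ['string3', 3], ['string4']]
descriptors = tuples.map { |t| descriptor_for(t) }
descriptors.each { |d| puts d }
```

On 1.9 the two middle tuples both come out as Array_String_Fixnum_ (current
Rubies say Integer instead of Fixnum), so the trailing + in the pattern
matches them as a run of two. The uniq call also means that
['a', 'b', 1] would get the same descriptor as ['a', 1], which may or may
not be what you want for tuple searching.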

I used a cautiously restrictive regex for picking the tokens out of the
input pattern, but things like standard generics syntax (Array) could be
possible.
