ANN: Regexador - A mini-language for regular expressions

addis_a · September 7, 2013, 12:51am

This is a new project, but is reasonably mature for its age.

See Hal9000/regexador on GitHub: an external DSL for Ruby that tries to
make regular expressions readable and maintainable.

When a regular expression grows too complex to read or maintain,
construct a small script to describe it instead.

Example from the README (see below).

Comments welcome.

Thanks!
Hal F.

Suppose we want to match a string consisting of a single IP address.
(Remember that the numbers can only range as high as 255.)

Here is traditional regular expression notation:

/^(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})$/

And here is Regexador notation:

dot = "."
num = "25" D5 | `2 D4 D | maybe D1 1,2*D
match BOS num dot num dot num dot num EOS end

In your Ruby code, you can create a Regexador “script” or “program”
(probably by means of a here-document) that you can then pass into
the Regexador class. At minimum, you can convert this into a “real”
Ruby regular expression; there are a few other features and functions,
and more may be added.

So here is a complete Ruby program:

require 'regexador'

program = <<-EOS
  dot = "."
  num = "25" D5 | `2 D4 D | maybe D1 1,2*D
  match WB num dot num dot num dot num WB end
EOS

pattern = Regexador.new(program)

puts "Give me an IP address"
str = gets.chomp

rx = pattern.to_regex    # Can retrieve the actual regex

if pattern.match?(str)   # ...or use in other direct ways
  puts "Valid"
else
  puts "Invalid"
end

Hal_F · September 7, 2013, 7:47am

From the README:

“I’m thinking of ignoring these features for now:
Unicode chars”
And out. This is not a serious endeavour.

On 07.09.2013 at 00:50, Hal F. wrote:

Hal_F · September 7, 2013, 8:02pm

This looks like a fun project which I'll look into.

I think you've made regexes look worse than they need to (though that
might well be how a person unfamiliar with regexes actually uses them).
The comments below are about regexes rather than your project.

I think it is possible to achieve a lot with interpolation in Ruby's
regular expressions, remembering that \A and \z are the real
start-of-string and end-of-string anchors, and using the x modifier.

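#!/usr/bin/env ruby

# One octet of an IPv4 address, written with the x modifier so that
# whitespace and comments inside the pattern are ignored.
BYTE = / (?:
           25[0-5]      | # 250 … 255
           2[0-4]\d     | # 200 … 249
           [01]?\d{1,2}   # 0 … 199
         )
       /x

# The dots are escaped so that they match a literal "." only.
IP_ADDR4 = / \A #{BYTE} \. #{BYTE} \. #{BYTE} \. #{BYTE} \z /x

p IP_ADDR4

print "Give me an address: "
if IP_ADDR4 =~ gets.chomp
  puts "Good"
else
  puts "Bad"
end

__END__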

Of course my Perl history makes the regex version seem clear to me.
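
To make the point about anchors concrete, here is a minimal sketch (not taken from the thread's code; the constant names LINE_ANCHORED and STRING_ANCHORED and the sample input are made up for illustration, and Regexp#match? assumes Ruby 2.4 or later). It shows that ^ and $ match at line boundaries, so a multi-line string containing an address on one line slips through, while \A and \z require the whole string to be an address.

```ruby
# One octet, 0 to 255, as in the example above (compact form).
BYTE = /(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})/

# ^ and $ anchor at the start and end of any LINE in the string.
LINE_ANCHORED   = /^#{BYTE}\.#{BYTE}\.#{BYTE}\.#{BYTE}$/

# \A and \z anchor at the start and end of the whole STRING.
STRING_ANCHORED = /\A#{BYTE}\.#{BYTE}\.#{BYTE}\.#{BYTE}\z/

input = "not an address\n10.0.0.1\nmore text"

puts LINE_ANCHORED.match?(input)         # one line of the input is an address
puts STRING_ANCHORED.match?(input)       # the string as a whole is not
puts STRING_ANCHORED.match?("10.0.0.1")  # a bare address passes both
```

For validating user input, the string-anchored form is usually what is wanted.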

I would usually decompose the text using a regular expression and then
do the validation using code, for example something like:

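def ipv4_address?(string)
  # Pull out four groups of one to three digits, separated by literal dots.
  md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match(string)
  # Range-check each captured group in code rather than in the regex.
  md && md.captures.all? { |num| num.to_i.between?(0, 255) }
end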
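
As a rough check of how this decompose-then-validate approach behaves, the sketch below runs it over a few sample inputs. The sample strings are invented for illustration, the dots are assumed to be escaped, and the result is coerced to true/false with !! so it prints cleanly.

```ruby
# Decompose with a regex, then validate the numeric range in code.
def ipv4_address?(string)
  md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match(string)
  !!(md && md.captures.all? { |num| num.to_i.between?(0, 255) })
end

samples = [
  "192.168.2.100",   # ordinary address
  "192.168.2.257",   # last octet out of range
  "1.2.3",           # too few octets
  "1.2.3.4\n",       # trailing newline: \z does not allow it
  "001.002.003.004"  # leading zeros pass the digit pattern and the range check
]

samples.each do |candidate|
  puts "#{candidate.inspect}: #{ipv4_address?(candidate)}"
end
```

Whether leading zeros should be accepted is a policy decision that this version leaves open.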

Regards,

Mike

–

Mike S.
http://www.stok.ca/~mike/

The “`Stok’ disclaimers” apply.

Hal_F · September 8, 2013, 2:29pm

Reminds me a bit of something I did almost exactly six years and one
month ago:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/263785

Cheers

robert

Hal_F · September 8, 2013, 3:37am

On Sep 7, 2013, at 11:02, Mike S. wrote:

I would usually decompose the text using a regular expression and then do the
validation using code, for example something like:

def ipv4_address?(string)
  md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match string
  md && md.captures.all? { |num| num.to_i.between?(0, 255) }
end

You know what’s even better? Not writing anything:

require 'ipaddr'
=> true
i = IPAddr.new("192.168.2.100")
=> #<IPAddr: IPv4:192.168.2.100/255.255.255.255>
i = IPAddr.new("192.168.2.257")
IPAddr::InvalidAddressError: invalid address
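
If a yes/no answer is wanted rather than an exception, the session above could be wrapped along these lines. This is only a sketch: the method name valid_ipv4? is made up here, and it assumes a Ruby whose ipaddr library defines IPAddr::Error (the parent class of IPAddr::InvalidAddressError).

```ruby
require 'ipaddr'

# Hypothetical helper: true only if the standard library parses the
# string as an IPv4 address.
def valid_ipv4?(string)
  IPAddr.new(string).ipv4?
rescue IPAddr::Error   # covers IPAddr::InvalidAddressError
  false
end

puts valid_ipv4?("192.168.2.100")
puts valid_ipv4?("192.168.2.257")
puts valid_ipv4?("::1")            # parses, but as IPv6
```

Note that IPAddr.new also accepts a prefix or mask suffix such as "/24", so this is not a drop-in replacement for a strict single-address pattern.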

Hal_F · September 9, 2013, 5:00pm

Suit yourself. I put off working on that until September, i.e.,
I started three days ago.

But as you are “out,” I suppose you will never see this anyway.

Hal

Hal_F · September 9, 2013, 5:04pm

In this case, very true. I have only touched the ipaddr lib
once, but I see it to be very useful.

Hal

Hal_F · September 9, 2013, 5:06pm

I will look at this when I have time.

It would not be the first time you were six years ahead of me.

Hal

Hal_F · September 9, 2013, 5:03pm

Mike,

You’re correct, of course. Multiline regular expressions are
much more readable in general.

Many would argue that the entire project is not worthwhile at all.

My personal opinion is that there is a threshold (which is itself a
matter of opinion) where regexes become needlessly difficult to read.

Hal

Hal_F · September 9, 2013, 6:54pm

On Mon, Sep 9, 2013 at 5:06 PM, Hal F. wrote:

I will look at this when I have time.

It would not be the first time you were six years ahead of me.

Hm, maybe then I should ask you whether you would take over maintenance
of my grave; then I’m sure it would look nice for at least six years.

Cheers

robert

Hal_F · September 27, 2013, 10:00pm

I had seen Verbal Expressions before I started my own project, but
never saw hexpress until a couple of weeks ago.

I think they’re both worthy projects, as the concept itself is a worthy
one (in my opinion).

All three projects are similar in spirit and intent, but in
implementation they are different. Obviously I like my own better.
Arguably it is “more different” from these other two than they are
from each other.

Hal

Hal_F · September 28, 2013, 9:05am

+1 for the chosen name Regexador!

Abinoam Jr.
(From Brazil)

Hal_F · September 27, 2013, 1:25am

The newest Ruby Weekly pointed out two other more-friendly ways of doing
regexes:

http://krainboltgreene.github.io/hexpress/?utm_source=rubyweekly&utm_medium=email
