Forum: Ruby ANN: Regexador - A mini-language for regular expressions

Hal Fulton (Guest)
on 2013-09-07 00:51
(Received via mailing list)
This is a new project, but is reasonably mature for its age.  ;)

See http://github.com/hal9000/regexador

When a regular expression grows too complex to read or maintain,
construct a small script to describe it instead.

Example from the README  (see below).

Comments welcome.

Thanks!
Hal Fulton


Suppose we want to match a string consisting of a single IP address.
(Remember that the numbers can only range as high as 255.)

Here is traditional regular expression notation:


/^(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})$/

And here is Regexador notation:

    dot = "."
    num = "25" D5 | `2 D4 D | maybe D1 1,2*D
    match BOS num dot num dot num dot num EOS end

In your Ruby code, you can create a Regexador "script" or "program"
(probably by means of a here-document) that you can then pass into
the Regexador class. At minimum, you can convert this into a "real"
Ruby regular expression; there are a few other features and functions,
and more may be added.

So here is a complete Ruby program:

    require 'regexador'

    program = <<-EOS
      dot = "."
      num = "25" D5 | `2 D4 D | maybe D1 1,2*D
      match WB num dot num dot num dot num WB end
    EOS

    pattern = Regexador.new(program)

    puts "Give me an IP address"
    str = gets.chomp

    rx = pattern.to_regex    # Can retrieve the actual regex

    if pattern.match?(str)   # ...or use in other direct ways
      puts "Valid"
    else
      puts "Invalid"
    end
Florian Gilcher (skade)
on 2013-09-07 07:47
(Received via mailing list)
From the README:

"I'm thinking of ignoring these features for now:
Unicode chars"
And out. This is not a serious endeavour.

On 07.09.2013 at 00:50, Hal Fulton <rubyhacker@gmail.com> wrote:
Mike Stok (Guest)
on 2013-09-07 20:02
(Received via mailing list)
On 2013-09-06, at 11:50 PM, Hal Fulton <rubyhacker@gmail.com> wrote:

>
>
> and more may be added.
>
>       puts "Invalid"
>     end

This looks like a fun project which I'll look into.

I think you've made regexes look worse than they need to (though that
might well be how a person unfamiliar with regexes actually uses them).
The comments below are about regexes rather than your project.

I think it is possible to achieve a lot with interpolation in Ruby's
regular expressions, remembering that \A and \z are the real
start-of-string and end-of-string anchors, and using the x modifier.

#!/usr/bin/env ruby

BYTE = / (?:
          25[0-5]      | # 250 .. 255
          2[0-4]\d     | # 200 .. 249
          [01]?\d{1,2}   # 0 .. 199
         )
       /x

IP_ADDR4 = / \A #{BYTE} \. #{BYTE} \. #{BYTE} \. #{BYTE} \z /x

# p IP_ADDR4

print "Give me an address: "
if IP_ADDR4 =~ gets.chomp
  puts "Good"
else
  puts "Bad"
end

__END__

Of course my Perl history makes the regex version seem clear to me.
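
To make the anchor point concrete, here is a minimal sketch. The constant names and the multi-line sample input are made up for illustration; the byte pattern is the same one as above. It only shows that ^ and $ match at line boundaries, whereas \A and \z match only at the ends of the whole string.

```ruby
# Same byte pattern as in the script above, written without the x modifier.
BYTE_RX = /(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})/

# Two ways of anchoring the same dotted-quad pattern.
LINE_ANCHORED   = /^#{BYTE_RX}\.#{BYTE_RX}\.#{BYTE_RX}\.#{BYTE_RX}$/
STRING_ANCHORED = /\A#{BYTE_RX}\.#{BYTE_RX}\.#{BYTE_RX}\.#{BYTE_RX}\z/

# Hypothetical input: junk on the first line, an address on the second.
input = "not an address\n10.0.0.1"

line_hit   = !!(LINE_ANCHORED =~ input)    # ^ and $ are satisfied by the second line
string_hit = !!(STRING_ANCHORED =~ input)  # \A and \z must span the whole string

puts "line anchors:   #{line_hit}"    # true
puts "string anchors: #{string_hit}"  # false
```

So if the goal is "the string is exactly one address", the \A ... \z form is the one that says so.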

I would usually decompose the text using a regular expression and then
do the validation using code, for example something like:

def ipv4_address?(string)
  md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match string
  md && md.captures.all? { |num| num.to_i.between?(0, 255) }
end

Regards,

Mike

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.
Ryan Davis (Guest)
on 2013-09-08 03:37
(Received via mailing list)
On Sep 7, 2013, at 11:02 , Mike Stok <mike@stok.ca> wrote:

> I would usually decompose the text using a regular expression and then do the
> validation using code, for example something like:
>
> def ipv4_address?(string)
>   md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match string
>   md && md.captures.all? { |num| num.to_i.between?(0, 255) }
> end

You know what's even better? Not writing anything:

>> require 'ipaddr'
=> true
>> i = IPAddr.new("192.168.2.100")
=> #<IPAddr: IPv4:192.168.2.100/255.255.255.255>
>> i = IPAddr.new("192.168.2.257")
IPAddr::InvalidAddressError: invalid address
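
If you want a true/false answer rather than an exception, something like the following might do. This is only a sketch: the method name valid_ipv4? is made up, and the shape check in front is my own assumption, added because IPAddr.new also accepts IPv6 addresses and prefix forms such as "10.0.0.0/8".

```ruby
require 'ipaddr'

# Hypothetical helper: true only for a single dotted-quad IPv4 address.
def valid_ipv4?(string)
  # Reject anything that is not just digits and dots (IPv6, "a.b.c.d/len", empty).
  return false unless string =~ /\A[\d.]+\z/
  IPAddr.new(string).ipv4?
rescue IPAddr::Error   # parent class of IPAddr::InvalidAddressError
  false
end

puts valid_ipv4?("192.168.2.100")   # true
puts valid_ipv4?("192.168.2.257")   # false
```

Note that IPAddr has its own opinions about what counts as valid, so it will not necessarily agree with the regex versions above on every input.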
Robert Klemme (robert_k78)
on 2013-09-08 14:29
(Received via mailing list)
On Sat, Sep 7, 2013 at 12:50 AM, Hal Fulton <rubyhacker@gmail.com>
wrote:
>
>
/^(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})$/
> Ruby regular expression; there are a few other features and functions,
>     EOS
>     else
>       puts "Invalid"
>     end

Reminds me a bit of something I did almost exactly six years and one
month ago:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

;-)

Cheers

robert
Hal Fulton (Guest)
on 2013-09-09 17:00
(Received via mailing list)
Suit yourself. I put off working on that until September, i.e.,
I started three days ago.

But as you are "out," I suppose you will never see this anyway.

Hal
Hal Fulton (Guest)
on 2013-09-09 17:03
(Received via mailing list)
Mike,

You're correct, of course. Multiline regular expressions are
much more readable in general.

Many would argue that the entire project is not worthwhile at all.

My personal opinion is that there is a threshold (which is itself a
matter of opinion) where regexes become needlessly difficult to read.

Hal
Hal Fulton (Guest)
on 2013-09-09 17:04
(Received via mailing list)
In this case, very true. I have only touched the ipaddr lib
once, but I see it to be very useful.

Hal
Hal Fulton (Guest)
on 2013-09-09 17:06
(Received via mailing list)
I will look at this when I have time.

It would not be the first time you were six years ahead of me.  :)

Hal
Robert Klemme (robert_k78)
on 2013-09-09 18:54
(Received via mailing list)
On Mon, Sep 9, 2013 at 5:06 PM, Hal Fulton <rubyhacker@gmail.com> wrote:
> I will look at this when I have time.
>
> It would not be the first time you were six years ahead of me.  :)

Hm, maybe then I should ask you whether you would take over maintenance
of my grave - then I'm sure it will look nice for at least six years. ;-)

Cheers

robert
Eric Christopherson (echristopherson)
on 2013-09-27 01:25
(Received via mailing list)
The newest Ruby Weekly pointed out two other more-friendly ways of doing
regexes:

http://spin.atomicobject.com/2013/08/26/verbal-exp...

http://krainboltgreene.github.io/hexpress/?utm_sou...


On Mon, Sep 9, 2013 at 11:53 AM, Robert Klemme
Hal Fulton (Guest)
on 2013-09-27 22:00
(Received via mailing list)
I had seen Verbal Expressions before I started my own project, but
never saw hexpress until a couple of weeks ago.

I think they're both worthy projects, as the concept itself is a worthy
one (in my opinion).

All three projects are similar in spirit and intent, but in
implementation they are different. Obviously I like my own better.
Arguably it is "more different" from these other two than they are
from each other.

Hal



On Thu, Sep 26, 2013 at 6:24 PM, Eric Christopherson <
Abinoam Jr. (abinoampraxedes_m)
on 2013-09-28 09:05
(Received via mailing list)
+1 for the chosen name Regexador!

Abinoam Jr.
(From Brazil ;-) )