Alternate Regular Expressions?

Ari_B · August 8, 2007, 2:28am

Kenneth McDonald wrote:

the biggest problem with regular expressions in most languages isn’t the
syntax, but rather
the inability to easily compose small REs into larger REs. Which is why
so many programs end
up with huge, unreadable REs.

You can do that in ruby rather simply:

example taken from an example earlier in this thread

name = /[a-z]+/i
host = /[a-z]+/i
tld = /com|net|org|edu/
input.scan(%r{\b#{name}@#{host}\.#{tld}\b}) do |match|
  puts "Found email address #{match}"
end

Regards
Stefan

Ari_B · December 20, 2009, 4:29am

rather
the inability to easily compose small REs into larger REs. Which is why
so many programs end
up with huge, unreadable REs. As a small example, it’s really nice (and
obvious) to be able to say

re3 = re1 + re2

I agree with this and that’s why I have the following add-on in my
standard lib:

class Regexp
def +(other_regex)
…
end
end

Note also that you can do

re3 = /#{re1}#{re2}/ which fits my needs pretty well.
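As a rough illustration of the interpolation approach, here is a minimal sketch. The names word, digits and the helper concat are made up for this example and are not the add-on elided above. Interpolating a Regexp embeds its to_s form, which carries each piece's own flags along with it:

```ruby
# Hypothetical pieces; the names are illustrative only.
word   = /[a-z]+/i   # case-insensitive
digits = /\d+/       # no flags

# Interpolation embeds each piece as a (?flags:...) group, so `word`
# stays case-insensitive inside the combined regexp.
combined = /#{word}-#{digits}/

# A hypothetical helper that does the same thing as a method call.
def concat(*parts)
  Regexp.new(parts.map(&:to_s).join)
end

same = concat(word, /-/, digits)

puts combined.match?("ABC-123")
puts same.match?("ABC-123")
puts combined.match?("ABC-xyz")
```

A standalone helper is used here rather than reopening Regexp, so the sketch does not change the behaviour of the core class.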

-r

Ari_B · August 9, 2007, 8:53am

2007/8/8, Robert K. [email protected]:

Did you think about something like this (attached)? This is just a
raw hack to illustrate a possible way to do it.

Quote:

at_least_once { any "a-z" }
literal "."
end

any %w{com edu org}
end

I like this! A readable DSL for regular expressions.

Regards,
Pit

Ari_B · December 21, 2009, 4:39pm

Phlip wrote:

rx = match('foo') or match('bar') # like /(foo|bar)/

Aside: you either need to use ‘||’ instead of ‘or’, or you need extra
parentheses, i.e.

rx = (match('foo') or match('bar'))

otherwise it parses as

(rx = match('foo')) or match('bar')
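The precedence difference only becomes visible when the left-hand call returns nil or false. A small sketch, where pick is a hypothetical stand-in for match that simply returns its argument:

```ruby
# `pick` is a hypothetical stand-in for match(); it returns its argument.
def pick(value)
  value
end

a = pick(nil) or pick("bar")     # parses as (a = pick(nil)) or pick("bar")
b = pick(nil) || pick("bar")     # parses as b = (pick(nil) || pick("bar"))
c = (pick(nil) or pick("bar"))   # explicit parentheses give the same result as ||

p a
p b
p c
```

Here a ends up nil, while b and c both end up "bar".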

I don’t really have a problem with regexps as they are. Although I’d
like to have more limited, true regexps, which compile to a DFA and
never backtrack.

Ari_B · December 20, 2009, 5:59am

Ari B. wrote:

On Aug 6, 2007, at 9:40 PM, Phlip wrote:

So start writing! and research other DSLs as you go.

Ugh. If I must (which I must). What would you suggest as syntax?

Also, should I completely try to reinvent the wheel, or create a
wrapper for current RegExp?

Man. I need a mentor on this

I would suggest taking a look at Treetop, both as an easy-to-use parser
generator and as an inspiration for regexp extensions. But I mostly
like regexps the way they are.

aRi
--------------------------------------------|
IMO, Arabic has THE most beautiful script.

Ever looked at Mongolian (Uighur) script?

Poetically, English is extremely beautiful. It’s like a language of
RegExp - except there are no rules!

Uh, what? (I know that was intended to be funny – I just don’t get
it.)

Spoken, the most beautiful language is either French (sorry) or
Esperanto.

Hmmm…

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Ari_B · December 21, 2009, 6:30pm

2009/12/21 Brian C. [email protected]:

(rx = match('foo')) or match('bar')

I don’t really have a problem with regexps as they are. Although I’d
like to have more limited, true regexps, which compile to a DFA and
never backtrack.

Nowadays DFAs are rare because NFAs provide more features and you can
use them to your advantage (e.g. prioritizing by ordering
alternatives). You can “switch off” backtracking by using atomic
groups and possessive quantifiers:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
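A small sketch of how backtracking can be cut off in Ruby's regexp engine, using an atomic group, written (?>...), and a possessive quantifier, written ++. The sample string and variable names are only for illustration:

```ruby
text = "aaa"

greedy     = /a+a/       # a+ gives back one "a" so the final a can match
possessive = /a++a/      # a++ keeps all three "a"s, leaving nothing for the final a
atomic     = /(?>a+)a/   # same effect, expressed as an atomic group

p greedy.match?(text)
p possessive.match?(text)
p atomic.match?(text)
```

Only the first pattern matches. The other two fail because the engine is not allowed to go back into the repetition once it has consumed the input.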

Kind regards

robert

Ari_B · December 24, 2009, 3:26am

Ari B. wrote:

Just randomly curious -

Is there an alternate RegExp “language” to the current one in Ruby
and Perl?

There’s a RegExp library for Common Lisp that, besides a string, also
accepts a “parse tree”. See CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp

The regexp string…

(?:abc){3,5}

…is equal to the following data structure:

(:GREEDY-REPETITION 3 5 (:GROUP "abc"))

which, in Ruby, looks like:

[:"GREEDY-REPETITION", 3, 5, [:GROUP, "abc"]]

This is not really a different language, just a way to express the
regexp string as a data-structure.
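To make the idea concrete, here is a minimal Ruby sketch that turns such a nested array into a Regexp. The function tree_to_source and the snake_case node names are assumptions for this example. It handles only the two node types shown above and is not CL-PPCRE's API:

```ruby
# Hypothetical converter: the node names mirror the example above,
# written as plain snake_case Ruby symbols.
def tree_to_source(node)
  case node
  when String
    Regexp.escape(node)               # literal text, escaped
  when Array
    type, *args = node
    case type
    when :group
      "(?:" + args.map { |child| tree_to_source(child) }.join + ")"
    when :greedy_repetition
      min, max, child = args
      inner = tree_to_source(child)
      # Wrap anything that is not already a group so the repetition
      # applies to the whole child, not just its last character.
      is_group = child.is_a?(Array) && child.first == :group
      inner = "(?:#{inner})" unless is_group
      "#{inner}{#{min},#{max}}"
    else
      raise ArgumentError, "unknown node type: #{type.inspect}"
    end
  else
    raise ArgumentError, "unsupported node: #{node.inspect}"
  end
end

tree   = [:greedy_repetition, 3, 5, [:group, "abc"]]
source = tree_to_source(tree)
re     = Regexp.new(source)

puts source
puts re.match?("abcabcabc")
```

For the tree above this produces the same string as the original example, (?:abc){3,5}.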