The biggest problem with regular expressions in most languages isn’t the
syntax, but rather the inability to easily compose small REs into larger
REs, which is why so many programs end up with huge, unreadable REs.
You can do that in ruby rather simply:
# Example adapted from an earlier post in this thread
name = /[a-z]+/i
host = /[a-z]+/i
tld = /com|net|org|edu/
# \. matches a literal dot; a bare . would match any character
input.scan(%r{\b#{name}@#{host}\.#{tld}\b}) do |match|
  puts "Found email address #{match}"
end
As a small example, it’s really nice (and obvious) to be able to say
re3 = re1 + re2
I agree with this and that’s why I have the following add-on in my
standard lib:
class Regexp
  def +(other_regex)
    …
  end
end
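The body of that method is elided above, so the following is only a guess at what such an add-on might look like, not the poster's actual code. It assumes concatenation is built on Regexp#to_s, which wraps each pattern in a non-capturing group carrying its own flags.

```ruby
# Hypothetical body for Regexp#+; the original post does not show the real one.
class Regexp
  # Concatenate two patterns. Interpolating a Regexp calls Regexp#to_s,
  # which wraps it in a non-capturing group with its own flags,
  # so each side keeps its own case sensitivity and alternation scope.
  def +(other)
    Regexp.new("#{self}#{other}")
  end
end

re1 = /foo/i
re2 = /bar|baz/
re3 = re1 + re2

p re3.match?("FOObaz")  # re1 is case-insensitive, then "bar" or "baz"
p re3.match?("fooBAZ")  # re2 stays case-sensitive
```

Reopening a core class like this affects every Regexp in the program, so a refinement or a plain helper method may be a safer home for it.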
Note also that you can do
re3 = /#{re1}#{re2}/ which fits my needs pretty well.
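A quick sketch of why the interpolation works: Regexp#to_s produces a self-contained group that includes the flags, so each piece keeps its own options once embedded. The variable names and sample strings below are made up for illustration.

```ruby
re1 = /[a-z]+/i
re2 = /com|net/

# Each pattern turns into a non-capturing group with explicit flags.
puts re1.to_s   # (?i-mx:[a-z]+)
puts re2.to_s   # (?-mix:com|net)

# Compose them with a literal dot in between.
re3 = /#{re1}\.#{re2}/

puts re3.match?("EXAMPLE.com")  # true: re1 ignores case
puts re3.match?("example.COM")  # false: re2 does not
```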
rx = match('foo') or match('bar') # like /(foo|bar)/
Aside: you either need to use '||' instead of 'or', or you need extra
parentheses, i.e.
  rx = (match('foo') or match('bar'))
otherwise it parses as
  (rx = match('foo')) or match('bar')
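The precedence difference is easy to check with plain values. This sketch uses nil and a string as stand-ins for the hypothetical match(...) calls, since that DSL does not exist here.

```ruby
second = "bar"

# 'or' binds more loosely than '=', so the assignment happens first.
a = nil or second      # parsed as (a = nil) or second

# '||' binds more tightly than '=', so the whole expression is assigned.
b = nil || second

# Parentheses force the same grouping with 'or'.
c = (nil or second)

p a  # nil
p b  # "bar"
p c  # "bar"
```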
I don’t really have a problem with regexps as they are. Although I’d
like to have more limited, true regexps, which compile to a DFA and
never backtrack.
So start writing! And research other DSLs as you go.
Ugh. If I must (which I must). What would you suggest as syntax?
Also, should I completely try to reinvent the wheel, or create a
wrapper for current RegExp?
Man. I need a mentor on this.
I would suggest taking a look at Treetop, both as an easy-to-use parser
generator and as an inspiration for regexp extensions. But I mostly
like regexps the way they are.
aRi
IMO, Arabic has THE most beautiful script.
Ever looked at Mongolian (Uighur) script?
Poetically, English is extremely beautiful. It’s like a language of
RegExp - except there are no rules!
Uh, what? (I know that was intended to be funny – I just don’t get
it.)
Spoken, the most beautiful language is either French (sorry) or
Esperanto.
I don’t really have a problem with regexps as they are. Although I’d
like to have more limited, true regexps, which compile to a DFA and
never backtrack.
Nowadays DFAs are rare because NFAs provide more features, and you can
use those to your advantage (e.g. prioritizing by ordering
alternatives). You can “switch off” backtracking by using atomic
groups and possessive quantifiers: http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
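To make the difference concrete, here is a small sketch comparing an ordinary greedy quantifier with an atomic group and a possessive quantifier (the variant of a greedy quantifier that never gives back what it matched). The pattern and input are invented for illustration.

```ruby
text = "aaa"

greedy     = /a+a/      # a+ gives back one "a" so the final "a" can match
possessive = /a++a/     # a++ keeps all three, leaving nothing for the final "a"
atomic     = /(?>a+)a/  # same effect, written as an atomic group

p greedy.match?(text)      # true
p possessive.match?(text)  # false
p atomic.match?(text)      # false
```

Neither construct turns the engine into a DFA; they only stop it from retrying inside the marked part of the pattern.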