I’m not sure you can match more complex examples using a regular
expression. You may be able to pull something off with lookaheads, but
I think it’d be easier to just parse the string manually and count the
open brackets:
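$ cat para.rb
def para(str)
  open = 0       # current nesting depth
  matches = []   # contents of each completed top-level group
  current = ""
  # Splitting on /\s*/ yields single characters and drops whitespace.
  str.split(/\s*/).each do |char|
    if char == ")"
      open -= 1
      if open == 0
        # The outermost group just closed: store it and start afresh.
        matches << current
        current = ""
      else
        current << char
      end
    elsif char == "("
      open += 1
      # Keep only nested "(" characters, not the outermost one.
      if open > 1
        current << char
      end
    elsif open > 0
      current << char
    end
  end
  matches
end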
I agree that for complex cases a regexp is not the solution. A
solution like yours, counting parens (or using a stack), should be
the preferred way.
Cheers,
Jesus.
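For comparison, here is a minimal sketch of the stack variant mentioned
above. The method name balanced_groups and the sample string are made up
for illustration and are not from the thread. Unlike the counter version,
it keeps whitespace and also reports nested groups, in the order their
closing parenthesis is seen:

```ruby
# Hypothetical helper: collect the contents of every balanced
# parenthesised group using a stack of "(" positions.
def balanced_groups(str)
  stack = []    # indices of "(" characters that are still open
  groups = []
  str.each_char.with_index do |char, i|
    case char
    when "("
      stack.push(i)
    when ")"
      start = stack.pop
      # An unmatched ")" has no partner on the stack, so it is skipped.
      groups << str[(start + 1)...i] if start
    end
  end
  groups
end

p balanced_groups("a (b (c) d) e (f)")
# => ["c", "b (c) d", "f"]
```

Any "(" still on the stack at the end was never closed; this sketch
simply ignores it.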
Yep, parsing something with arbitrarily nested parentheses is the
classic example of something that can’t be done with a regex. (Well,
assuming you actually care about the nested parens.)
ah, “captures” - that’s the same as MatchData#to_a, right? Perfect,
Not exactly. MatchData#to_a returns the whole matched string as the
first element of the array, followed by the captured groups starting
at x[1]. MatchData#captures contains only the captured groups. See
the difference:
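A minimal sketch of that difference, using a made-up pattern and input
string (neither comes from the thread):

```ruby
# Illustrative pattern with two capture groups.
match = /(\d+)-(\d+)/.match("range 10-20")

p match.to_a      # => ["10-20", "10", "20"]  (whole match first)
p match.captures  # => ["10", "20"]           (groups only)
```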
I’ve read that the .NET regex engine has some constructs to recognize
balanced constructs like parens…
It’s possible in Ruby 1.9, or in Ruby 1.8 with the Oniguruma library, too:
module Matchelements
  # Returns a pattern string that matches text with balanced brackets.
  # The group name "bal" is assumed here: \g needs a named group to call.
  def bal(lpar='(', rpar=')')
    raise RegexpError,
      "wrong length of left bracket '#{lpar}' in bal" unless lpar.length == 1
    raise RegexpError,
      "wrong length of right bracket '#{rpar}' in bal" unless rpar.length == 1
    raise RegexpError,
      "identical left and right bracket '#{lpar}' in bal" if lpar.eql?(rpar)
    lclass, rclass = lpar, rpar
    # Escape brackets that are special inside a character class.
    lclass = '\\' + lclass if lclass.match(/[\-\[\]]/)
    rclass = '\\' + rclass if rclass.match(/[\-\[\]]/)
    return "(?<bal>" +
             "[^#{lclass}#{rclass}]*?" +
             "(?:\\#{lpar}\\g<bal>\\#{rpar}" +
               "[^#{lclass}#{rclass}]*?" +
             ")*?" +
           ")"
  end
end
include Matchelements
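As a self-contained illustration of the same idea, here is a sketch of
the recursive pattern written as a literal regexp. It assumes Ruby 1.9
or later, where \g<name> subexpression calls are available; the constant
name BALANCED, the group name "bal" and the sample strings are
illustrative only:

```ruby
# The named group "bal" calls itself via \g<bal> for each nested "(...)".
BALANCED = /\((?<bal>[^()]*(?:\(\g<bal>\)[^()]*)*)\)/

p "a (b (c) d) e"[BALANCED]   # => "(b (c) d)"
p "a (b (c d"[BALANCED]       # => nil, no balanced group is present
```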