Regexp question - look for parentheses then remove them

cmaxvv · October 8, 2007, 2:01pm

I’m struggling with a regular expression problem, can anyone help?

I want to take a string, look for anything in parentheses, and if I find
anything, put it into an array, minus the parentheses.

Currently I’m doing this:

parentheses = /\(.*\)/
array = string.scan(parentheses)

This gives me e.g.

"3 * (1 + 2)" => ["(1 + 2)"]

but is there an easy way to strip the parentheses off before putting
it into the array?

e.g.
"3 * (1 + 2)" => ["1 + 2"]

In addition, if I have nested parentheses inside the outer parentheses,
I want to keep them, e.g.

"3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]

can anyone show me how to do this?

thanks
max

cmaxvv · October 8, 2007, 2:27pm

On 10/8/07, Max W. wrote:

This gives me eg
i want to keep them, eg

"3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]

can anyone show me how to do this?

x = "3 * (1 + 2)".match(/\((.*)\)/)
x.captures
=> ["1 + 2"]
x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

Hope this helps,

Jesus.
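
The same capture group also works with the scan call from the original question. A minimal sketch, assuming the string holds at most one top-level pair of parentheses; the helper name paren_contents is made up for illustration and is not from the thread:

```ruby
# Hypothetical helper; the name is an assumption, not part of the thread.
def paren_contents(str)
  # With a capture group, scan returns one array of captures per match,
  # so flatten turns [["1 + 2"]] into ["1 + 2"].
  str.scan(/\((.*)\)/).flatten
end

p paren_contents("3 * (1 + 2)")        # => ["1 + 2"]
p paren_contents("3 * (1 + (4 / 2))")  # => ["1 + (4 / 2)"]
p paren_contents("3 * 4")              # => []
```

Because .* is greedy, the match runs from the first "(" to the last ")", which is what keeps the nested pair intact here.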

cmaxvv · October 8, 2007, 2:39pm

Jesús Gabriel y Galán wrote:

On 10/8/07, Max W. wrote:

This gives me eg
i want to keep them, eg

"3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]

can anyone show me how to do this?

x = "3 * (1 + 2)".match(/\((.*)\)/)
x.captures
=> ["1 + 2"]
x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

Hope this helps,

Jesus.

ah, “captures” - that’s the same as MatchData#to_a, right? Perfect,
thanks!

cmaxvv · October 8, 2007, 2:55pm

and if i find

Hope this helps,

Jesus.

That can fail if you have more than one bracket pair on the lowest
level:

irb(main):002:0> "3 * (2 + (1 + 3)) + (1 * 4)".match(/\((.*)\)/).to_a
=> ["(2 + (1 + 3)) + (1 * 4)", "2 + (1 + 3)) + (1 * 4"]

I’m not sure you can match more complex examples using a regular
expression.

You may be able to pull something off with lookaheads, but I think
it’d be easier to just parse the string manually and count opened
brackets:

$ cat para.rb
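def para(str)
  open = 0      # current nesting depth
  matches = []
  current = ""
  # Splitting on /\s*/ yields single characters and drops whitespace.
  str.split(/\s*/).each do |char|
    if char == ")"
      open -= 1
      if open == 0
        # An outermost group was closed: store its contents.
        matches << current
        current = ""
      else
        current << char
      end
    elsif char == "("
      open += 1
      # Nested opening parentheses belong to the contents.
      if open > 1
        current << char
      end
    elsif open > 0
      current << char
    end
  end
  matches
end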

$ irb
irb(main):001:0> require 'para'
=> true
irb(main):002:0> para("1+2")
=> []
irb(main):003:0> para("(1+2)")
=> ["1+2"]
irb(main):004:0> para("(1+2)3")
=> ["1+2"]
irb(main):005:0> para("((1+2)3)")
=> ["(1+2)3"]
irb(main):006:0> para("((1+2)3)+(56)")
=> ["(1+2)3", "56"]
irb(main):007:0> para("((1+2)3)+(56(1-3*(1-4)))")
=> ["(1+2)3", "56(1-3*(1-4))"]
irb(main):008:0>

There are probably far more elegant ways.

HTH,

Felix
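
Since para splits on /\s*/, it drops the spaces the original question wanted to keep ("1 + (4 / 2)"). A variant that walks the characters one by one keeps them. This is only a sketch: the names para_keep_spaces and depth are invented here, and ignoring a stray closing parenthesis is an assumption about the desired behaviour.

```ruby
# Hypothetical whitespace-preserving variant of para.
def para_keep_spaces(str)
  depth = 0        # number of currently open parentheses
  matches = []
  current = +""    # unary plus gives a mutable string
  str.each_char do |char|
    case char
    when "("
      # Only nested opening parentheses are part of the contents.
      current << char if depth > 0
      depth += 1
    when ")"
      next if depth.zero?   # assumption: skip unmatched ")"
      depth -= 1
      if depth.zero?
        matches << current
        current = +""
      else
        current << char
      end
    else
      current << char if depth > 0
    end
  end
  matches
end

p para_keep_spaces("3 * (2 + (1 + 3)) + (1 * 4)")
# => ["2 + (1 + 3)", "1 * 4"]
```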

cmaxvv · October 8, 2007, 3:04pm

On 10/8/07, Felix W. wrote:

x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

That can fail if you have more than one bracket pair on the lowest level:

irb(main):002:0> "3 * (2 + (1 + 3)) + (1 * 4)".match(/\((.*)\)/).to_a
=> ["(2 + (1 + 3)) + (1 * 4)", "2 + (1 + 3)) + (1 * 4"]

True, what would be the expected result for this?

["2 + (1 + 3)", "1 * 4"] ???

I agree that for complex cases a regexp is not the solution. A
solution like yours, counting parens (or using a stack), should be
the preferred way.

Cheers,

Jesus.
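
The stack idea could look roughly like this. It is a sketch, not code from the thread: top_level_groups is a made-up name, and the stack simply remembers the index of every "(" that is still open.

```ruby
# Hypothetical stack-based version; returns the contents of each
# outermost pair of parentheses, nested pairs included.
def top_level_groups(str)
  stack = []    # indices of "(" characters that are still open
  groups = []
  str.each_char.with_index do |char, i|
    if char == "("
      stack.push(i)
    elsif char == ")" && !stack.empty?
      start = stack.pop
      # An empty stack means this ")" closed an outermost "(".
      groups << str[(start + 1)...i] if stack.empty?
    end
  end
  groups
end

p top_level_groups("3 * (2 + (1 + 3)) + (1 * 4)")
# => ["2 + (1 + 3)", "1 * 4"]
```

Slicing the original string by index keeps whitespace and nested parentheses exactly as they were written.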

cmaxvv · October 9, 2007, 5:59am

[snip]

??

I agree that for complex cases a regexp is not the solution. A
solution like yours, counting parens (or using a stack), should be
the preferred way.

Cheers,

Jesus.

Yep, parsing something with arbitrarily nested parentheses is the
classic example of something that can’t be done with a regex. (Well,
assuming you actually care about the nested parens.)

cmaxvv · October 8, 2007, 3:00pm

On 10/8/07, Max W. wrote:

x.captures
=> ["1 + 2"]
x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

Hope this helps,

Jesus.

ah, “captures” - that’s the same as MatchData#to_a, right? Perfect,

Not exactly: MatchData#to_a returns the whole matched string as the
first element of the array, followed by the captured groups starting
at x[1]. MatchData#captures contains only the captured groups. See
the difference:

irb(main):001:0> a = "123456".match(/(.)(.)\d\d/)
=> #<MatchData:0xb7c97a04>
irb(main):002:0> a.to_a
=> ["1234", "1", "2"]
irb(main):003:0> a.captures
=> ["1", "2"]

Jesus.

cmaxvv · October 9, 2007, 9:30am

Jesús Gabriel y Galán wrote:

I’ve read that the .NET regex engine has some constructs to recognize
balanced constructs like parens…

It’s possible in Ruby 1.9 or Ruby 1.8 and the Oniguruma library too:

module Matchelements
  # Builds a pattern that matches text with balanced brackets.
  # The group name "bal" is a stand-in; any name works as long as
  # the \g<...> call uses the same one.
  def bal(lpar='(', rpar=')')
    raise RegexpError,
      "wrong length of left bracket '#{lpar}' in bal" unless lpar.length == 1
    raise RegexpError,
      "wrong length of right bracket '#{rpar}' in bal" unless rpar.length == 1
    raise RegexpError,
      "identical left and right bracket '#{lpar}' in bal" if lpar.eql?(rpar)
    lclass, rclass = lpar, rpar
    # Escape characters that are special inside a character class.
    lclass = '\\' + lclass if lclass.match(/[\-\[\]]/)
    rclass = '\\' + rclass if rclass.match(/[\-\[\]]/)
    return "(?<bal>" +
      "[^#{lclass}#{rclass}]*?" +
      "(?:\\#{lpar}\\g<bal>\\#{rpar}" +
      "[^#{lclass}#{rclass}]*?" +
      ")*?" +
      ")"
  end
end
include Matchelements

result = "3 * (2 + (1 + 3)) + (1 * 4)".scan(/\((#{bal()})\)/)

p result # => [["2 + (1 + 3)"], ["1 * 4"]]

Wolfgang Nádasi-Donner
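
Ruby 1.9 and later support subexpression calls (\g<name>) without an extra library, so the same idea fits in a single pattern. A minimal sketch under that assumption; BALANCED and balanced_groups are names invented for this example, and the code reads the whole match (m[0]) rather than relying on what a recursive capture group holds:

```ruby
# Hypothetical constant: one balanced "(...)" group, nesting allowed.
# \g<paren> calls the named group recursively.
BALANCED = /(?<paren>\((?:[^()]|\g<paren>)*\))/

# Hypothetical helper: contents of every top-level balanced group.
def balanced_groups(str)
  groups = []
  pos = 0
  while (m = BALANCED.match(str, pos))
    groups << m[0][1..-2]   # drop the enclosing "(" and ")"
    pos = m.end(0)          # continue after this match
  end
  groups
end

p balanced_groups("3 * (2 + (1 + 3)) + (1 * 4)")
# => ["2 + (1 + 3)", "1 * 4"]
```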

cmaxvv · October 9, 2007, 9:30am

ah, “captures”

You can access the match data right away:

x = /\((.*)\)/.match("3 * (1 + 2)")
x[1]
or $1

I’d also make the * non-greedy -> *?

/\((.*?)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2"

but:
/\((.*)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2) * (3 + 4"
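
To see the trade-off side by side, here is a small sketch using scan on two sample strings (the variable names flat and nested are just for illustration). Non-greedy matching handles several separate groups but cuts a nested group short; greedy matching does the opposite.

```ruby
flat   = "3 * (1 + 2) * (3 + 4)"
nested = "3 * (1 + (4 / 2))"

# Non-greedy: stops at the first ")" after each "(".
p flat.scan(/\((.*?)\)/).flatten    # => ["1 + 2", "3 + 4"]
p nested.scan(/\((.*?)\)/).flatten  # => ["1 + (4 / 2"]

# Greedy: runs from the first "(" to the last ")".
p flat.scan(/\((.*)\)/).flatten     # => ["1 + 2) * (3 + 4"]
p nested.scan(/\((.*)\)/).flatten   # => ["1 + (4 / 2)"]
```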

cmaxvv · October 9, 2007, 9:12am

On 10/9/07, Michael Bevilacqua-Linn wrote:

[snip]

??

I agree that for complex cases a regexp is not the solution. A
solution like yours, counting parens (or using a stack), should be
the preferred way.

Yep, parsing something with arbitrarily nested parentheses is the
classic example of something that can’t be done with a regex. (Well,
assuming you actually care about the nested parens.)

I’ve read that the .NET regex engine has some constructs to recognize
balanced constructs like parens:

http://puzzleware.net/blogs/archive/2005/08/13/22.aspx

Interesting !!

Jesus.

cmaxvv · October 9, 2007, 4:29pm

tho_mica_l wrote:

I’d also make the * non-greedy -> *?

/\((.*?)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2"

but:
/\((.*)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2) * (3 + 4"

Excellent tip, cheers!

cmaxvv · October 9, 2007, 4:16pm

That’s neat! (Or a sacrilege, I guess, depending on how you look at it
:-))

MBL