Regex extension to handle matching parens?


#1

Dear Experts: I am very new to Ruby, having literally just read the
Ruby book.

I want to write a program that does basic LaTeX parsing, so I need to
match each closing ‘}’ to its opening ‘{’. (Yes, I understand that LaTeX
has very messy syntax, so this will only work for certain LaTeX docs.)
Does a gem exist that makes closing-paren matching fairly
painless? For example,

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

So I want my “\caption” matcher Ruby program to detect the closing
brace and give me everything between the opener and closer (i.e.,
“my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$”).
Possible?

I searched this mailing list first, but I only found discussions from
years back about this issue. I understand that, strictly speaking,
this is no longer a regular expression. I come from a Perl background.
There are now some regex extension libraries that let the built-in
regex engine parse matching parens (Regexp::Common::balanced and
Text::Balanced). I was hoping I could find a similar gem for Ruby.

Help appreciated.

Sincerely,

/iaw


#2

Ah, specifically, the “Back reference with nest level” section of
http://oniguruma.rubyforge.org/oniguruma/files/Syntax_txt.html


#3

Thank you, Rob. Great reference. Now I know that it can be done.
Alas, this doc is a little over my head. Could someone who has used
this construct please show me how I would try it on my simple
example?

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

Accomplishing this is actually not ugly at all in Perl:

use Regexp::Common;
my $matchingarg = qr/($RE{balanced}{-parens=>'{ }'})/;
/\\caption$matchingarg/;
print "The \\caption argument is $1\n";

Of course, Perl is ugly in many other respects, but here it does
nicely.

regards, /iaw


#4

I think that you need to look at what Oniguruma might be able to do.
http://oniguruma.rubyforge.org/

I believe I’ve seen it demonstrated that balanced open/close pairs can
be found with this regular expression engine. It might be ugly, but
then you probably expected that.

-Rob
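
For anyone who wants to try this on the sample from the thread, here is
a minimal sketch. It assumes Ruby 1.9 or later, where the built-in regex
engine accepts Oniguruma-style named groups and the subexpression call
\g<name>. Note that it uses a recursive subexpression call, not the
“Back reference with nest level” feature itself. CAPTION_ARG and
caption_argument are names made up for this illustration.

```ruby
sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

# (?<braces>...) names a group; \g<braces> calls that group recursively,
# so every nested {...} has to be balanced before the outer one can close.
# The lookbehind keeps "\caption" itself out of the match.
CAPTION_ARG = /(?<=\\caption)(?<braces>\{(?:[^{}]|\g<braces>)*\})/

# Returns the text between the braces that follow \caption,
# or nil when there is no balanced \caption{...} in the text.
def caption_argument(text)
  match = CAPTION_ARG.match(text)
  match && match[0][1..-2]   # strip the outer "{" and "}"
end

puts caption_argument(sample)
# prints: my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$
```

The sample is written with single quotes on purpose: inside double
quotes Ruby would treat \c, \s and friends as escape sequences and the
backslashes would not survive.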


#5

ivowel wrote:

Accomplishing this is actually not ugly at all in Perl:

use Regexp::Common;
my $matchingarg = qr/($RE{balanced}{-parens=>'{ }'})/;
/\\caption$matchingarg/;
print "The \\caption argument is $1\n";

Of course, Perl is ugly in many other respects, but here it does
nicely.

regards, /iaw

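sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

# Returns the leading fenced group of str, fences included.
# str must start with an opening fence.
def bal_fences str
  left = str[0,1]
  # Look up the matching pair, e.g. "{}" when left is "{".
  pair = "(){}[]<>"[ /#{Regexp.escape left}./ ]
  fences = /[#{Regexp.escape pair}]/
  accum = "" ; count = 0
  # /m lets "." cross line breaks inside the group.
  str.scan( /.*?#{fences}/m ){|s|
    count += if s[-1,1] == left ; 1 else -1 end
    accum << s
    break if 0 == count
  }
  accum
end

p bal_fences( sample[ /caption(.*)/m, 1 ] )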


#6

On 24.01.2009 02:00, ivowel wrote:

use Regexp::Common;
my $matchingarg = qr/($RE{balanced}{-parens=>'{ }'})/;
/\\caption$matchingarg/;
print "The \\caption argument is $1\n";

Of course, Perl is ugly in many other respects, but here it does
nicely.

Ugliness often means bad maintainability… I’d probably use a
different approach which also works with simpler regular expressions:

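# untested; input is the string to parse

Node = Struct.new :parent, :children

current = root = Node.new nil, []
# The capture group makes split keep the fences as tokens.
tokens = input.split(/([\[\](){}])/)

tokens.each do |token|
  case token
  when /[\[({]/
    # Opening fence: attach a child node and descend into it.
    node = Node.new current, []
    current.children << node
    current = node
  when /[\])}]/
    # Closing fence: climb back up to the parent.
    current = current.parent
  else
    current.children << token
  end
end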

In other words: build a rudimentary context-free parser. It depends,
of course, on what you want to do.

Cheers

robert
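
A self-contained sketch of the same tree-building idea, applied to the
sample from this thread. It is a variation rather than the code above:
it keeps an explicit stack of open groups instead of parent pointers,
handles only curly braces, and represents each group as a plain nested
Array. The names parse_braces, unparse and argument_of are made up for
this illustration.

```ruby
# Parse text into nested arrays: Strings are plain text,
# Arrays are the contents of one {...} group.
def parse_braces(text)
  root = []
  stack = [root]                      # innermost open group is stack.last
  text.split(/([{}])/).each do |token|
    case token
    when '{'
      group = []
      stack.last << group             # attach to the enclosing group
      stack.push(group)               # and descend into it
    when '}'
      raise ArgumentError, 'unbalanced }' if stack.size == 1
      stack.pop
    else
      stack.last << token unless token.empty?
    end
  end
  raise ArgumentError, 'unbalanced {' unless stack.size == 1
  root
end

# Turn a list of nodes back into text, restoring the braces.
def unparse(nodes)
  nodes.map { |node| node.is_a?(Array) ? "{#{unparse(node)}}" : node }.join
end

# Contents of the first top-level group that directly follows `command`,
# or nil if there is none.
def argument_of(nodes, command)
  nodes.each_cons(2) do |text, group|
    if text.is_a?(String) && text.end_with?(command) && group.is_a?(Array)
      return unparse(group)
    end
  end
  nil
end

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'
tree = parse_braces(sample)
puts argument_of(tree, '\caption')
# prints: my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$
```

Compared with a single recursive regex, the tree costs a few more lines
but reports unbalanced input explicitly and can be walked again for
other commands such as \label.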