Forum: Ruby regex extension to handle matching parens?

ivo welch (Guest)
on 2009-01-23 23:35
(Received via mailing list)
Dear Experts:  I am very new to ruby, literally having just read the
ruby book.

I want to write a program that does basic LaTeX parsing, so I need to
match '}' closings to the opening '{'.  (yes, I understand that LaTeX
has very messy syntax, so this will only work for certain LaTeX docs.)
 Does a gem exist that facilitates closing-paren-matching fairly
painlessly?  For example,

  sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

So I want my "\caption"-matching Ruby program to be able to detect
the closing brace and give me everything between the opener and the
closer (i.e., "my table \label{table-label} example:
$\sqrt{2+\sqrt{2}}$").  Is this possible?

I searched this mailing list first, but I only found discussions from
years back about this issue.  I understand that this is not, strictly
speaking, a regular expression.  I come from a Perl background, where
there are now some regex extension libraries that make it possible for
the built-in regex engine to parse matching parens
(Regexp::Common::balanced and Text::Balanced).  I was hoping I could
find a similar gem for Ruby.

help appreciated.

Sincerely,

/iaw
Rob Biedenharn (Guest)
on 2009-01-24 01:17
(Received via mailing list)
I think that you need to look at what Oniguruma might be able to do.
http://oniguruma.rubyforge.org/

I believe I've seen it demonstrated that balanced open/close pairs can
be found with this regular expression engine. It might be ugly,
however, but then you probably expected that.

-Rob
Rob Biedenharn (Guest)
on 2009-01-24 01:26
(Received via mailing list)
Ah, specifically, the "Back reference with nest level" section of
http://oniguruma.rubyforge.org/oniguruma/files/Syn...
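
For what it's worth, here is a minimal sketch of a balanced-brace match with this engine. It assumes Ruby 1.9 or later, where an Oniguruma-derived engine is built in, and it uses a named-group subexpression call (\g<name>) rather than the "back reference with nest level" feature mentioned above. The names CAPTION_ARG and caption_argument are made up for illustration, and the sample is the one from the original question.

```ruby
# Hypothetical names: CAPTION_ARG and caption_argument are illustrative only.
#
# (?<=\\caption)   the match must come right after the text "\caption"
# (?<braces> ... ) names the group so that it can call itself
# \g<braces>       recursive call: matches one nested {...} group
CAPTION_ARG = /(?<=\\caption)(?<braces>\{(?:[^{}]|\g<braces>)*\})/

# Return the text between the braces that follow \caption, or nil.
def caption_argument(text)
  match = CAPTION_ARG.match(text)
  # match[0] is the whole "{...}" group; drop the outer braces.
  match && match[0][1..-2]
end

# Single quotes keep the backslashes literal.
sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'
puts caption_argument(sample)
```

This only understands plain "{" and "}", so escaped braces such as \{ in the LaTeX source would confuse it.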
ivowel (Guest)
on 2009-01-24 02:05
(Received via mailing list)
thank you, rob.  great reference.  now I know that it can be done.
alas, this doc is a little over my head.  can someone who has used
this construct possibly please show me how I would try it on my simple
example?

  sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'


accomplishing this is actually not ugly at all in perl:

  use Regexp::Common;
  my $matchingarg = qr/($RE{balanced}{-parens=>'{ }'})/;
  /\\caption$matchingarg/;
  print "The \\caption argument is $1\n";

of course, perl is ugly in many other respects, but here, it does
nicely.

regards, /iaw
William James (Guest)
on 2009-01-24 03:35
(Received via mailing list)
ivowel wrote:

> accomplishing this is actually not ugly at all in perl:
>
>   use Regexp::Common;
>   my $matchingarg = qr/($RE{balanced}{-parens=>'{ }'})/;
>   /\\caption$matchingarg/;
>   print "The \\caption argument is $1\n";
>
> of course, perl is ugly in many other respects, but here, it does
> nicely.
>
> regards, /iaw


sample = " \\caption{my table \\label{table-label}
  example: $\\sqrt{2+\\sqrt{2}}$} more here {}"


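# Return the leading balanced group of str, which must start with an
# opening fence: one of ( { [ <
def bal_fences str
  left = str[0,1]
  # Look up the pair that starts with left (e.g. "{}") and build a
  # character class that matches either fence of that pair.
  fences = /[#{Regexp.escape "(){}[]<>"[ /#{Regexp.escape left}./ ]}]/
  accum = "" ; count = 0
  # /m lets "." cross newlines, so multi-line text is kept intact.
  str.scan( /.*?#{fences}/m ){|s|
    count += if s[-1,1] == left ; 1 else -1 end
    accum << s
    break if 0 == count
  }
  accum
end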


p bal_fences( sample[ /caption(.*)/m, 1 ] )
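
A variant of the same counting idea, sketched with StringScanner from Ruby's standard library (strscan). The method name extract_argument is made up for illustration, and the sketch assumes the command is followed immediately by "{". It is a rough outline under those assumptions, not a drop-in replacement for Text::Balanced.

```ruby
require "strscan"

# Hypothetical helper: return the text between the braces that directly
# follow `command`, or nil if the command is missing or the braces are
# unbalanced.
def extract_argument(text, command)
  scanner = StringScanner.new(text)
  # Skip ahead to just past the opening brace after the command.
  return nil unless scanner.scan_until(/#{Regexp.escape(command)}\{/)
  start = scanner.pos                     # byte offset after the "{"
  depth = 1
  while depth > 0
    # Jump to the next brace of either kind; give up if there is none.
    return nil unless scanner.scan_until(/[{}]/)
    depth += (scanner.matched == "{" ? 1 : -1)
  end
  # scanner.pos is a byte offset, so slice by bytes; the -1 drops the
  # closing brace.
  text.byteslice(start...(scanner.pos - 1))
end

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'
puts extract_argument(sample, '\caption')
```

Because the command is passed in, the same method can pull out the \label argument as well.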
Robert Klemme (Guest)
on 2009-01-24 17:40
(Received via mailing list)
On 24.01.2009 02:00, ivowel wrote:
>
>   use Regexp::Common;
>   my $matchingarg = qr/($RE{balanced}{-parens=>'{ }'})/;
>   /\\caption$matchingarg/;
>   print "The \\caption argument is $1\n";
>
> of course, perl is ugly in many other respects, but here, it does
> nicely.

Ugliness often means bad maintainability...  I'd probably use a
different approach which also works with simpler regular expressions:

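# untested
Node = Struct.new :parent, :children

# the sample string from the original question
input = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'

current = root = Node.new nil, []
# the capture group makes split keep the brackets as tokens
tokens = input.split(/([\[\](){}])/)

tokens.each do |token|
   case token
     when /\A[\[({]\z/
       # opening bracket: attach a child node and descend into it
       child = Node.new current, []
       current.children << child
       current = child
     when /\A[\])}]\z/
       # closing bracket: climb back up, staying at root if unbalanced
       current = current.parent || current
     else
       current.children << token unless token.empty?
   end
end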

In other words: build a rudimentary context-free parser.  It depends,
of course, on what you want to do.

Cheers

  robert
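
To show how a tree like this could answer the original question, here is a small self-contained sketch. It uses nested arrays instead of the Node struct and only handles "{" and "}"; the names parse_groups, render and argument_of are made up for illustration, so treat it as one possible outline rather than a finished parser.

```ruby
# Parse text into nested arrays: strings are plain text, arrays are
# the contents of {...} groups.
def parse_groups(text)
  root = []
  stack = [root]
  # The capture group makes split keep the braces as separate tokens.
  text.split(/([{}])/).each do |token|
    case token
    when "{"
      group = []
      stack.last << group
      stack.push(group)
    when "}"
      raise ArgumentError, "unbalanced '}'" if stack.size == 1
      stack.pop
    else
      stack.last << token unless token.empty?
    end
  end
  raise ArgumentError, "unbalanced '{'" unless stack.size == 1
  root
end

# Turn a parsed group back into source text, without its outer braces.
def render(group)
  group.map { |item| item.is_a?(Array) ? "{#{render(item)}}" : item }.join
end

# Return the text of the group that directly follows `command`, or nil.
def argument_of(tree, command)
  tree.each_cons(2) do |before, after|
    if before.is_a?(String) && before.end_with?(command) && after.is_a?(Array)
      return render(after)
    end
  end
  # Not found at this level: search the nested groups.
  tree.each do |item|
    next unless item.is_a?(Array)
    found = argument_of(item, command)
    return found if found
  end
  nil
end

sample = ' \caption{my table \label{table-label} example: $\sqrt{2+\sqrt{2}}$} more here {}'
tree = parse_groups(sample)
puts argument_of(tree, '\caption')
```

The tree costs more code than a regex, but once it exists, nested arguments such as the one for \label come out of the same lookup.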