I understand regular expressions, but can someone please explain this: re = %r/((?<pg>\((?:\\[()]|[^()]|\g<pg>)*\)))/ By the way, this only works with the Oniguruma engine (Ruby 1.9). So, now that there is the capability to match balanced parens and so forth, does this mean that the new regular expression engine can be used to construct simple parsers (matching language constructs)?
on 19.11.2007 02:55
on 19.11.2007 14:25
On Nov 18, 8:50 pm, Wayne Magor <Wayne.Ma...@gmail.com> wrote:
> I understand regular expressions, but can someone please explain this:
>
> re = %r/((?<pg>\((?:\\[()]|[^()]|\g<pg>)*\)))/
>
> By the way, this only works with the Oniguruma engine (Ruby 1.9).
>
> So, now that there is the capability to match balanced parens and so
> forth, does this mean that the new regular expression engine can be
> used to construct simple parsers (matching language constructs)?

%r/ ... /
  -- regexp delimiter (why they didn't just use / ... /, I don't know)
( ... )
  -- non-capturing group (normally would be capturing, but see http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt, part 10, case 3); seems rather useless, given that the only contained item is a capturing group
(?<pg> ... )
  -- capturing named group (see http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt, part 7)
\( .. \)
  -- literal parentheses surrounding the pattern
(?: ... | ... | ... )*
  -- non-capturing group of 3 alternatives, repeated 0 or more times
\\[()]
  -- escaped literal parens
[^()]
  -- anything except parens
\g<pg>
  -- match the pg-named pattern here (recursive sub-exp; see http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt, part 9)
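To see the pieces above working together, here is a minimal sketch that runs the pattern against a few strings. The sample inputs are my own, not from the original post, and it assumes Ruby 1.9 or later.

```ruby
# The pattern from the question (needs Ruby 1.9+ for named groups and \g<pg>).
re = %r/((?<pg>\((?:\\[()]|[^()]|\g<pg>)*\)))/

# Nested, balanced parens: the whole outer group is matched.
m = re.match("foo (a (b) c) bar")
puts m[:pg]                       # prints "(a (b) c)"

# The outer paren is never closed, so only the inner balanced pair matches.
puts re.match("(a (b) c")[:pg]    # prints "(b)"

# No closing paren at all: no match.
p re.match("(a (b c")             # prints nil
```

Since the pattern contains a named group, the match is read through `m[:pg]` rather than a numbered capture.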
on 20.11.2007 23:42
Thanks, I understand nearly everything now. It really shows the power of the Oniguruma engine for regular expressions.

By the way, the comma caused a Japanese site to come up. For people's reference, the manual for Oniguruma is at:

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

That's a great reference, but I still didn't understand this:

Noah Easterly wrote:
> \\[()]
> -- escaped literal parens

What is that pattern? I've never seen that before. What does it match? Where can I read about that?

Why would that be there, since a new open paren should start another instance of <pg>, shouldn't it? So, obviously, I'm still a little confused.
on 21.11.2007 01:58
On Nov 18, 2007, at 17:55 , Wayne Magor wrote:
> I understand regular expressions, but can someone please explain this:
>
> re = %r/((?<pg>\((?:\\[()]|[^()]|\g<pg>)*\)))/
>
> By the way, this only works with the Oniguruma engine (Ruby 1.9).
>
> So, now that there is the capability to match balanced parens and so
> forth, does this mean that the new regular expression engine can be
> used to construct simple parsers (matching language constructs)?

No, translated to 1.8, the regex would be:

%r/((\((?:\\[()]|[^()]|\2)*\)))/
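One caveat worth checking on the translation: as far as I know, a numbered reference such as \2 is a backreference (it matches the same text the group captured), while \g<pg> is a subexpression call (it runs the group's pattern again). A small sketch of that difference, using a hypothetical group name x and named syntax for both forms, assuming Ruby 1.9 or later:

```ruby
# \k<x> is a backreference: it must repeat the exact text that x captured.
backref = /\A(?<x>[ab])\k<x>\z/
# \g<x> is a subexpression call: it runs the pattern [ab] a second time.
subcall = /\A(?<x>[ab])\g<x>\z/

p !!("aa" =~ backref)   # prints true  (second "a" repeats the captured "a")
p !!("ab" =~ backref)   # prints false ("b" is not the captured text "a")
p !!("ab" =~ subcall)   # prints true  ("b" matches the pattern [ab] again)
```

The recursive balanced-paren match relies on the second behaviour, so the two patterns are not necessarily interchangeable.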
on 21.11.2007 01:59
On Nov 20, 2007, at 14:42 , Wayne Magor wrote:
> Noah Easterly wrote:
>> \\[()]
>> -- escaped literal parens
>
> What is that pattern? I've never seen that before. What does it
> match?
> Where can I read about that?

"\(" or "\)"
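A quick way to convince yourself, as a sketch: the first \\ in the pattern is a literal backslash, and [()] is a character class holding the two parens. The \A and \z anchors are my addition so that only the whole string counts.

```ruby
# A literal backslash followed by one paren, and nothing else.
escaped_paren = /\A\\[()]\z/

# Single-quoted strings keep the backslash, so '\(' is two characters.
p !!('\(' =~ escaped_paren)   # prints true
p !!('\)' =~ escaped_paren)   # prints true
p !!('('  =~ escaped_paren)   # prints false (no backslash in front)
p !!('\a' =~ escaped_paren)   # prints false (backslash, but not a paren)
```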
on 21.11.2007 15:11
Noah Easterly wrote:
> On Nov 18, 8:50 pm, Wayne Magor <Wayne.Ma...@gmail.com> wrote:
>> I understand regular expressions, but can someone please explain this:
>>
>> re = %r/((?<pg>\((?:\\[()]|[^()]|\g<pg>)*\)))/

snip

> (?: ... | ... | ... )*
> -- non-capturing group of 3 alternatives, repeated 0 or more times
> \\[()]
> -- escaped literal parens
> [^()]
> -- anything except parens
> \g<pg>
> -- match the pg-named pattern here

Ok, so there are 3 alternatives in the non-capturing group:

1. An open or close parenthesis
2. Any character except a paren
3. A pattern that starts with an open paren

Am I the only one that finds this strange?
on 03.12.2007 20:07
On Nov 21, 9:11 am, Wayne Magor <wema...@hotmail.com> wrote:
> > [^()]
> > -- anything except parens
> > \g<pg>
> > -- match the pg-named pattern here
>
> Ok, so there are 3 alternatives in the non-capturing group:
>
> 1. An open or close parenthesis

Correction: as Eric said above, an escaped (read: with a leading backslash) parenthesis.

> 2. Any character except a paren

Yup.

> 3. A pattern that starts with an open paren

AND ends in a close paren, and contains only non-parens, escaped parens, and balanced pairs of parens.

> Am I the only one that finds this strange?

Doubtful :). You may be one of the ones to whom this is new, though.

I find it strange that it only recognizes parenthesis escapes, and not escaped backslashes. So you can do something like:

  ( \( )

and match correctly, but there's no way to do a balanced pair of parentheses containing just a backslash:

  (\)    -- no
  (\\)   -- no
  (\\\)  -- no
  (\ )   -- matches, but has an extra space.

I would have replaced '\\[()]' by '\\[()\\]' so that '(\\)' would match.
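Here is a small sketch for trying cases like these, assuming Ruby 1.9 or later. One thing to watch: [^()] also accepts a backslash, so a backtracking engine can fall back to reading a lone backslash as an ordinary character. As far as I can tell, that means the original pattern does return a match for (\), even though the input is not balanced under the escaping rules. The `strict` pattern is a hypothetical variant of my own, not from the thread; it takes the suggested '\\[()\\]' and also removes the backslash from the "ordinary character" class.

```ruby
# Pattern from the thread (outer redundant group dropped for brevity).
original = %r/(?<pg>\((?:\\[()]|[^()]|\g<pg>)*\))/
# Variant suggested above: an escaped backslash is also one unit.
variant  = %r/(?<pg>\((?:\\[()\\]|[^()]|\g<pg>)*\))/
# Hypothetical stricter variant (my assumption, not from the thread):
# a backslash may only appear as part of an escape.
strict   = %r/(?<pg>\((?:\\[()\\]|[^()\\]|\g<pg>)*\))/

# Single-quoted strings: '\)' is backslash + paren, '\\\\' is two backslashes.
p original.match('(\)')[0]     # prints "(\\)", i.e. the 3 characters ( \ )
p strict.match('(\)')          # prints nil: the close paren is escaped
p variant.match('(\\\\)')[0]   # whole string, two backslashes inside
p strict.match('(\\\\)')[0]    # whole string, read as one escaped backslash
p strict.match('( \( )')[0]    # whole string, escaped paren inside
```

Note that `strict` also rejects (\ ), because a backslash followed by a space is not one of the escapes it knows about.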