Regex dynamic count modifier {min, max}?

Here is an idea; tell me if it could be accomplished by some other
means.

To parse a logic statement like this:

fn:function3(fn:function2(fn:function1(xargs)))

Using a regex sorta like so to grep the function-start-pattern(s) :

/\A((fn:[\w-]+)[ ]*\([ ]*)+\z/i

It would be nice to ensure the proper count of ')', without confusion if,
say, the xargs had a ')' literal or escaped string value(s) in there.

One way is to provide a count ref for function-start-pattern, so I could
then group a pattern for match on post-xargs ')' and force the {min,max}
count by some backref to count-of-(function-start-pattern) = X and put
that in there for the (function-end-pattern)+{X,X}.

Then it might be something like this (ignore the lack of a match on
possible xargs for now):

/\A((fn:[\w-]+)[ ]*\([ ]*)+[ ]*(\)){$#1,$#1}\z/i

Where $#1 would be the count ref of the first group etc. Then there
would be matching count-left-side-( and count-right-side-).

Or I don't understand enough about the internals of regex to know that
this is impossible.

-ntcm

On Feb 7, 7:02 pm, jOhn [email protected] wrote:

/\A((fn:[\w-]+)[ ]*\([ ]*)+[ ]*(\)){$#1,$#1}\z/i

Where $#1 would be the count ref of the first group etc. Then there would be
matching count-left-side-( and count-right-side-).

Or I don't understand enough about the internals of regex to know that this is
impossible.

-ntcm

Could you use the awk statement to further parse here?
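
A second pass does not need awk; plain Ruby can do it. Here is a rough two-pass sketch of the {X,X} idea: count the openers first, then interpolate that count into the final pattern. The name balanced_call? is made up for this example, and it assumes the xargs contain no parentheses at all.

```ruby
# Hypothetical helper: checks that the run of ")" at the end of the string
# matches the number of leading "fn:name(" openers.
# Assumption: the innermost argument text contains no parentheses.
def balanced_call?(str)
  opener = /fn:[\w-]+[ ]*\([ ]*/i
  # Pass 1: grab the run of openers at the start and count them.
  head = str[/\A(?:#{opener})+/]
  return false unless head
  count = head.scan(opener).size
  # Pass 2: build the final pattern with the count interpolated as {n}.
  full = /\A(?:#{opener}){#{count}}[^()]*(?:[ ]*\)){#{count}}\z/
  full.match?(str)
end

puts balanced_call?("fn:function3(fn:function2(fn:function1(xargs)))")
puts balanced_call?("fn:function3(fn:function2(xargs)")
```

The first call should report true and the second false. This only covers the straight-line nesting from the question; anything with several arguments or quoted parentheses needs more than a count.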

2008/2/8, jOhn [email protected]:

/\A((fn:[\w-]+)[ ]*\([ ]*)+[ ]*(\)){$#1,$#1}\z/i

Where $#1 would be the count ref of the first group etc. Then there would be
matching count-left-side-( and count-right-side-).

Or I don't understand enough about the internals of regex to know that this is
impossible.

Parsing nested structures is not possible with standard regular
expressions. IIRC they added something to Perl regexps to do that and
it may be possible with Ruby 1.9, but I do not know the 1.9 regexp
engine well enough to answer that off the top of my head.

So the usual approach is to use a context-free grammar and a parser.
You can find parser generators in the RAA.
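
For a grammar this small a hand-written recursive descent parser may be enough, without a generator. The following is only a sketch: the grammar in the comment is my assumption about the input format, and parse_call and parse are names invented for the example. It uses StringScanner from the standard library.

```ruby
require "strscan"

# Grammar assumed for this sketch (not taken from the thread):
#   call := NAME "(" arg ")"
#   arg  := call | TEXT
# NAME is "fn:" plus word characters or hyphens; TEXT has no parentheses.
def parse_call(scanner)
  name = scanner.scan(/fn:[\w-]+/i) or raise "expected function name at #{scanner.pos}"
  scanner.skip(/\s*/)
  scanner.scan(/\(/) or raise "expected '(' at #{scanner.pos}"
  scanner.skip(/\s*/)
  # A nested call starts with "fn:"; anything else is plain argument text.
  arg = scanner.check(/fn:/i) ? parse_call(scanner) : scanner.scan(/[^()]*/)
  scanner.skip(/\s*/)
  scanner.scan(/\)/) or raise "expected ')' at #{scanner.pos}"
  [name, arg]
end

def parse(str)
  scanner = StringScanner.new(str)
  tree = parse_call(scanner)
  raise "trailing input at #{scanner.pos}" unless scanner.eos?
  tree
end

p parse("fn:function3(fn:function2(fn:function1(xargs)))")
# => ["fn:function3", ["fn:function2", ["fn:function1", "xargs"]]]
```

Each level of nesting becomes one recursive call, so a missing or extra ')' shows up as an exception with the position where parsing stopped.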

If you just want to ensure counts match you could do something like
this:

raise "brackets do not match!" if
  str.scan(/\(/).size != str.scan(/\)/).size

However, this does not ensure proper nesting. A bit more sophisticated:

c = 0
str.scan(/[()]/) do |m|
  case m
  when "("
    c += 1
  when ")"
    c -= 1
    # $` holds the text before the current match
    raise "Mismatch at '#{$`}'" if c < 0
  else
    raise "Programming error"
  end
end
raise "Mismatch" unless c == 0

But now you get pretty close to a decent parser. :-)
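
The counting approach can also be made to ignore a ')' that sits inside a quoted argument, which was the original worry. A minimal sketch, assuming arguments are quoted with single or double quotes and use backslash escapes (the thread does not specify this); balanced_outside_quotes? is a made-up name.

```ruby
# Hypothetical helper: true if the parentheses outside quoted strings balance.
# Assumption: strings use '...' or "..." with backslash escapes.
def balanced_outside_quotes?(str)
  depth = 0
  # Alternatives are tried in order, so a whole quoted string is consumed
  # as one token and the parentheses inside it are never counted.
  str.scan(/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[()]/) do |token|
    case token
    when "(" then depth += 1
    when ")"
      depth -= 1
      return false if depth < 0
    end
  end
  depth.zero?
end

puts balanced_outside_quotes?('fn:f(fn:g("a)b"))')
puts balanced_outside_quotes?('fn:f("a)b"')
```

The first line should print true and the second false, since the only ')' in the second string is inside the quotes.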

Kind regards

robert

To parse a logic statement like this:

fn:function3(fn:function2(fn:function1(xargs)))

Are you looking for something like this (ruby19):

def get_fns(string, count=0)
  # \g<fn> calls the named group recursively (Ruby 1.9 regexp engine)
  m = /(?<fn>
    fn:[\w-]+
    \s*\(\s*
    (\g<fn>|[^)]*)
    \s*
    \)
  )\s*/xi.match(string)
  if m
    n = /^(fn:[\w-]+)(\s*)\((.*?)\)$/.match(m['fn'])
    puts "FN #{count}: #{n[1]} with args #{n[3]}"
    get_fns(n[3], count + 1)
  end
end

a = "foo fn:function3(fn:function2(fn:function1(xargs))) foo(bar) bar"
get_fns(a)
FN 0: fn:function3 with args fn:function2(fn:function1(xargs))
FN 1: fn:function2 with args fn:function1(xargs)
FN 2: fn:function1 with args xargs
=> nil

Regards,
Thomas.

I modified it slightly to skip parentheses inside single or double quotes:

def get_fns(string, count=0)
  # alternatives: nested call, double-quoted string, single-quoted string, plain args
  m = /(?<fn>
    fn:[\w-]+
    \s*\(\s*
    (\g<fn>|(".+")|('.+')|[^)]*)
    \s*
    \)
  )\s*/xi.match(string)
  if m
    n = /^(fn:[\w-]+)(\s*)\((.*?)\)$/.match(m['fn'])
    puts "FN #{count}: #{n[1]} with args #{n[3]}"
    get_fns(n[3], count + 1)
  end
end

wow good job thomas.

(\g<fn>|(".+")|('.+')|[^)]*)

In this case you'll probably have to take care of strings like "foo
\"bar\"" etc.

I tend to use the regexp from my JSON-quiz solution:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/289684
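
This is not the regexp from that post, just a minimal sketch of the usual pattern for quoted strings with backslash escapes, plugged into the recursive \g<fn> call from above. QUOTED, CALL and first_call are names made up for the example.

```ruby
# A quoted string: opening quote, then either a non-quote, non-backslash
# character or a backslash plus any character, then the closing quote.
QUOTED = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/

# A call: name, "(", then any mix of nested calls, quoted strings and
# other characters that are not parentheses or quotes, then ")".
CALL = /(?<fn>fn:[\w-]+\s*\((?:\g<fn>|#{QUOTED}|[^()"'])*\))/i

# Hypothetical helper: returns the first complete call in str, or nil.
def first_call(str)
  m = CALL.match(str)
  m && m['fn']
end

puts first_call('foo fn:f(fn:g("a)b \"c\"", x)) bar')
```

With this the ')' and the escaped quotes inside "a)b \"c\"" are swallowed by QUOTED, so the match should end at the real closing parentheses of fn:f.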