Hello,
I need to match a chunk of code like this:
…
…
#begin here
…}
…end
…}
…}
…end
…
…
I need to match from “#begin here” up to the n-th closing token
(i.e. ‘}’ or ‘end’). n can be arbitrary (let’s assume it is
meaningful, i.e. n is no larger than the number of ‘}’s plus ‘end’s).
Example
match_stuff(2):
#begin here
…}
…end
match_stuff(4):
#begin here
…}
…end
…}
…}
etc.
What’s the best way to accomplish this? I have been trying with
scan() but have not really succeeded yet.
TIA,
Peter
__
http://www.rubyrailways.com
Peter S. wrote:
…}
…end
…
…
I need to match from “#begin here” up to the n-th closing token
(i.e. ‘}’ or ‘end’). n can be arbitrary (let’s assume it is
meaningful, i.e. n is no larger than the number of ‘}’s plus ‘end’s).
n = 4
text =~ /#begin(.*(}|end)){#{n}}/m
?
(not tested).
Carlos wrote:
…end
n = 4
text =~ /#begin(.*(}|end)){#{n}}/m
Sorry, I need to ‘scan’ it. I have been playing around with similar
regexps, but they did not work out. E.g. also yours:
irb(main):007:0> text = '… #begin aaaa end bbb } ccc end ddd'
=> "… #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*(}|end)){#{n}}/m)
=> [[" ccc end", "end"]]
does not work with scan…
Cheers,
Peter
IMHO this does not work because of the greedy ".*". You could try with
reluctant, i.e. ".*?". Also the grouping does not catch the whole
sequence.
Yeah, I tried to correct these problems but I am still not quite
there…
Carlos’ regexp, vol. 2 (with the reluctant .*?):
irb(main):007:0> text = '… #begin aaaa end bbb } ccc end ddd'
=> "… #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*?(}|end)){#{n}}/m)
=> [[" bbb }", "}"]]
And I would like to get
[["#begin aaaa end bbb }"]]
OK, I know that I did not specify the problem correctly the first
time; maybe it is clearer now…
Cheers,
Peter
On 11.12.2006 10:37, Carlos wrote:
…end
n = 4
text =~ /#begin(.*(}|end)){#{n}}/m
?
(not tested).
IMHO this does not work because of the greedy ".*". You could try with
reluctant, i.e. ".*?". Also the grouping does not catch the whole
sequence.
robert
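To illustrate the two points above, here is a small sketch using the sample string from elsewhere in this thread (with an ASCII "..." in place of the ellipsis). It compares the greedy and reluctant forms and shows that a repeated capture group only keeps its last iteration, which is why the grouping does not catch the whole sequence. The variable names are only for illustration, and the "}" is escaped to avoid a warning under -w.

```ruby
text = "... #begin aaaa end bbb } ccc end ddd"
n = 2

greedy    = /#begin(.*(\}|end)){#{n}}/m
reluctant = /#begin(.*?(\}|end)){#{n}}/m

# Greedy ".*" grabs as much as it can, so the match overshoots the
# second closing token and runs up to the last one in the string.
greedy_match = text[greedy]
p greedy_match      # => "#begin aaaa end bbb } ccc end"

# Reluctant ".*?" stops at the first possible token on each repetition,
# so the whole match ends at the n-th closing token.
reluctant_match = text[reluctant]
p reluctant_match   # => "#begin aaaa end bbb }"

# A repeated capture group only remembers its last iteration.
m = reluctant.match(text)
p m[1]              # => " bbb }"
p m[2]              # => "}"
```

So the whole match (m[0]) is already the wanted chunk; only the captures are misleading.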
Peter S. wrote:
#begin here
meaningful, i.e. there are no more ‘}’ + 'end’s than n.
=> "… #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*(}|end)){#{n}}/m)
=> [[" ccc end", "end"]]
does not work with scan…
To make it work with scan just make the parens non-capturing:
irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
=> "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
irb(main):002:0> text.scan(/#begin(?:.*?(?:}|end)){2}/m)
=> ["#begin aaa end bbb }", "#begin ddd end eee end"]
Good luck.
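For an arbitrary n, the same pattern could be wrapped in a method. This is only a minimal sketch: match_stuff is the name from the original question, but taking the text as a first argument is an assumption made here, and Integer(n) is just a guard so that nothing but a number gets interpolated into the regexp.

```ruby
# Hypothetical helper: returns every chunk that starts at "#begin" and
# ends at the n-th closing token ("}" or "end") after it.
def match_stuff(text, n)
  # No capturing groups, so scan returns the whole matches as strings.
  text.scan(/#begin(?:.*?(?:\}|end)){#{Integer(n)}}/m)
end

text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
p match_stuff(text, 2)  # => ["#begin aaa end bbb }", "#begin ddd end eee end"]
p match_stuff(text, 3)  # => ["#begin aaa end bbb } ccc }"]
```

Two caveats: with /m the reluctant ".*?" will happily run past the next "#begin" if it needs more closing tokens, and "end" is also matched inside longer words, so real code may need word boundaries.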
Ha! That was the trick I have been looking for! Muchas Gracias, Carlos.
Cheers,
Peter
Peter S. wrote:
…}
…end
This won’t solve the entire problem, but it will give you an array of
indices to matching elements:
#!/usr/bin/ruby -w
data = File.read("testdata.txt")
match_indices = []
# Collect the start offset of every '}' in the file.
data.scan(/\}/) do
  match_indices << Regexp.last_match.begin(0)
end
puts match_indices
You could begin by scanning to your planned start mark, then scan for
matching elements using this code. Or you could segregate the block
between the start and end marks, then scan for matches in the substring
using this code.
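A rough sketch of the first suggestion, combined with the n-th closing token from the original question: find the start mark, collect the offsets where each closing token ends, then slice. The method name chunk_up_to and its parameters are made up for illustration, and it also matches "end", which the script above does not.

```ruby
# Hypothetical helper: slice from the start mark up to the n-th closing
# token, using match offsets instead of one big regexp.
def chunk_up_to(data, n, start_mark = "#begin")
  start = data.index(start_mark)
  return nil if start.nil? || n < 1

  ends = []
  # Scan only the part after the start mark and record where each
  # closing token ends, as an offset into the full string.
  data[start..-1].scan(/\}|end/) do
    ends << start + Regexp.last_match.end(0)
  end
  return nil if ends.size < n

  data[start...ends[n - 1]]
end

data = "... #begin aaaa end bbb } ccc end ddd"
p chunk_up_to(data, 2)  # => "#begin aaaa end bbb }"
p chunk_up_to(data, 4)  # => nil (only three closing tokens)
```

This only handles the first start mark in the string; it returns nil when there is no start mark or fewer than n closing tokens.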