Regexp: Rubular vs. match. Why is the result different?

spitfire · October 28, 2009, 3:12pm

I need to capture, with a regexp, the content between '<' and '>'.

For instance:

str = 'anystring<hour>anystring<min>anystring<sec>anystring'
I need the array ['hour', 'min', 'sec']

I have written the regexp /(<([^<>]+)>)+/
and tested it on the rubular.com site (it works!).

I ran it in irb:

/(<([^<>]+)>)+/.match('anystring<hour>anystring<min>anystring<sec>anystring')
=> #<MatchData "<hour>" 1:"<hour>" 2:"hour">

As you can see, the match method returns just the first match in the MatchData object.

Do you know why?

thank you,
Alessandro

spitfire · October 28, 2009, 3:28pm

2009/10/28 Ale Ds:

I have written the regexp /(<([^<>]+)>)+/
and tested it on the rubular.com site (it works!).

The "+" at the end is superfluous because it would match multiple
concatenated sequences (several adjacent tags in one match), which you
want as separate items.
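A quick check of that point, using a hypothetical input with two adjacent tags (`x<hour><min>y` is an illustrative string, not from the original post). With the trailing "+", one match swallows both tags, and a repeated capture group only remembers its last iteration:

```ruby
with_plus    = /(<([^<>]+)>)+/
without_plus = /<([^<>]+)>/
input = "x<hour><min>y"  # hypothetical input with adjacent tags

md = with_plus.match(input)
p md[0]  # whole match spans both tags: "<hour><min>"
p md[2]  # repeated group keeps only its last iteration: "min"

p input.scan(with_plus)     # one match for both tags: [["<min>", "min"]]
p input.scan(without_plus)  # one match per tag: [["hour"], ["min"]]
```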

I ran it in irb:

/(<([^<>]+)>)+/.match('anystring<hour>anystring<min>anystring<sec>anystring')
=> #<MatchData "<hour>" 1:"<hour>" 2:"hour">

As you can see, the match method returns just the first match in the MatchData object.

Do you know why?

That’s the difference between #match and #scan. You want scan in your
code.
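A side-by-side sketch of the two methods, assuming the sample string is `anystring<hour>anystring<min>anystring<sec>anystring` (the tags are inferred from the results posted in this thread). Regexp#match stops after the first hit, while String#scan walks the whole string:

```ruby
re  = /<([^<>]+)>/
str = "anystring<hour>anystring<min>anystring<sec>anystring"

# Regexp#match returns a single MatchData for the first match
# (or nil when nothing matches).
first = re.match(str)
p first[1]      # => "hour"

# String#scan continues to the end of the string; with one capture
# group it returns one single-element array per match.
p str.scan(re)  # => [["hour"], ["min"], ["sec"]]
```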

irb(main):001:0> str = 'anystring<hour>anystring<min>anystring<sec>anystring'
=> "anystring<hour>anystring<min>anystring<sec>anystring"
irb(main):002:0> str.scan /<([^>]+)>/
=> [["hour"], ["min"], ["sec"]]
irb(main):003:0> str.scan /<([^>]+)>/ do |m| p m end
["hour"]
["min"]
["sec"]
=> "anystring<hour>anystring<min>anystring<sec>anystring"
irb(main):004:0> str.scan /<([^>]+)>/ do |m,| p m end
"hour"
"min"
"sec"
=> "anystring<hour>anystring<min>anystring<sec>anystring"
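Since the original question asked for a flat array like ['hour', 'min', 'sec'] rather than nested arrays, one possible sketch (assuming the same sample string) is to flatten the scan result, or to collect with the |m,| block parameter used in the irb session:

```ruby
str = "anystring<hour>anystring<min>anystring<sec>anystring"

# scan returns [["hour"], ["min"], ["sec"]] because the pattern has one
# capture group; flatten turns that into a flat array of strings.
tags = str.scan(/<([^>]+)>/).flatten
p tags  # => ["hour", "min", "sec"]

# Same result using the |m,| destructuring trick, collecting into an array.
collected = []
str.scan(/<([^>]+)>/) { |m,| collected << m }
p collected  # => ["hour", "min", "sec"]
```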

Kind regards

robert

spitfire · October 28, 2009, 3:31pm

On Oct 28, 10:12 am, Ale Ds wrote:

I have run it in irb:
>> /(<([^<>]+)>)+/.match('anystring<hour>anystring<min>anystring<sec>anystring')
Alessandro,

You’ll want the String#scan method (RDoc Documentation
classes/String.html#M000812).

015:0> regexp = /<([^<>]+)>/
=> /<([^<>]+)>/
016:0> str = 'anystring<hour>anystring<min>anystring<sec>anystring'
=> "anystring<hour>anystring<min>anystring<sec>anystring"
017:0> str.scan(regexp)
=> [["hour"], ["min"], ["sec"]]
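Robert's pattern uses [^>]+ while Chris's uses [^<>]+. On the sample string they agree, but they can differ when a stray '<' appears. A small sketch with a hypothetical input (`a<<b>c` is illustrative only):

```ruby
robert = /<([^>]+)>/   # pattern from Robert's reply
chris  = /<([^<>]+)>/  # pattern from Chris's reply
tricky = "a<<b>c"      # hypothetical input with a stray "<"

p tricky.scan(robert)  # => [["<b"]]  (the stray "<" ends up in the capture)
p tricky.scan(chris)   # => [["b"]]   ("<" is excluded from the capture)

# On the well-formed sample string both patterns give the same result.
sample = "anystring<hour>anystring<min>anystring<sec>anystring"
p sample.scan(robert) == sample.scan(chris)  # => true
```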

HTH,
Chris

spitfire · October 28, 2009, 3:46pm

The "+" at the end is superfluous because it would match multiple
concatenated sequences (several adjacent tags in one match), which you
want as separate items.
…
I agree with you.

That’s the difference between #match and #scan. You want scan in your
code.
…

Yes, scan works!
Thanks a lot,
Alessandro

spitfire · October 28, 2009, 4:01pm

You’ll want the String#scan method (RDoc Documentation
classes/String.html#M000812).
Yes. At first I didn't find the scan method because I was looking for it
in MatchData (instead of String).
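If you do want a MatchData object for every match (for example to get offsets), Regexp#match accepts a starting position as its second argument, so a small loop can collect them. This is a minimal sketch not taken from the thread, assuming the same sample string:

```ruby
re  = /<([^<>]+)>/
str = "anystring<hour>anystring<min>anystring<sec>anystring"

# Restart Regexp#match at the end of the previous match. This pattern
# can never match the empty string, so pos always advances and the
# loop terminates.
matches = []
pos = 0
while (md = re.match(str, pos))
  matches << md
  pos = md.end(0)
end

p matches.map { |m| m[1] }        # => ["hour", "min", "sec"]
p matches.map { |m| m.begin(0) }  # => [9, 24, 38] (offset of each tag)
```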

thank you
Alessandro