Forum: Ruby String#split: unexpected result

Posted by Kirill Shutemov (Guest)
on 2006-11-07 10:48
(Received via mailing list)
irb(main):001:0> "aabbcc".split(/bb/)
=> ["aa", "cc"]
irb(main):002:0> "aabbcc".split(/(bb)/)
=> ["aa", "bb", "cc"]
irb(main):003:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]

The last two results are unexpected to me. Can anybody explain them?
Posted by Robert Klemme (Guest)
on 2006-11-07 10:48
(Received via mailing list)
On 31.10.2006 10:49, Kirill Shutemov wrote:
> irb(main):001:0> "aabbcc".split(/bb/)
> => ["aa", "cc"]
> irb(main):002:0> "aabbcc".split(/(bb)/)
> => ["aa", "bb", "cc"]
> irb(main):003:0> "aabbcc".split(/bb(c)?/)
> => ["aa", "c", "c"]
> 
> The last two result is unexpected for me. Can anybody explain it?

#split also returns the text matched by any capturing groups in the
split pattern:

irb(main):008:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]
irb(main):009:0> "aabbcc".split(/bb(?:c)?/)
=> ["aa", "c"]
irb(main):010:0> "aabbcc".split(/(bb(?:c)?)/)
=> ["aa", "bbc", "c"]
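
A minimal standalone script (not part of the original reply) that reproduces the three patterns from the irb session above. The only thing that changes between them is whether the group captures, and that alone decides what ends up in the result:

```ruby
# Compare String#split with a capturing vs. a non-capturing group.
str = "aabbcc"

# Capturing group: the captured "c" is inserted between the pieces.
capturing = str.split(/bb(c)?/)
# Non-capturing group (?:...): only the pieces around the delimiter remain.
non_capturing = str.split(/bb(?:c)?/)
# Capture the whole delimiter to keep all of it in the output.
whole_delimiter = str.split(/(bb(?:c)?)/)

p capturing        # ["aa", "c", "c"]
p non_capturing    # ["aa", "c"]
p whole_delimiter  # ["aa", "bbc", "c"]
```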

Regards

	robert
Posted by Vincent Fourmond (Guest)
on 2006-11-07 10:48
(Received via mailing list)
Kirill Shutemov wrote:
> irb(main):001:0> "aabbcc".split(/bb/)
> => ["aa", "cc"]
> irb(main):002:0> "aabbcc".split(/(bb)/)
> => ["aa", "bb", "cc"]
> irb(main):003:0> "aabbcc".split(/bb(c)?/)
> => ["aa", "c", "c"]
> 
> The last two result is unexpected for me. Can anybody explain it?

  You have a capturing group in the last regexp. In that case, split also
returns the text matched by the group. For instance:

>> str = "abcd"
=> "abcd"
>> p str.split(/(b)/)
["a", "b", "cd"]

  The string is split into "a" and "cd", and the text captured by the
group, "b", appears between them. You want a non-capturing group,
(?:c), instead of (c).
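
  To make the interleaving concrete, here is a rough sketch of a hypothetical helper, split_with_captures (not part of Ruby's API), that rebuilds the result by hand. It only approximates String#split for simple patterns like the ones in this thread. It assumes the pattern never matches the empty string and ignores split's optional limit argument.

```ruby
# Hypothetical helper (NOT a Ruby API): approximates how String#split
# places captured groups between the pieces, for simple patterns only.
# Assumption: the pattern never matches an empty string; no limit arg.
def split_with_captures(str, re)
  result = []
  pos = 0
  while (m = re.match(str, pos))
    raise ArgumentError, "empty matches not supported" if m.begin(0) == m.end(0)
    result << str[pos...m.begin(0)]     # piece before the delimiter
    result.concat(m.captures.compact)   # captured text; nil (unmatched) groups dropped
    pos = m.end(0)                      # continue after the delimiter
  end
  result << str[pos..-1]                # piece after the last delimiter
  result.pop while result.last == ""    # drop trailing empty strings, like split
  result
end

p split_with_captures("abcd", /(b)/)        # ["a", "b", "cd"]
p split_with_captures("aabbcc", /bb(c)?/)   # ["aa", "c", "c"]
p split_with_captures("aabbcc", /bb(?:c)?/) # ["aa", "c"]
```

  With no capturing groups, m.captures is empty, so only the pieces around each delimiter are collected.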

  Cheers!

	Vince
Posted by Jan Svitok (Guest)
on 2006-11-07 10:48
(Received via mailing list)
On 10/31/06, Kirill Shutemov <k.shutemov@gmail.com> wrote:
> irb(main):001:0> "aabbcc".split(/bb/)
> => ["aa", "cc"]
> irb(main):002:0> "aabbcc".split(/(bb)/)
> => ["aa", "bb", "cc"]
> irb(main):003:0> "aabbcc".split(/bb(c)?/)
> => ["aa", "c", "c"]
>
> The last two result is unexpected for me. Can anybody explain it?

If the delimiter pattern contains any capturing groups, the captured
text is included in the output as well. This is not documented in
RDoc, only in the PickAxe book.

It's useful when you want to keep the delimiters.
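
For example, wrapping the delimiter in a capturing group turns split into a crude tokenizer. The sketch below uses a made-up arithmetic expression to illustrate this. Because no characters are discarded, joining the tokens gives back the original string:

```ruby
# Keep the operators by capturing them in the split pattern.
expr = "12+3-45"
tokens = expr.split(/([+-])/)
p tokens                 # ["12", "+", "3", "-", "45"]
p tokens.join == expr    # true: nothing was thrown away

# Without the group, the operators are lost.
p expr.split(/[+-]/)     # ["12", "3", "45"]
```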