String#split: unexpected result

Posted by Kirill Shutemov (Guest)
on 07.11.2006 10:48
(Received via mailing list)
irb(main):001:0> "aabbcc".split(/bb/)
=> ["aa", "cc"]
irb(main):002:0> "aabbcc".split(/(bb)/)
=> ["aa", "bb", "cc"]
irb(main):003:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]

The last two results are unexpected to me. Can anybody explain them?
Posted by Robert Klemme (Guest)
on 07.11.2006 10:48
(Received via mailing list)
On 31.10.2006 10:49, Kirill Shutemov wrote:

#split also returns the text matched by any capturing groups in the
split pattern:

irb(main):008:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]
irb(main):009:0> "aabbcc".split(/bb(?:c)?/)
=> ["aa", "c"]
irb(main):010:0> "aabbcc".split(/(bb(?:c)?)/)
=> ["aa", "bbc", "c"]

Regards

	robert
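Robert's explanation can be modeled in a few lines of Ruby. The sketch below is a simplified, hypothetical re-implementation (`split_model` is not a Ruby method). After each delimiter match it appends the text of every capturing group that matched. It assumes no `limit` argument, no zero-length matches and no trailing empty fields, all of which the real `String#split` handles.

```ruby
# Hypothetical helper, not part of Ruby's API: a simplified model of how
# String#split treats capturing groups in the pattern. It ignores the limit
# argument and the removal of trailing empty strings, and stops at the first
# zero-length match.
def split_model(str, re)
  result = []
  pos = 0
  while (m = re.match(str, pos))
    break if m.begin(0) == m.end(0)   # zero-length match: stop (simplification)
    result << str[pos...m.begin(0)]   # field before the delimiter
    result.concat(m.captures.compact) # captured text; this sketch drops unmatched groups
    pos = m.end(0)                    # continue after the delimiter
  end
  result << str[pos..-1]              # remaining text after the last delimiter
  result
end

# Compare the model with the real String#split on the patterns from the thread.
[/bb/, /(bb)/, /bb(c)?/, /bb(?:c)?/, /(bb(?:c)?)/].each do |re|
  p [re, split_model("aabbcc", re), "aabbcc".split(re)]
end
```

On these patterns the model gives the same arrays as `String#split`. It diverges on the cases it deliberately ignores: for example, `"aabb".split(/bb/)` returns `["aa"]`, while the model keeps a trailing empty string.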
Posted by Vincent Fourmond (Guest)
on 07.11.2006 10:48
(Received via mailing list)
Kirill Shutemov wrote:

  You have a capturing group in the last RE. In that case, split also
returns the text matched by the capturing groups. See for instance

>> str = "abcd"
=> "abcd"
>> p str.split(/(b)/)
["a", "b", "cd"]

  The string is split into "a" and "cd", and in the middle you get the
text matched by the capturing group, "b". You want a non-capturing group,
(?:c), instead of (c).

  Cheers !

	Vince
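Vincent's point that the captured text lands between the fields can be checked directly. This is a minimal sketch, assuming a pattern with exactly one capturing group that always participates; the sample string `"a1b22c"` and the variable names are illustrative. Under that assumption the result alternates field, delimiter, field, so the two can be separated again.

```ruby
# With exactly one capturing group that always participates, String#split
# returns field, captured delimiter, field, captured delimiter, ...
parts  = "a1b22c".split(/(\d+)/)
fields = parts.each_slice(2).map(&:first)                   # even positions: the fields
delims = parts.each_slice(2).map { |pair| pair[1] }.compact # odd positions: the captures

p parts   # ["a", "1", "b", "22", "c"]
p fields  # ["a", "b", "c"]
p delims  # ["1", "22"]

# The non-capturing version returns only the fields.
p "a1b22c".split(/(?:\d+)/)  # ["a", "b", "c"]
```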
Posted by Jan Svitok (Guest)
on 07.11.2006 10:48
(Received via mailing list)
On 10/31/06, Kirill Shutemov <k.shutemov@gmail.com> wrote:

If there are any capturing groups in the delimiter pattern, the text
they match is returned as well. This is not documented in RDoc, only
in the PickAxe book.

It's useful when you want to keep the delimiters.
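The following small tokenizer sketch shows this delimiter-keeping use. The expression string and the choice of `+` and `-` as operators are assumptions for the example. Wrapping the operator class in a capturing group returns the operators as separate tokens. Without the group they are lost.

```ruby
# Keep the delimiters: the capturing group makes split return the operators.
expr   = "12+3-45"
tokens = expr.split(/([+-])/)
p tokens               # ["12", "+", "3", "-", "45"]

# Because every delimiter is kept (and split only drops trailing empty
# strings), joining the pieces restores the original input.
p tokens.join == expr  # true

# Without the group, the operators are discarded.
p expr.split(/[+-]/)   # ["12", "3", "45"]
```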