String#split: unexpected result

sheynkman · November 7, 2006, 10:48am

irb(main):001:0> "aabbcc".split(/bb/)
=> ["aa", "cc"]
irb(main):002:0> "aabbcc".split(/(bb)/)
=> ["aa", "bb", "cc"]
irb(main):003:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]

The last two results are unexpected to me. Can anybody explain them?

sheynkman · November 7, 2006, 10:48am

On 31.10.2006 10:49, Kirill S. wrote:

irb(main):001:0> "aabbcc".split(/bb/)
=> ["aa", "cc"]
irb(main):002:0> "aabbcc".split(/(bb)/)
=> ["aa", "bb", "cc"]
irb(main):003:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]

The last two results are unexpected to me. Can anybody explain them?

If the split pattern contains capturing groups, #split also returns the
text those groups matched:

irb(main):008:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]
irb(main):009:0> "aabbcc".split(/bb(?:c)?/)
=> ["aa", "c"]
irb(main):010:0> "aabbcc".split(/(bb(?:c)?)/)
=> ["aa", "bbc", "c"]
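To see where each element comes from, the first delimiter match can be taken apart by hand. This is only a sketch: the variable names are made up for illustration, and it leans on MatchData#pre_match and #post_match rather than on anything split exposes itself.

```ruby
# Sketch: inspect the delimiter match to see where each element
# of the split result comes from.
str = "aabbcc"
m = str.match(/bb(c)?/)

before   = m.pre_match   # text before the delimiter
captured = m[1]          # what the capturing group (c) matched
after    = m.post_match  # text after the delimiter

p m[0]                       # the whole delimiter "bbc", which split discards
p [before, captured, after]  # ["aa", "c", "c"]
p str.split(/bb(c)?/)        # the same three pieces
```

The whole delimiter "bbc" is dropped as usual, but the group's "c" is put back into the result between the two pieces.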

Regards

robert

sheynkman · November 7, 2006, 10:48am

Kirill S. wrote:

irb(main):001:0> "aabbcc".split(/bb/)
=> ["aa", "cc"]
irb(main):002:0> "aabbcc".split(/(bb)/)
=> ["aa", "bb", "cc"]
irb(main):003:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]

The last two results are unexpected to me. Can anybody explain them?

You have a capturing group in the last RE. In this case, split also
returns what the capturing groups matched. See for instance:

str = "abcd"
=> "abcd"

p str.split(/(b)/)
["a", "b", "cd"]

The string is split into “a” and “cd”, and in the middle you get the
result of the capturing group, “b”. You want a non-capturing group,
(?:c) instead of (c).

Cheers !

Vince

sheynkman · November 7, 2006, 10:48am

On 10/31/06, Kirill S. wrote:

irb(main):001:0> "aabbcc".split(/bb/)
=> ["aa", "cc"]
irb(main):002:0> "aabbcc".split(/(bb)/)
=> ["aa", "bb", "cc"]
irb(main):003:0> "aabbcc".split(/bb(c)?/)
=> ["aa", "c", "c"]

The last two results are unexpected to me. Can anybody explain them?

If there are any groups in the delimiter, they are output as well.
This is not documented in RDoc, only in the PickAxe book.

It’s useful when you want to keep the delimiters.
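As a small illustration of keeping the delimiters, here is a sketch that splits an arithmetic expression on its operators. The input string and variable names are invented for the example.

```ruby
# Sketch: split an arithmetic expression on its operators.
expr = "1+2-3"

# No capturing group: the operators are consumed as delimiters and dropped.
operands = expr.split(/[+-]/)

# Capturing group: each operator is kept between the operands.
tokens = expr.split(/([+-])/)

p operands  # => ["1", "2", "3"]
p tokens    # => ["1", "+", "2", "-", "3"]
```

Since nothing is thrown away in the second form, tokens.join gives back the original string.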