mortee wrote:
Is this expected behaviour? I haven’t seen anything related to this
mentioned in the API docs…
irb(main):057:0> s = 'a::b:::c::::d'
=> "a::b:::c::::d"
irb(main):058:0> s.split(/:/)
=> ["a", "", "b", "", "", "c", "", "", "", "d"] => OK
irb(main):059:0> s.split(/:+/)
=> ["a", "b", "c", "d"] => OK
irb(main):060:0> s.split(/(:)+/)
=> ["a", ":", "b", ":", "c", ":", "d"] => ?
irb(main):061:0> s.split(/((:)+)/)
=> ["a", "::", ":", "b", ":::", ":", "c", "::::", ":", "d"] => ???
irb(main):062:0> s.split(/(:+)/)
=> ["a", "::", "b", ":::", "c", "::::", "d"] => ???
I guess I should mention that the rule I jotted down in the margin of my
book is: if the split() pattern contains parenthesized groups, the
result array includes the text each group captured at every separator,
but not the whole match.
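To make the margin rule concrete, here is a rough Ruby sketch that rebuilds split's result by hand: each field is followed by the group captures of the separator that ended it, and the whole separator (m[0]) is never added. The method name split_keeping_groups is made up for illustration, and it only handles the simple case (a pattern that cannot match an empty string, no trailing empty fields), so treat it as a model of the rule rather than of String#split itself.

```ruby
# Hypothetical helper, not part of Ruby: models the "captures but not the
# whole match" rule. Assumes the pattern never matches an empty string and
# ignores split's dropping of trailing empty fields.
def split_keeping_groups(str, pattern)
  result = []
  rest = str
  while (m = pattern.match(rest))
    result << m.pre_match              # the field before this separator
    result.concat(m.captures.compact)  # group captures; m[0] is NOT added
    rest = m.post_match                # keep scanning after the separator
  end
  result << rest                       # the final field
end

p split_keeping_groups('a::b:::c::::d', /((:)+)/)
# => ["a", "::", ":", "b", ":::", ":", "c", "::::", ":", "d"]
```

For the five patterns in the question, the sketch should give the same arrays as s.split.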
Applying that rule to your examples:
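irb(main):060:0> s.split(/(:)+/)
=> ["a", ":", "b", ":", "c", ":", "d"] => ?
The subgroup (:) matches a single colon, so those matches are included
in the results. The + repeats the group once per colon, but a repeated
group only remembers its last repetition, so each run of colons
contributes a single ":".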
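The "last repetition only" behaviour can be checked directly on a MatchData object. A small sketch (the sample strings are arbitrary choices, not from the thread); the second match uses distinct letters so it is obvious which repetition was kept:

```ruby
# A quantified group is re-entered on every repetition, but the
# MatchData keeps only what it matched the last time through.
m = 'a:::b'.match(/(:)+/)
p m[0]   # => ":::"  the whole match
p m[1]   # => ":"    group 1, last repetition only

# With distinct characters the rule is easier to see.
m = 'xyz'.match(/([a-z])+/)
p m[0]   # => "xyz"
p m[1]   # => "z"
```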
irb(main):061:0> s.split(/((:)+)/)
=> ["a", "::", ":", "b", ":::", ":", "c", "::::", ":", "d"] => ???
The inner subgroup (:) matches one colon, and those results are included.
The outer subgroup ((:)+) matches two, three, and four colons as it
traverses the string, and those results are included too. Because groups
are numbered by their opening (leftmost) parentheses, the outer group is
group 1 and comes first in each pair.
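The numbering can be inspected with MatchData. This sketch uses a single run of three colons (an arbitrary sample) and shows that the captures come out in group-number order, which is the order split appends them in:

```ruby
m = 'a:::b'.match(/((:)+)/)
p m[0]        # => ":::"  the whole match, which split leaves out
p m[1]        # => ":::"  group 1: its "(" is leftmost, so the outer group
p m[2]        # => ":"    group 2: the inner (:), last repetition only
p m.captures  # => [":::", ":"]  the groups in number order
```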
irb(main):062:0> s.split(/(:+)/)
=> ["a", "::", "b", ":::", "c", "::::", "d"] => ???
The subgroup (:+) matches two, three, and four colons as it traverses
the string, and those matches are included in the results.
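The difference between (:+) and (:)+ is only where the quantifier sits relative to the group. A quick check on one run of colons (sample string chosen arbitrarily); the last lines add a non-capturing group (?:...), which groups for the + without creating a capture, so split should behave like the plain /:+/ case:

```ruby
run = '::::'
p run.match(/(:+)/)[1]             # => "::::"  quantifier inside: whole run captured
p run.match(/(:)+/)[1]             # => ":"     group repeated: last colon kept
p run.match(/(?::)+/).captures     # => []      (?::) groups but captures nothing

p 'a::b:::c::::d'.split(/(?::)+/)  # => ["a", "b", "c", "d"]
```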
And here is an example of my own that shows that the whole match is not
included in the results; only the parenthesized subgroups are:
str = 'a::_b:::_c::::_d'
pattern = /(:+)_/
results = str.split(pattern)
p results
--output:--
["a", "::", "b", ":::", "c", "::::", "d"]
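To see exactly what gets dropped, this sketch walks the string one match at a time with Regexp#match(str, pos) and records the whole match m[0] next to group 1. It assumes the input string is 'a::_b:::_c::::_d' (an underscore after each colon run, which the pattern needs in order to match):

```ruby
str = 'a::_b:::_c::::_d'
pattern = /(:+)_/

# Step through the matches, recording the whole match and group 1.
whole, groups = [], []
pos = 0
while (m = pattern.match(str, pos))
  whole  << m[0]    # includes the trailing underscore
  groups << m[1]    # just the colons
  pos = m.end(0)    # resume after this match (matches are never empty)
end

p whole               # => ["::_", ":::_", "::::_"]
p groups              # => ["::", ":::", "::::"]
p str.split(pattern)  # => ["a", "::", "b", ":::", "c", "::::", "d"]
```

The underscores show up only in whole, never in the split result.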