In Czech, c followed by h is treated (for sorting, etc.) as a single
character/grapheme, ch. I need to split a string into single characters
in a way that respects this absurd convention.
In Perl I can write
split /(?<=(?![Cc][Hh]).)/, $string
and it works fine.
Unfortunately, Ruby does not implement/support this “zero-width positive
look-behind assertion”, so the question is: how can one efficiently split
the string in Ruby?
Yes, using scan occurred to me in the meantime, too. But why the (?:)?
str.scan(/ch|./i) does exactly the same, doesn’t it?
Yeah, there’s no need for the (?: … ). I started off thinking it was
more complicated than it was, and forgot to take that out. I really
need a regexp refactoring tool.
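A minimal sketch of the scan one-liner as a reusable method. The name czech_chars is hypothetical, and the m flag is my addition so that . also matches newlines. It assumes Ruby 1.9 or later, where . matches a whole multibyte character such as č rather than a single byte.

```ruby
# encoding: utf-8
# Split a string into Czech "characters", treating "ch" (any case) as one unit.
# Sketch only; czech_chars is a hypothetical helper name.
def czech_chars(str)
  # The alternation tries "ch" first, so it wins over matching "c" alone.
  # /i also accepts "Ch" and "CH"; /m lets "." match newlines as well.
  str.scan(/ch|./im)
end

p czech_chars("Chrudim")  # => ["Ch", "r", "u", "d", "i", "m"]
p czech_chars("ochota")   # => ["o", "ch", "o", "t", "a"]
p czech_chars("čech")     # => ["č", "e", "ch"]
```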
My original question was a stupid one: one should not insist on a
word-for-word translation when rewriting code from Perl to Ruby.
The scan version is slightly better, as it never returns an empty string.
Thanks anyway, of course.
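To see where the empty strings come from, here is a sketch of a split-based version. The pattern /(ch)|/i is my own reconstruction, not necessarily the one proposed in the thread: the empty alternative splits between every character, and because (ch) is a group, split keeps each matched ch in the result. When a word starts with ch, split emits an empty field before it, which is why scan is the cleaner choice. The method names are hypothetical.

```ruby
# Split-based alternative (hypothetical pattern and method names).
# The empty alternative splits between characters; the (ch) group
# makes split include each matched "ch" in the returned array.
def czech_chars_split(str)
  str.split(/(ch)|/i)
end

# Dropping the empty strings should give the same result as scan.
def czech_chars_split_clean(str)
  czech_chars_split(str).reject(&:empty?)
end

%w[chata ochota cch Chrudim].each do |word|
  p czech_chars_split(word)
  p czech_chars_split_clean(word) == word.scan(/ch|./i)
end
```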
But where can one find this feature of split in the documentation? http://www.rubycentral.com/ref/ref_c_string.html#split does not mention
that split returns not only the delimited substrings, but also the groups
captured by the regexp match.
I started with str.split(/[.]|@/), but then I’d lose where the @ went.
I tried turning it into
["one-and", ".", "two", "@", "three", ".", "net"]
so I could .reverse that, but without positive look-behind, I couldn’t
find any way to detect the break after the dot except with \w, which
would also trigger after the hyphen.
After hours of work, I ended up with something that was not only long
and confusing, involving .collect and an inner search loop and other
stuff, but when I brought it back up to check it for this email
message, I discovered that it didn’t even actually work correctly.
And all along, all I needed to do was change
str.split(/[.]|@/).reverse.join
into
str.split(/([.]|@)/).reverse.join
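A small runnable version of that fix. The input "one-and.two@three.net" is inferred from the array shown earlier, and reverse_parts is a hypothetical helper name. Because the pattern is wrapped in a group, split returns each . and @ as its own element, so reversing the array and joining it puts the separators back between the reversed parts.

```ruby
# Reverse the dot/at-separated parts of a string while keeping the separators.
# reverse_parts is a hypothetical helper name.
def reverse_parts(str)
  # The capture group makes split return the matched "." and "@" as well.
  str.split(/([.]|@)/).reverse.join
end

str = "one-and.two@three.net"
p str.split(/[.]|@/)    # => ["one-and", "two", "three", "net"] (separators lost)
p str.split(/([.]|@)/)  # => ["one-and", ".", "two", "@", "three", ".", "net"]
p reverse_parts(str)    # => "net.three@two.one-and"
```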
As for where split’s handling of groups is documented: it’s in Dave
Thomas’s Pickaxe book. Under String#split he writes:
“If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern includes groups, these groups will be
included in the returned values.”
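The three rules in that passage can be checked directly; a quick sketch with arbitrary sample strings:

```ruby
# 1. Regexp pattern: the string is divided wherever the pattern matches.
p "a1b22c".split(/\d+/)    # => ["a", "b", "c"]
# 2. Zero-length match: the string is split into individual characters.
p "abc".split(//)          # => ["a", "b", "c"]
# 3. Groups in the pattern: the captured text is included in the result.
p "a1b22c".split(/(\d+)/)  # => ["a", "1", "b", "22", "c"]
```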