Steve [RubyTalk] wrote:
can membership be optimized in a custom manner?
Maybe I'm about to say something silly (by not being up to speed), but
why on earth would anyone consider the output of the above one-liner
to be sane? Not only is the upper bound exceeded (under any sensible notion
of comparison), but many intermediate values are missing, for example "b!"
and "a~", as well as more esoteric strings with control characters such
as "b\0", and all strings with capital letters in them.
To make matters worse, even if we accept that ranges have peculiar
semantics over strings, those semantics are inconsistent.
I don’t know about being sane, but the behavior is explicable by the
fact that:
Range#to_a (Enumerable#to_a) calls
Range#each, which uses
String#step (for Strings, anyway), which uses
String#upto, which calls on
String#succ (to generate successive entries)
and String#<=> (to compare against the endpoint)
The key is that String#upto stops iterating as soon as String#succ
generates either
a) a string lexically equal to the endpoint (or the endpoint’s
successor, if the range excludes the endpoint), or
b) a longer string.
(see rb_str_upto() in string.c in the Ruby source code)
It’s this second factor that’s coming into play here.
"aa".succ => "ab"
"az".succ => "ba"
...
"zy".succ => "zz"
"zz".succ => "aaa" # stop iterating here, "aaa" is longer than "zz"
The point is that even though "aa".."b1" is a valid range, calling succ
repeatedly from the starting point will never yield the endpoint.
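To make the stopping rule concrete, here is a rough sketch of the loop as described above, for an inclusive range only. Note that naive_upto is a made-up helper name for illustration, not anything in Ruby itself, and the real rb_str_upto() may differ in its details; this only mirrors the two stop conditions (reached the endpoint, or succ produced a longer string).

```ruby
# Hypothetical helper (not part of Ruby): mimics the stopping rule
# described above for an inclusive range of strings.
def naive_upto(first, last)
  result = []
  # If the start already sorts after the endpoint, nothing is generated.
  return result if (first <=> last) > 0

  current = first
  loop do
    result << current
    break if current == last               # (a) reached the endpoint exactly
    current = current.succ
    break if current.length > last.length  # (b) succ produced a longer string
  end
  result
end

p naive_upto("aa", "b1").last    # "zz"
p naive_upto("aa", "b1").length  # 676
p naive_upto("A", "d").last      # "Z"
```

With "aa" and "b1", succ only ever produces letter pairs, so the loop runs through all 26 * 26 of them and ends on condition (b) when "zz".succ gives "aaa".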
irb(main):001:0> ("aa".."b1")==="zz"
=> false
Right, because this is a check on the endpoints.
irb(main):002:0> ("aa".."b1").to_a.include? "zz"
=> true
Also right, because succ eventually generates 'zz'
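The two checks can be put side by side. This is only a sketch: the endpoint test is written out by hand with String#<=> (the variable names are mine) so that it does not depend on how any particular Ruby version implements Range#===.

```ruby
range = "aa".."b1"

# Endpoint check: compare the candidate against both ends with String#<=>.
within_bounds = ("aa" <=> "zz") <= 0 && ("zz" <=> "b1") <= 0

# Enumeration check: generate the elements via succ and search them.
enumerated = range.to_a.include?("zz")

p within_bounds  # false
p enumerated     # true
```

The first answer comes purely from lexical comparison ("zz" sorts after "b1"); the second comes from whatever succ happens to generate before the iteration stops.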
The max method also behaves in a very counterintuitive way, even without
converting to an array:
irb(main):003:0> ("A".."d").max
=> "Z"
Again, 'Z'.succ is 'AA', so the iteration stops. 'd' is never reached,
and 'Z' is the last element generated.
Note that ('A'..'d').last => 'd'
Does anyone not claim that ranges are broken with respect to strings?
The point is that 'A'..'d' is somewhat nonsensical given the default
implementation of String#succ. However, you could define String#succ in
such a way that 'A'..'d' was logical.
(very) contrived example:
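class String
  # Contrived override: the successor is the next lowercase letter.
  # Relies on Ruby 1.8, where String#[] with an integer index returns
  # a character code (a Fixnum), so .succ.chr yields the next character.
  def succ
    self.downcase[0].succ.chr
  end
end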
('A'..'d').to_a => ["A", "b", "c", "d"]
('A'..'d').max => 'd'