Dave,

> Relative new Ruby user.

Welcome to Ruby!

> Let's see if I've got this straight. Somebody
> complained because
>
> ('1'..'10').member?('2')
> => false

That is the tip of the iceberg, yes.

> Good! The fact that Ruby will get incredibly clever
> with strings and fabricate arbitrary sequences with
> them is a charming trick, but they are arbitrary, and
> it is a trick.

Why is this good? The Range '1'..'10' is an Enumerable. As such, it has a finite number of elements, and those elements can be enumerated (for lack of a better word) one at a time using #each. In particular, this Range is enumerated as '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'. The method Enumerable#member? will return true if one of the enumerated elements is equal to the parameter. However, for Ranges, the behavior of #member? is different. So different, in fact, that for this Range, #member?('2') returns false. Many people see this as a bad thing, not a good thing.

> The fact that '1', '2', ... '9','10' is obvious
> doesn't make it any less arbitrary.

However, that sequence isn't arbitrary; all others are. A Range can be defined on any object that supports #succ and #<=>. The #succ method defines the *one and only* sequence that a Range cares about, in relation to Enumerable. For strings, '1', '2', '3', '4', '5', '6', '7', '8', '9', '10' is the *one and only* sequence that #succ generates, and that is the sequence that Enumerable#member? would use if Range didn't override the #member? method.

> '1'..'100'
>
> Is that supposed to be 1, 2, 3, ... 99, 100 or 1, 10,
> 11, 100? Ruby arbitrarily decided to interpret those
> strings as base 10 integers.

No, it's supposed to be '1', '2', '3', ... '99', '100'. There is nothing arbitrary about the sequence, and it's not a trick. It is the sequence defined by String#succ. You are welcome to write your own version of String#succ, but it won't change anything. Range#member? will still ignore it.
> 'a.1'..'c.3'
>
> Quite honestly, I have absolutely no idea how Ruby
> would count that. Will I get 'a.1', 'a.2', 'a.3',
> 'b.1' ... or is it going to go all the way to 'a.9'
> and then start over with 'b.1'?

Well, let's see:

>ruby -e "p ('a.1'..'c.3').to_a"
["a.1", "a.2", "a.3", "a.4", "a.5", "a.6", "a.7", "a.8", "a.9", "b.0", "b.1", "b.2", "b.3", "b.4", "b.5", "b.6", "b.7", "b.8", "b.9", "c.0", "c.1", "c.2", "c.3"]

(The rest of the message was less relevant, so I snipped it along with my irrelevant smart-ass replies :o)

Instead, I present the current state of affairs on this issue, as there seems to be a lot of confusion about it:

There are two core issues involved in this problem. The first is the dual nature of Ranges. Since Ranges implement the #each method, they can be viewed as a set of elements, which is how Enumerable views them. Therefore (1..10).to_a works, along with all of the other wonderful methods that Enumerable provides.

Ranges can also be viewed as intervals. The best example here is (1.0..10.0). This Range is *not* Enumerable, since Float does not (and can not) implement the #succ method that #each relies on. However, it is still useful to ask if a number falls within the boundaries of a Range. Therefore, the <=> operator is used to test for Range.begin <= value <= Range.end. This is the functionality that is currently implemented by Range#member? and its alias, Range#include?. This was mainly done as an optimization, since checking 1 <= x <= 1000000 is a whole lot faster than enumerating all 1000000 elements. It also allowed Float Ranges to work.

The other core issue is that the method String#succ is implemented in such a way that it is possible for (x > x.succ) to be true (e.g. 'z' > 'z'.succ). This is what makes the view of a Range as a set and the view of a Range as an interval incompatible, and why ('1'..'10').include?('2') can be viewed as either right or wrong depending on how you are looking at the Range.
Certainly, '2' is in the set ('1', '2', '3', ... , '10'), but '1' <= '2' <= '10' is *not* true since strings are compared, well, as strings.

So, we are currently in a situation where Enumerable#member? (and its alias Enumerable#include?) test for set membership by enumerating the set through the #each method, but Range#member? and Range#include? test for interval coverage and *not* set membership. This is the main inconsistency that we are trying to get rid of.

Matz is currently considering changing the functionality of Range#member? from an interval coverage test back to the set membership test, which, interestingly enough, is actually how it started life (it was later changed to be the same as #include?). Range would still override the method and optimize the test for Integer Ranges, but non-Integer Ranges (including String Ranges) would revert back to the Enumerable#member? method (or at least that method's functionality).

Matz hasn't decided whether he would change the Range#include? method to be a test for set membership too, or leave it as an interval coverage test. My guess is that it will remain an alias for #member?, since the two are aliases in Enumerable.

However, since Range#member? would no longer be an interval coverage test, Matz would want to add a new method to Range to take its place, so he is currently trying to find a good name for that method. Current suggestions for the name include (no pun intended):

    #between? #betwixt? #bound? #cover? #enclose? #encompass? #in? #in_interval? #in_range? #inside? #surround? #within?

Matz is also seeking comments from other people on these suggested names, along with any other names that might be appropriate.

David A. Black also suggested (along with the wonderfully apt name #encompass?) that this new method could also accept a Range as the parameter and test for interval-over-interval coverage as well. This sounds like a great suggestion and would make the new method even more useful.
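For anyone who wants to see the two views side by side, here is a rough sketch written as standalone helpers, so it does not depend on how Range#member? happens to behave in a given Ruby version. All four helper names are invented for this illustration; they are not the methods under discussion and not a proposal for the final name. The interval helpers also ignore exclusive (...) Ranges for brevity.

```ruby
# Hypothetical helper: set view. Enumerate the Range via #each and
# compare each element. Raises TypeError for a Float Range, since a
# Float Range cannot be iterated.
def set_member?(range, value)
  range.any? { |element| element == value }
end

# Hypothetical helper: interval view. Uses only comparisons.
def within_bounds?(range, value)
  range.begin <= value && value <= range.end
end

# Hypothetical helper: interval-over-interval coverage, along the
# lines of the suggestion to accept a Range as the parameter.
def bounds_cover_range?(outer, inner)
  within_bounds?(outer, inner.begin) && within_bounds?(outer, inner.end)
end

# Hypothetical helper: true when every element sorts before its successor,
# i.e. when #succ and #<=> agree for this Range.
def succ_agrees_with_order?(range)
  range.each_cons(2).all? { |a, b| (a <=> b) == -1 }
end

p set_member?('1'..'10', '2')         # => true
p within_bounds?('1'..'10', '2')      # => false, since '2' > '10' as strings
p within_bounds?(1.0..10.0, 2.5)      # => true
p bounds_cover_range?(1..10, 3..5)    # => true
p succ_agrees_with_order?(1..10)      # => true
p succ_agrees_with_order?('1'..'10')  # => false, since '9' > '10' as strings
```

The last two calls show why Integer Ranges can safely be optimized to a bounds check while String Ranges cannot: for Integers the two views always give the same answer.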
So, that's where we are. I hope this clears up a lot of the misconceptions that seem to have plagued this discussion.

- Warren Brown
on 2005-12-01 08:01
on 2005-12-01 11:56
On Nov 30, 2005, at 22:59, Warren Brown wrote:

> No, it's supposed to be '1', '2', '3', ... '99', '100'. There is
> nothing arbitrary about the sequence, and it's not a trick.

I don't think you understand my use of "arbitrary." Ordering strings to correspond to their integers is absolutely arbitrary. As arbitrary as ordering them by ASCII value, or (as libraries do it) by the spelling of their pronounced forms. Integers have an inherent order. Strings do not.

> Matz is also seeking comments from other people on these suggested
> names along with any other names that might be appropriate.

Yes. And my comment is that their very existence is detrimental to the language, promoting obscurity and the opportunity for confusion, and they should be scrapped, at least from the core. Ranges aren't arrays or integers; no small part of the problem stems from wanting to treat them as if they are.

Oh, well, I don't have to use them, whatever they're called.
on 2005-12-01 15:08
Nice summary, Warren. There's still a little bit more to it, though. If one searches ruby-talk, one finds that there are also other, less obvious peculiarities about Range -- not that they are all that significant, but they are there.

The problem I see is that if #member? goes back to being essentially equivalent to #to_a.include?, we're right back to the original problem, exactly as you point out:

> This was mainly done as an optimization,
> since checking 1 <= x <= 1000000 is a whole lot faster than Enumerating
> all 1000000 elements. It also allowed Float Ranges to work as well.

How could one optimize a _custom_ membership test for a Range, then? You can't, so our choices for #member? trap us between inconsistent functionality and significant inefficiency. And it still does not address the underlying cause: #succ and #<=> are incompatible in the String class, and might also be so for other classes.

I've offered the best solution generally possible for this issue: it corrects the underlying cause, fixes the inconsistent functionality, and maintains efficiency. What more can one ask? Nonetheless, no one seems interested in it. I tend to think the reason is because it introduces a new method (#cmp), but since no one has even touched on it, how do I know? I'm at a loss. Do people just not get it? Did I not explain it well enough? Did I miss something? Or is it that people just prefer to stew around in their own preconceptions?

T.