Re: string range membership

warrenbrown · November 23, 2005, 5:06pm

Matz,

For your information, member? used to iterate over
? items to check membership. But since confusion
each. Any ideas?
Ah, I see. So really, the root problem here is the assumption by
Range that (value < value.succ). And in String, this assumption does
not always hold true:

irb(main):001:0> s = 'z'
=> "z"
irb(main):002:0> s < s.succ
=> false

Because of that, there is a huge distinction between

str_range.to_a.member?(x) (is x a member of the set of the range’s
values) and (str_range.first <= x <= str_range.last) (is x in the
range’s interval).
So, given that (at least in the case of ranges of strings) there is a
clear distinction between a value being included in the interval and a
value being included in the set, it appears that we have a real need for
two different methods. The methods Range#include? (in interval) and
Range#member? (of set) seem to be perfect candidates for these two
different functionalities. Before these two methods were merged, did
they take on these two functionalities, or were they different in some
other way?
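A minimal sketch of the two readings described above, using only to_a, include? on the resulting Array, and plain String comparison. The variable names are illustrative and not from the thread.

```ruby
# Sketch: set membership vs. interval membership for a String range.
range = "a".."aa"

# Set view: expand the range via succ and search the resulting values.
in_set = range.to_a.include?("z")

# Interval view: compare against the two endpoints only.
in_interval = (range.first <= "z") && ("z" <= range.last)

puts "elements:    #{range.to_a.size}"
puts "in set:      #{in_set}"
puts "in interval: #{in_interval}"
```

The two answers differ for "z" because "z" is reached by succ before "aa", yet sorts after "aa" under String#<=>.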

Are there other cases where "membership" changes depending on
whether the range is viewed as a set or an interval? If not, perhaps it
would be better to address the fact that str.succ violates the (str <
str.succ) assumption. Perhaps the functionality currently in
String#succ could be moved to another method (String#increment
perhaps?), and String#succ could take on a new functionality that does
not violate (str < str.succ).
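A quick way to see where the (str < str.succ) assumption breaks is to test it on a few sample strings. This is only a sketch; the sample list is an arbitrary choice for illustration.

```ruby
# Sketch: find sample strings whose successor does not sort after them.
samples = %w[a az z zz 9 Zz]

# Keep only the strings that violate s < s.succ.
violations = samples.reject { |s| s < s.succ }

violations.each do |s|
  puts "#{s.inspect}.succ is #{s.succ.inspect}, which does not sort after #{s.inspect}"
end
```

Every violation in this sample comes from a carry that lengthens the string, which is the case String#<=> orders differently from succ.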

Anyway, please let me know if there is anything I can do to help
settle this issue.

Warren B.

warrenbrown · November 24, 2005, 1:23am

Hi,

In message “Re: [BUG] string range membership”
on Thu, 24 Nov 2005 01:03:19 +0900, “Warren B.”
[email protected] writes:

|So, given that (at least in the case of ranges of strings) there is a
|clear distinction between a value being included in the interval and a
|value being included in the set, it appears that we have a real need for
|two different methods. The methods Range#include? (in interval) and
|Range#member? (of set) seem to be perfect candidates for these two
|different functionalities. Before these two methods were merged, did
|they take on these two functionalities, or were they different in some
|other way?

#include? used for range check, #member? was for set membership. But
since they have same functionality in Enumerable, some claimed having
different behaviors in Range was confusing. I agreed.

| Anyway, please let me know if there is anything I can do to help
|settle this issue.

All we need is making up good names for each functionality.

						matz.

warrenbrown · November 24, 2005, 1:39am

On Thu, 24 Nov 2005, Yukihiro M. wrote:

|different functionalities. Before these two methods were merged, did
All we need is making up good names for each functionality.
Range#contains?

??

-a

warrenbrown · November 24, 2005, 2:19am

Hi,

In message “Re: [BUG] string range membership”
on Thu, 24 Nov 2005 09:38:11 +0900, “Ara.T.Howard”
[email protected] writes:

For which functionality?

						matz.

warrenbrown · November 24, 2005, 3:07am

On Thu, 24 Nov 2005, Yukihiro M. wrote:

For which functionality?

well, i would think of #member? as most natural for set membership - so
#contains? would/should be most like #include? - in my mind.

harp:~ > cat a.rb
module Enumerable
  def contains? value
    map.include? value
  end
end

r = "a" ... "aa"
p r.contains?("z")

harp:~ > ruby a.rb
true

so, if each would ‘hit’ it - it’s contained.

kind regards.

-a

warrenbrown · November 24, 2005, 3:11am

All we need is making up good names for each functionality.

That is NOT all you need! This does not solve the complete problem, but
only provides a little-bitty patch for query on a Range member, and a
very inefficient one at that --which I thought was part of the reason
you changed #include and #member to be the same in the first place.

The overarching issue is that sortable and comparable are using the
same method #<=>, but they do not necessarily want the same meaning.
You should provide a separate method for comparable --like I said, in
most cases they will be equivalent, but not so in String. And
dictionary order comparison is needed anyway. I studied this issue
exhaustively over a year ago when I wrote a true Interval class.

T.

warrenbrown · November 24, 2005, 4:44am

Hi,

In message “Re: string range membership”
on Thu, 24 Nov 2005 11:07:26 +0900, “Trans” [email protected]
writes:

|> All we need is making up good names for each functionality.
|
|That is NOT all you need! This does not solve the complete problem, but
|only provides a little-bitty patch for query on a Range member, and a
|very inefficient one at that --which I thought was part of the reason
|you changed #include and #member to be the same in the first place.

Depends on how you define problem.

|The overarching issue is that sortable and comparable are using the
|same method #<=>, but they do not necessarily want the same meaning.
|You should provide a separate method for comparable --like I said, in
|most cases they will be equivalent, but not so in String. And
|dictionary order comparison is needed anyway. I studied this issue
|exhaustively over a year ago when I wrote a true Interval class.

I’m not sure what you meant here. Range has no relation with
sorting. Can you elaborate?

						matz.

warrenbrown · November 24, 2005, 7:21am

I’m not sure what you meant here. Range has no relation with
sorting. Can you elaborate?

#succ defines a sort order of sorts (pun intended ;-). But #<=> defines
a sort order too along with comparability. In most classes there’s no
problem, but in String the two come into conflict --the orders are not
the same.

Then consider that Range is not a true interval because it uses #succ.
This is why I created a true Interval class that uses #+ instead.
Likewise Range shouldn’t use #<=> either, but another method, let’s call
it #cmp. This would fix the problem.

In general:

module Comparable
  def cmp(o)
    self <=> o
  end
end

That is to say, for anything comparable #cmp is the same as #<=>,
unless otherwise defined. (Alternately you could define #cmp as an
alias of #<=> directly in the classes it is needed --that would
probably be better.) Then in String define #cmp specially to conform to
the successive order as defined by #succ.

Thus having Range use #cmp instead of #<=> the issue is solved.

In summary, an object would then be “Rangeable” if it supports #succ,
but only fully so if it also supports #cmp (instead of #<=>).
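As a rough illustration of what a succ-consistent comparison for String could look like: the sketch below assumes strings made up only of lowercase ASCII letters, where succ order is "shorter first, then alphabetical". The helper name succ_order_cmp is hypothetical, and mixed-case or mixed-digit strings would need more care.

```ruby
# Hypothetical comparison that follows succ order for strings made up
# only of lowercase ASCII letters: shorter strings come first, and
# strings of equal length compare lexicographically.
def succ_order_cmp(a, b)
  [a.size, a] <=> [b.size, b]
end

words = ("a".."zz").to_a

puts succ_order_cmp("z", "aa")  # position of "z" relative to "aa" in succ order
puts "z" <=> "aa"               # position under the ordinary String#<=>

# Sorting with the helper reproduces the order in which the range yields values.
puts words == words.sort { |x, y| succ_order_cmp(x, y) }
```

Under this assumption the helper agrees with the order produced by succ, while String#<=> does not.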

Does it make sense now? (Sorry if I’m not explaining well, it’s a tad
subtle and it’s been awhile since I worked on it too, so I have been
trying to recall it all myself too).

T.

warrenbrown · November 24, 2005, 8:36pm

Ara.T.Howard wrote:

On Thu, 24 Nov 2005, Yukihiro M. wrote:

All we need is making up good names for each functionality.

Range#contains?

??

I’m sure there was an earlier post with an excellent
synopsis on ranges which stated that a Range /doesn’t/
“contain”? Yeh, here it is … from you ;))

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/167200

“not quite - we have a string range that produces
27 elements. it does not ‘have’ or ‘contain’ them.
it merely suggests this set as it’s current thought”

(SCNR

Would something like Range#covers? be more apt?
(meaning within the bounds).
Oh, pooh, that’s got an ‘s’ on the end as well

daz

warrenbrown · November 24, 2005, 9:25pm

On Fri, 25 Nov 2005, daz wrote:

it merely suggests this set as it’s current thought"

(SCNR

Would something like Range#covers? be more apt?
(meaning within the bounds).
Oh, pooh, that’s got an ‘s’ on the end as well

lol. i realized that actually - i thought that the confusion with
“include?”
being used to test “containment” might be resolved by having a method
actually
named “contains?”.

too confusing?

-a

warrenbrown · November 25, 2005, 3:50pm

Jim W. wrote

How about “within?” for a value within a given range?

(0...5).within?(3)

reads backwards, IMO compared to:

(0...5).include?(3)

daz

warrenbrown · November 25, 2005, 11:51pm

daz wrote:

reads backwards, IMO compared to:

(0...5).include?(3)

Yes, I agree. I think, just finding a new word for a method that does
something, that used to have different names in the past, won’t help.
This has been tried, but it didn’t work too well.

Ruby’s ranges have (at least) a dual nature:

1. as an interval (a, b) of values,
2. as a shortcut for a set of values { a, a.succ, a.succ.succ, …, b }.

I think “include?” is a good name for the 1. And 2 is very similar to 1,
so people will easily confuse those names.

What about using a bit of double dispatch here, like that:

class Object
  def element?(r)
    r.find { |x| x == self } ? true : false
  end
end

"bb".element?("a"..."zz") # => true

This doesn’t read backwards, and the name conveys the meaning of set
membership, as required by 2.

Perhaps using another method than “find” for searching (that only
defaults to “find”) would make it possible, to provide an alternative
implementation for datastructures, that can compute membership faster
than O(n).
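One possible shape for that idea, as a sketch only: the lookup asks the collection for a faster check if it offers one and falls back to a linear scan otherwise. The names element_of?, fast_member? and ExpandedRange are hypothetical and exist nowhere in Ruby itself.

```ruby
require "set"

# Hypothetical helper: use the collection's own fast lookup if it has one,
# otherwise fall back to a linear scan over each.
def element_of?(value, collection)
  if collection.respond_to?(:fast_member?)
    collection.fast_member?(value)
  else
    collection.any? { |x| x == value }
  end
end

# Hypothetical wrapper that expands a range once into a Set, so later
# membership checks are hash lookups instead of walks along succ.
class ExpandedRange
  include Enumerable

  def initialize(range)
    @values = Set.new(range)
  end

  def each(&block)
    @values.each(&block)
  end

  def fast_member?(value)
    @values.include?(value)
  end
end

puts element_of?("bb", "a".."zz")                     # linear scan
puts element_of?("bb", ExpandedRange.new("a".."zz"))  # Set lookup
```

The trade-off is memory: the wrapper pays for the full expansion up front, so it only helps when the same range is queried many times.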

warrenbrown · November 25, 2005, 2:21pm

ara.t.howard wrote:

lol. i realized that actually - i thought that the confusion with
“include?”
being used to test “containment” might be resolved by having a method
actually
named “contains?”.

How about “within?” for a value within a given range?

– Jim W.

warrenbrown · November 26, 2005, 10:53pm

On 11/25/05, Florian F. [email protected] wrote:

def element?(r)
defaults to “find”) would make it possible, to provide an alternative
implementation for datastructures, that can compute membership faster
than O(n).

–
Florian F.

I was going to suggest r.has_element?(x) for the equivalent of #member.
maybe r.surrounds?(x) for #include. That one is not as good.

warrenbrown · November 26, 2005, 11:25pm

Florian, your double dispatch is interesting. I still have no idea
whether anyone has understood the #cmp solution I’ve proposed, since no
one has commented on it. A fully general solution of #cmp looks
something like this:

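def cmp( other )
  return 0 if self == other
  a, b = self, other
  loop do
    # step both values forward until one of them reaches the other
    a, b = a.succ, b.succ
    return -1 if a == other  # other is a successor of self
    return 1 if b == self    # self is a successor of other
  end
end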

Of course no one would ever use this because ‘other’ may not be an
actual member and thus never hit on ==. So the only way to ensure
member comparison in a fully general way is to have the Range on hand
--hence your double dispatch. A generalized solution would then be:

def cmp( other, range )
  return 0 if self == other
  arr = range.to_a
  arr.index( self ) <=> arr.index( other )
end

But this is silly since Range can do this itself, no need to double
dispatch --if #cmp is not defined on the object, Range can always
expand into an array and compare indexes itself. But there’s still the
risk of infinite expansions.
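A sketch of how a range-side index comparison could guard against runaway expansion: walk the range with each, record the position of both values, and give up after a fixed number of steps. The helper name position_cmp and the limit keyword are hypothetical, and the default cap is arbitrary.

```ruby
# Hypothetical helper: compare two values by their position in the range,
# giving up after `limit` steps so a very long expansion cannot hang.
# Returns -1, 0 or 1, or nil if either value was not reached in time.
def position_cmp(range, a, b, limit: 10_000)
  pos_a = pos_b = nil
  range.each_with_index do |x, i|
    break if i >= limit
    pos_a = i if x == a
    pos_b = i if x == b
    break if pos_a && pos_b
  end
  return nil unless pos_a && pos_b
  pos_a <=> pos_b
end

r = "a".."zz"
p position_cmp(r, "z", "aa")   # "z" is yielded before "aa"
p position_cmp(r, "zzz", "a")  # "zzz" is never yielded by this range
```

This avoids building the whole array with to_a, but it is still linear in the distance to the later of the two values.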

Likewise I think the double dispatching within an #element? method is
in the same league. If a #cmp can’t be defined and used to determine
membership neither will an #element? method be able to, so Range then
must resort to ‘to_a.include?’

Range is better off depending on a comparison method just for it to
ensure compatibility with #succ --which also ensures determination of
membership with the methods we already have, #member? and #include?
--Then they would do exactly what the documentation says they’re
supposed to do, which they actually DO NOT do at the moment.

T.

warrenbrown · November 28, 2005, 6:56pm

Bob S. wrote:

How about something like Enumerable#produces?, or Enumerable#yields?

Then perhaps start deprecating Enumerable#member?

So for a range, one could use r.include?(obj) to test for obj between
the endpoints, and r.yields?(obj) to test whether r.succ ever yields obj.

Enumerable#=== becomes an issue (case statement), right?

A nice alternative bit of thinking, Bob. At least you are making some
sense.

As for the rest of the gibberish being posted here, which btw has been
the same flap for years, forget it. It’s hopeless. You all will be
right back to where you were two years ago, two years from now.

Adios,
T.

warrenbrown · November 28, 2005, 5:18pm

Yukihiro M. wrote:

#include? used for range check, #member? was for set membership. But
since they have same functionality in Enumerable, some claimed having
different behaviors in Range was confusing. I agreed.

| Anyway, please let me know if there is anything I can do to help
|settle this issue.

All we need is making up good names for each functionality.

How about something like Enumerable#produces?, or Enumerable#yields?

Then perhaps start deprecating Enumerable#member?

So for a range, one could use r.include?(obj) to test for obj between
the endpoints, and r.yields?(obj) to test whether r.succ ever yields
obj.

Enumerable#=== becomes an issue (case statement), right?