Is it possible, a fully general Enumerable#recursive?

tcsnide · April 7, 2010, 11:50pm

For the last couple of days I’ve been trying to write an Enumerable
method called #recursive. Rather than create a bunch of methods like
#recursive_each, #recursive_map, #recursive_sort, etc. I figured that
it should be possible to create a single #recursive method that
returned an Enumerator, or barring that a Functor, that would handle
any enumerable method, e.g. recursive.each, recursive.map,
recursive.sort, and so on. But I have yet to figure out fully general
solution.

My nearest success thus far is:

require ‘facets/functor’

module Enumerable

def recursive
  Functor.new do |op,*a,&b|
    __send__(op) do |e|
      if self.class === e
        e.recursive.__send__(op,*a,&b)
      else
        b.call(e)
      end
    end
  end
end

end

This works for #each and #map but not #sort.

Can anyone think of a better solution?

tcsnide · April 8, 2010, 5:53am

On Wednesday 07 April 2010 04:49:15 pm Intransition wrote:

For the last couple of days I’ve been trying to write an Enumerable
method called #recursive. Rather than create a bunch of methods like
#recursive_each, #recursive_map, #recursive_sort, etc. I figured that
it should be possible to create a single #recursive method that
returned an Enumerator,

Yes!

or barring that a Functor,

What’s a Functor in this context? We already have proc objects, which is
traditionally what “functor” means…

This works for #each and #map but not #sort.

#sort isn’t a method of Enumerable, it’s a method of Array.

Can anyone think of a better solution?

Probably, but I’m not sure what it would look like. What’s your use case

how would I actually use this to, say, generate an infinite stream of
fibbonacci numbers?

I guess I don’t really see why you’d define it in terms of an enumerable
(in
that case, you probably want inject).

tcsnide · April 8, 2010, 6:03am

On Wed, Apr 7, 2010 at 10:49 PM, Intransition [email protected]
wrote:

For the last couple of days I’ve been trying to write an Enumerable
method called #recursive. Rather than create a bunch of methods like
#recursive_each, #recursive_map, #recursive_sort, etc. I figured that
it should be possible to create a single #recursive method that
returned an Enumerator, or barring that a Functor, that would handle
any enumerable method, e.g. recursive.each, recursive.map,
recursive.sort, and so on. But I have yet to figure out fully general
solution.
…
This works for #each and #map but not #sort.
Can anyone think of a better solution?

Not at the moment! (And there’s something i’m working on in which
I would quite like to have a recursive map/collect.)

But I have been thinking about it since Roger P.'s post
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/360275
and the subsequent posts on that.

One thing that’s been worrying me is the possibility of, for example,
recursive arrays or hashes, or sets of arrays and or hashes
which form a “loop”. Do you have anything in #recursive to detect that?

I had a look at the C code for Array and Hash, and found there’s
some code to deal with that possibility in Array#flatten! and flatten,
which is why they raise errors if you try to flatten a recursive array,
and there’s also some code in Array#inspect and Hash#inspect,
which deals with it. In a way it was annoying to find it, because
I think I ought to have realised before I looked at the code that
effectively #inspect for Arrays (etc) must be doing a recursive each,
and that to display […] or {…} when it finds an array or hash
contained “in itself” it must be detecting potential infinite loops.
But I didn’t realise that until I’d looked at the C code.

If you haven’t looked at the C code in detail, basically I think
that every time #flatten (or #inspect for that matter) effectively
runs an #each, it adds the object_id to a stack,
and in “deeper” "#each"es tests whether the current object_id
is already in the stack. If it is - hey, it’s a recursive object!
On the way down, it pops the “current” “each” object off the stack.
The C code seems to be using Arrays, but I did wonder whether
it might be “on average” faster if it used hashes. It rather depends
on what the likely maximum size of the “stack” is. Probably not large?

tcsnide · April 8, 2010, 8:01am

On Apr 7, 11:51 pm, David M. [email protected] wrote:

or barring that a Functor,

What’s a Functor in this context? We already have proc objects, which is
traditionally what “functor” means…

Functor is defined as:

class Functor
private(*instance_methods.select { |m| m !~ /(^__|^binding$)/ })
def initialize(&function)
@function = function
end
def to_proc
@function
end
def method_missing(op, *args, &blk)
@function.call(op, *args, &blk)
end
end

#method_missing is what makes it different from a Proc.

This works for #each and #map but not #sort.

#sort isn’t a method of Enumerable, it’s a method of Array.

It seems to be according to ri.

Can anyone think of a better solution?

Probably, but I’m not sure what it would look like. What’s your use case –
how would I actually use this to, say, generate an infinite stream of
fibbonacci numbers?

I don’t think anyone can generate that seeing that it’s infinite and
all

I guess I don’t really see why you’d define it in terms of an enumerable (in
that case, you probably want inject).

Perhaps Array is the more appropriate place for it.

tcsnide · April 9, 2010, 5:01pm

On Apr 7, 5:49 pm, Intransition [email protected] wrote:

For the last couple of days I’ve been trying to write an Enumerable
method called #recursive. Rather than create a bunch of methods like
#recursive_each, #recursive_map, #recursive_sort, etc. I figured that
it should be possible to create a single #recursive method that
returned an Enumerator, or barring that a Functor, that would handle
any enumerable method, e.g. recursive.each, recursive.map,
recursive.sort, and so on. But I have yet to figure out fully general
solution.

Working on this more I currently have the method #visit (see the code
below). It almost works, but it has this one issue that makes no sense
to me, and I wonder what is going on under the hood in Enumerator for
it do this. It has the air of a bug to me --or at least a feature
deficiency.

Notice:

[1, 2, 3, [‘a’, ‘b’, ‘c’] ].visit{ |x| x.succ }
=> [2, 3, 4, [“b”, “c”, “d”]]

But using Enumerator:

[1, 2, 3, [‘a’, ‘b’, ‘c’] ].visit.map{ |x| x.succ }
=> [2, 3, 4, “b”, “c”, “d”]

Why is it flattening the result?

Here’s the code for #visit:

module Enumerable

def visit(&block)
  if block_given?
    map do |e|
      e.visit(&block)
    end
  else
    to_enum(:visit)
  end
end

end

class Object

def visit(&block)
  block.call(self)
end

end

class String

# This is needed for Ruby 1.8
def visit(&block)
  block.call(self)
end

end

tcsnide · April 9, 2010, 8:04pm

On Thu, Apr 8, 2010 at 5:51 AM, David M. [email protected]
wrote:

On Wednesday 07 April 2010 04:49:15 pm Intransition wrote:

This works for #each and #map but not #sort.

#sort isn’t a method of Enumerable, it’s a method of Array.
ruby-1.9.1-p378 > Enumerable.instance_methods.grep /sort/
=> [:sort, :sort_by]
However you will need to define #<=> on the return value of recursive.
HTH
R.

tcsnide · April 8, 2010, 8:07am

On Apr 8, 12:02 am, Colin B. [email protected] wrote:

Not at the moment! (And there’s something i’m working on in which
I would quite like to have a recursive map/collect.)

But I have been thinking about it since Roger P.'s posthttp://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/360275
and the subsequent posts on that.

One thing that’s been worrying me is the possibility of, for example,
recursive arrays or hashes, or sets of arrays and or hashes
which form a “loop”. Do you have anything in #recursive to detect that?

I wasn’t planning on it. I figure Ruby’s internals might catch it on
their own and if not then I’d just post a warning in the docs not to
use on such objects. You’ll get a stack overflow error regardless.
But, hey, if someone has a good solution for that too then all the
better, but I fear it would be very inefficient to ensure no “loops”,
but maybe I am wrong about that.

tcsnide · April 9, 2010, 8:54pm

On Apr 9, 2:03 pm, Robert D. [email protected] wrote:

However you will need to define #<=> on the return value of recursive.
Yep. That’s trick b/c comparing and array or other enumerable to a non
enumerable raises an error. So sorting with this is probably out of
the question. On a side note, I am not so sure that raising an error
is best. Why not just assume that too non-comparable things are equal?

tcsnide · April 9, 2010, 8:50pm

On Apr 9, 11:01 am, Intransition [email protected] wrote:

But using Enumerator:

[1, 2, 3, [‘a’, ‘b’, ‘c’] ].visit.map{ |x| x.succ }
=> [2, 3, 4, “b”, “c”, “d”]

Why is it flattening the result?

To just make that much stranger:

[1, 2, 3, [‘a’, ‘b’, ‘c’] ].visit.with_index{ |x,i| i }
=> [0, 1, 2, [3, 4, 5]]

Doesn’t flatten, but somehow the index is being carried on to the
subarray iterations – that really fracks with my mind.

tcsnide · April 9, 2010, 9:03pm

On 4/9/10, Intransition [email protected] wrote:

Why is it flattening the result?

To just make that much stranger:

[1, 2, 3, [‘a’, ‘b’, ‘c’] ].visit.with_index{ |x,i| i }
=> [0, 1, 2, [3, 4, 5]]

Doesn’t flatten, but somehow the index is being carried on to the
subarray iterations – that really fracks with my mind.

This has got to be a bug. It should at least either flatten or not
flatten consistently.

On a side note, I am not so sure that raising an error
is best. Why not just assume that too non-comparable things are equal?

Maybe it would help you write a #visit that sorts properly, but in
general, I would expect that if I try to sort a collection containing
non-comparable objects it’s because I made a mistake and put the wrong
thing into the collection somehow. So I want to see that error.

tcsnide · April 10, 2010, 8:08pm

On Apr 9, 6:38 pm, Colin B. [email protected] wrote:

{
def visit(&block)
def visit(&block) ; block.call(self) ; end
#=> Enumerable#visit: result=0x1562576= [217]
#=> rr=0x1561088= [17, 217, 37]
if block_given? then result = each do |e| e.visit(&block) end
gives for it.map:

puts ; rr = it.map{ |x| x+7 }
#=> Enumerable#visit: self=0x14484864= [10, [210], 30]
#=> Enumerable#visit: self=0x14484896= [210]
#=> Enumerable#visit: result=0x14484896= [210]
#=> Enumerable#visit: result=0x14484864= [10, [210], 30]
puts “#=> rr=0x#{rr.object_id}= #{rr.inspect}”
#=> rr=0x1298112= [17, 217, 37]

Thanks, that helped me understand a little better.

After working on this for far too long, I have concluded that a highly
generalized implementation of recursion for all (or at least most)
enumerable methods not feasible. But there are still many things in
common between recursion methods, so instead I have created a Recursor
class to act something like an Enumerator for recursive operations.

Thanks for the help.

tcsnide · April 11, 2010, 2:52pm

On Sat, Apr 10, 2010 at 7:05 PM, Intransition [email protected]
wrote:

On Apr 9, 6:38 pm, Colin B. [email protected] wrote:

I’m not sure that it is a bug. …

Thanks, that helped me understand a little better.

After working on this for far too long, I have concluded that a highly
generalized implementation of recursion for all (or at least most)
enumerable methods not feasible.
http://www.askoxford.com/concise_oed/feasible?view=uk
feasible: adjective 1 possible and practical to achieve easily or
conveniently. 2 informal likely.
I think that’s the precise word: I’d concluded that it was probably
possible,
but probably too complicated, and I was having difficulties getting
things
to work with “to_enum” in a consistent way, or indeed sometime at all.
(For example, a recursive Array#map! worked, but a recursive Array#map
didn’t.)

But there are still many things in common between recursion methods,
so instead I have created a Recursor class
to act something like an Enumerator for recursive operations.
The idea of having a Recursor class hadn’t occurred to me,
and that seems (at first thoughts) quite possibly fruitful as a
solution.

tcsnide · April 11, 2010, 10:56pm

On 04/09/2010 08:53 PM, Intransition wrote:

However you will need to define #<=> on the return value of recursive.
Why that? The whole point would be to sort all elements that would be
recursively returned. The #recursive return value would never be seen
then.

Yep. That’s trick b/c comparing and array or other enumerable to a non
enumerable raises an error. So sorting with this is probably out of
the question. On a side note, I am not so sure that raising an error
is best. Why not just assume that too non-comparable things are equal?

See above.

Here are my two solutions with plain old Enumerator approach. I do not
see the benefit of a Functor here.

simple

module Enumerable
def recursive(&b)
if b
each do |e|
if Enumerable === e
e.recursive(&b)
else
b[e]
end
end
self
else
Enumerator.new(self, :recursive)
end
end
end

catch loops

module Enumerable
def recursive(items = {}, &b)
if b
each do |e|
items.fetch e.object_id do |oid|
items[oid] = e

       if Enumerable === e
         e.recursive(items, &b)
       else
         b[e]
       end
     end
   end
   self
 else
   Enumerator.new(self, :recursive)
 end

end
end

This one suffers the fragility that callers providing something else
than an empty Hash can break it. That could be fixed with a bit more
effort by using a thread local. Just using the simple approach is
probably better since there is no notification of loops anyway so the
using code has zero chance to handle the case of loops which might be
important to know.

Another note: your original code suffered from the inability to mix
different Enumerables because you use “self.class” for the type check.
I changed that by only testing for Enumerable.

Kind regards

robert

tcsnide · April 10, 2010, 12:40am

On Fri, Apr 9, 2010 at 8:02 PM, Caleb C. [email protected] wrote:

Why is it flattening the result?
To just make that much stranger:
[1, 2, 3, [‘a’, ‘b’, ‘c’] ].visit.with_index{ |x,i| i }
=> [0, 1, 2, [3, 4, 5]]
Doesn’t flatten, but somehow the index is being carried on to the
subarray iterations – that really fracks with my mind.

This has got to be a bug. It should at least either flatten or not
flatten consistently.

I’m not sure that it is a bug.

http://www.ruby-doc.org/ruby-1.9/classes/Enumerable.html#M002713
enum.map {| obj | block } => array
Returns a new array with the results of running block once for every
element in enum.

http://www.ruby-doc.org/ruby-1.9/classes/Enumerable.src/M002713.html
static VALUE
enum_collect(VALUE obj)
{
VALUE ary;
RETURN_ENUMERATOR(obj, 0, 0);
ary = rb_ary_new();
rb_block_call(obj, id_each, 0, 0, collect_i, ary);
return ary;
}

The following is Intransition’s methods with some added info.
It looks to me as though two things are happening.

Using #map inside #visit combined with using Enumerable#map
seems to (within #visit) produce the structure of the original array,
but with nil values instead of the original values.
(Changing #map inside #visit to #each gets back the original values.)
Irrespective of the internal results from #visit, Enumerable#map
collects the successive results of the block for Enumerable#map
into a new (flat) array.

module Enumerable
def visit(&block)
puts “#=> Enumerable#visit: self=0x#{self.object_id}= #{self.inspect}”
if block_given? then result = map do |e| e.visit(&block) end
else result = to_enum(:visit)
end
puts “#=> Enumerable#visit: result=0x#{result.object_id}=
#{result.inspect}”
result
end
end

class Object
def visit(&block) ; block.call(self) ; end
end

arr = [ 10, [ 210 ], 30 ]
it = arr.visit
#=> Enumerable#visit: self=0x13928400= [10, [210], 30]
#=> Enumerable#visit: result=0x13927776= #Enumerator:0x1a90ac0

puts ; rr = it.each{ |x| x+7 }
#=> Enumerable#visit: self=0x13928400= [10, [210], 30]
#=> Enumerable#visit: self=0x13928416= [210]
#=> Enumerable#visit: result=0x1562576= [217]
#=> Enumerable#visit: result=0x13926848= [17, [217], 37]
puts “#=> rr=0x#{rr.object_id}= #{rr.inspect}”
#=> rr=0x13926848= [17, [217], 37]

puts ; rr = it.map{ |x| x+7 }
#=> Enumerable#visit: self=0x13928400= [10, [210], 30]
#=> Enumerable#visit: self=0x13928416= [210]
#=> Enumerable#visit: result=0x1559360= [nil]
#=> Enumerable#visit: result=0x1559840= [nil, [nil], nil]
puts “#=> rr=0x#{rr.object_id}= #{rr.inspect}”
#=> rr=0x1561088= [17, 217, 37]

Note that the last Enumerable#visit result from it.map

has the same structure as the original nested array

(albeit with nil values - dunno why at the moment),

but that the return value of it.map is a flat array

of the successive results of the block calculation.

Changing:
if block_given? then result = map do |e| e.visit(&block) end
to:
if block_given? then result = each do |e| e.visit(&block) end
gives for it.map:

puts ; rr = it.map{ |x| x+7 }
#=> Enumerable#visit: self=0x14484864= [10, [210], 30]
#=> Enumerable#visit: self=0x14484896= [210]
#=> Enumerable#visit: result=0x14484896= [210]
#=> Enumerable#visit: result=0x14484864= [10, [210], 30]
puts “#=> rr=0x#{rr.object_id}= #{rr.inspect}”
#=> rr=0x1298112= [17, 217, 37]

tcsnide · April 12, 2010, 5:09pm

2010/4/12 Robert D. [email protected]:

On Sun, Apr 11, 2010 at 10:55 PM, Robert K.
[email protected] wrote:

On 04/09/2010 08:53 PM, Intransition wrote:

On Apr 9, 2:03 pm, Robert D. [email protected] wrote:
eed to define #<=> on the return value of recursive.

Why that? The whole point would be to sort all elements that would be
recursively returned. The #recursive return value would never be seen then.
#each takes care of structure #<=> takes care of order, IOW

As far as I understand the initial posting of this thread the whole
point is to create “something” (aka an Enumerator) whose #each method
recursively iterates through nested Enumerables. This implies that
during that iteration only elements inside Enumerables are yielded
which are not Enumerable. Hence there is no need to provide a
generalized <=> because x.recursive.sort only ever sees those non
Enumerable elements. Of course, sort only can work properly if all
elements have the same type, are compare compatible or an explicit
comparison block is passed to #sort.

#recursive is called from #each, or did I miss something here?

Technically speaking yes, but more correct would be to say that
#recursive is invoked from #recursive. (=> my implementation for an
example)

I don’t really understand why this thread seems to be so complicated.
I suspect that either I am missing something or not understanding
properly, or other contributors to this thread are missing something.

Kind regards

robert

tcsnide · April 12, 2010, 1:35pm

On Sun, Apr 11, 2010 at 10:55 PM, Robert K.
[email protected] wrote:

On 04/09/2010 08:53 PM, Intransition wrote:

On Apr 9, 2:03 pm, Robert D. [email protected] wrote:
eed to define #<=> on the return value of recursive.

Why that? The whole point would be to sort all elements that would be
recursively returned. The #recursive return value would never be seen then.
#each takes care of structure #<=> takes care of order, IOW
#recursive is called from #each, or did I miss something here?
Cheers
R.

tcsnide · April 12, 2010, 6:02pm

On Apr 12, 11:08 am, Robert K. [email protected] wrote:

I don’t really understand why this thread seems to be so complicated.
I suspect that either I am missing something or not understanding
properly, or other contributors to this thread are missing something.

I think I see what the confusion is. Your version of recursive looses
all the nested structure of the original. And so works fine for most
cases, as long that is what one wants/expects. I’ve been trying to do
it such that the nested structure stays intact. e.g.

[3,2,1,[6,5,4,[9,8,7]].recursive.sort #=> [1,2,3,[4,5,6,[7,8,9]]

Your method produces:

[3,2,1,[6,5,4,[9,8,7]].recursive.sort #=> [1,2,3,4,5,6,7,8,9]

The former proves much more difficult.

tcsnide · April 12, 2010, 5:52pm

On Apr 11, 4:55 pm, Robert K. [email protected] wrote:

However you will need to define #<=> on the return value of recursive.

Why that? The whole point would be to sort all elements that would be
recursively returned. The #recursive return value would never be seen then.

Do you run into the issue of sorting [1,2,3, [:a,:b,:c]] where by an
error is raised when it tries, 1 <=> [:a,:b,:c] ?

simple
end
self
else
Enumerator.new(self, :recursive)
end
end
end

Yes, that’s the basic definition. It does have issues though. For one,
the Enumerator isn’t very useful, as it doesn’t seem able to handle
the recursion very will (compare #with_index and #map, etc.). Another
issue is hashes, they won’t iterate well here. In Ruby 1.8.7 or less
you also get an infinite loop b/c String’s are Enumerable, so an
exception really needs to be made for them. Other than all that it
works

         e.recursive(items, &b)
end
Nice use of #fetch, btw.

This one suffers the fragility that callers providing something else
than an empty Hash can break it. That could be fixed with a bit more
effort by using a thread local. Just using the simple approach is
probably better since there is no notification of loops anyway so the
using code has zero chance to handle the case of loops which might be
important to know.

Is it important? recursive graphs seem so rare anyway, and if your
doing an recursive iteration wouldn’t an infinite loop be expected?
Just as if one were iterating over an infinite sequence?

Another note: your original code suffered from the inability to mix
different Enumerables because you use “self.class” for the type check.
I changed that by only testing for Enumerable.

I have two versions actually, one testing for Enumerable and one
testing for self.class, I think both are useful. But you are right, to
address the original question (of the previous thread on this topic)
testing for Enumerable is the one needed…

tcsnide · April 12, 2010, 7:03pm

On Apr 12, 11:08 am, Robert K. [email protected] wrote:

I don’t really understand why this thread seems to be so complicated.
I suspect that either I am missing something or not understanding
properly, or other contributors to this thread are missing something.

I find this subject extremely confusing. Colin’s answer to my last
email is still causing me puzzlement.

On 4/12/10, Intransition [email protected] wrote:

The former proves much more difficult.

Yeah, Intransition’s way is what I understood to be the point of this
thread…

Intransition, how would you handle this:
[9,8,7,[6,5,4,[3,2,1]].recursive.sort #=> [1,2,3,[4,5,6,[7,8,9]]

Clearly possible, but I have trouble seeing how you’re going to make
that happen easily.

I have often found that too much recursion is confusing. This whole
problem is going to be a lot easier if it’s structured as an external
iterator first and foremost. I have an extern iterator library
(sequence) maybe I can figure out how to do this within that. (I don’t
really have a good grip on how Enumerator works, myself…)

tcsnide · April 12, 2010, 7:38pm

On Mon, Apr 12, 2010 at 5:08 PM, Robert K.
[email protected] wrote:

#each takes care of structure #<=> takes care of order, IOW

#recursive is called from #each, or did I miss something here?

Technically speaking yes, but more correct would be to say that
#recursive is invoked from #recursive. (=> my implementation for an
example)
Ok I see what you mean now. I polluted my answer with how I would
implement the thing, but I am not here to lecture anyone, it is just
my style shining through.

I don’t really understand why this thread seems to be so complicated.
I suspect that either I am missing something or not understanding
properly, or other contributors to this thread are missing something.

IMHO because the rationale is not very clear, but I thought I could
help on one tiny point nonetheless, maybe OP is just experimenting and
finding out what (s)he wants, fair enough I’d say
Cheers
R.