Forum: Ruby ruby1.9: lazy versions of Enumerator#select and friends?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-02 17:42
I've been having a play with Enumerators in ruby 1.9, in particular
adding 'lazy' versions of those methods in Enumerable which normally
return arrays. Here's a proof-of-concept implementation:

module Enumerable
  def lmap(&blk)
    Enumerator.new do |y|
      each do |e|
        y << blk[e]
      end
    end
  end

  def lselect(&blk)
    Enumerator.new do |y|
      each do |e|
        y << e if blk[e]
      end
    end
  end

  def ltake(n)
    Enumerator.new do |y|
      count = 0
      each do |e|
        break if n <= count
        y << e
        count += 1
      end
    end
  end

  def lskip(n)
    Enumerator.new do |y|
      count = 0
      each do |e|
        y << e unless count <= n
        count += 1
      end
    end
  end
end

if __FILE__ == $0
  big = (1..1_000_000_000_000)
  big.lselect { |i| i % 2 == 1 }.lmap { |i| i + 100 }.
      lskip(5).ltake(10).each { |i| puts i }
end

So instead of generating an array containing all processed elements,
they return an Enumerator which processes the elements on demand. As a
result, you can easily chain them together, as shown in the example; you
can handle large sequences without creating large intermediate arrays;
and also handle infinite lists.

Ruby hides the internals of this very nicely, I presume using Fibers to
maintain the state inside each Enumerator.

My first question is, does ruby 1.9 have methods like this already, and
I've just overlooked them? If so, where should I be looking?

If not, then is there value in adding something like this to the
language? For example,

1. Put this stuff into an external library, Facets-like, with new
methods in Enumerable as above (the method names given are just
examples, there are probably better ones)

2. Redefine Enumerator#map, Enumerator#select etc, so that they return
Enumerators instead of arrays. It seems reasonable to me that once you
have an Enumerator at the base of the chain, you will likely want to
keep chaining them together. You can always collapse the result to a
real array using 'to_a'.

You can always forcibly create an Enumerator using 'each' without a
block, e.g.

   (1..10).map { ... }.select { ... }        # normal evaluation
   (1..10).each.map { ... }.select { ... }   # this would be 'lazy'

3. A more radical language change would be to have Enumerable#map,
#select etc return Enumerators always. Some existing code would need a
'to_a' stuck on the end. Also, it may not be as efficient:

   a1 = (1..10).map { |i| i*2 }        # map generates an array
immediately
   a2 = (1..10).lmap { |i| i*2 }.to_a  # Fibers are involved

4. If you accept that we should have Enumerators rather than Arrays as
intermediate elements in the chain, then I guess we have to consider
composing Enumerators (indeed, composing anything Enumerable):

module Enumerable
  def +(other)
    Enumerator.new do |y|
      each { |e| y << e }
      other.each { |e| y << e }
    end
  end
end

a = (1..3) + (10..12) + (17..20)
a.each { |i| puts i }

I'm not sure where following this path would end up. & and | might still
have to build intermediate arrays. (However, if the source Enumerables
were known to be in sorted order, there would be interesting and
efficient ways to merge and join on them)

Any comments?

Regards,

Brian.
F53b05cdbdf561cfe141f69b421244f3?d=identicon&s=25 David A. Black (Guest)
on 2008-11-02 18:00
(Received via mailing list)
HI --

On Mon, 3 Nov 2008, Brian Candler wrote:

>    end
>  def ltake(n)
>  def lskip(n)
> if __FILE__ == $0
>
> Ruby hides the internals of this very nicely, I presume using Fibers to
> maintain the state inside each Enumerator.
>
> My first question is, does ruby 1.9 have methods like this already, and
> I've just overlooked them? If so, where should I be looking?

It did have something like this, but not any more. I'm not sure they
were identical to what you've written, but the basic idea of an
enumerator that bundled itself with a block did exist in 1.9 for a
while.

> If not, then is there value in adding something like this to the
> language? For example,

I think so, absolutely. I'm still struggling with trying to believe
that enumerators are more than marginally useful without the ability
to travel with knowledge of a block and still be lazy.

> 1. Put this stuff into an external library, Facets-like, with new
> methods in Enumerable as above (the method names given are just
> examples, there are probably better ones)
>
> 2. Redefine Enumerator#map, Enumerator#select etc, so that they return
> Enumerators instead of arrays. It seems reasonable to me that once you
> have an Enumerator at the base of the chain, you will likely want to
> keep chaining them together. You can always collapse the result to a
> real array using 'to_a'.

I'd strongly go for #1. #2 would require not only massive code
overhaul, but change of habits, and lots of to_a is unsightly (and you
wouldn't necessarily want to have to hard-code the class of what you
want back, anyway [i.e., the 'a' in 'to_a']).


David
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2008-11-02 19:05
(Received via mailing list)
On Nov 2, 2008, at 10:58 AM, David A. Black wrote:

>> My first question is, does ruby 1.9 have methods like this already,
>> and
>> I've just overlooked them? If so, where should I be looking?
>
> It did have something like this, but not any more. I'm not sure they
> were identical to what you've written, but the basic idea of an
> enumerator that bundled itself with a block did exist in 1.9 for a
> while.

I've seen you asking David, but did you ever get an answer of why this
feature was removed?  If there's not a great reason, I sure wish we
could have it back.

James Edward Gray II
F53b05cdbdf561cfe141f69b421244f3?d=identicon&s=25 David A. Black (Guest)
on 2008-11-02 19:21
(Received via mailing list)
Hi --

On Mon, 3 Nov 2008, James Gray wrote:

> I've seen you asking David, but did you ever get an answer of why this
> feature was removed?  If there's not a great reason, I sure wish we could
> have it back.

Shugo answered on ruby-core, mainly on the subject of efficiency, and
also on the intention behind enumerators which he says doesn't include
that kind of instantiation with block awareness. I guess I got used to
how they behaved when they were block-aware, so it's been hard for me
not to see that as how they're supposed to be.


David
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-02 19:39
David A. Black wrote:
> It did have something like this, but not any more. I'm not sure they
> were identical to what you've written, but the basic idea of an
> enumerator that bundled itself with a block did exist in 1.9 for a
> while.

I'd be interested to learn exactly what the dropped feature was.

Perhaps we're talking at cross purposes, but I'm not sure I'm suggesting
"an enumerator that bundled itself with a block". Rather, I'm suggesting
that certain methods in Enumerable which used to return an array,
instead return an Enumerator, so that they can be chained horizontally.

By this I mean:

   a.meth1 { ... }.meth2 { ... }.meth3 { ... }

currently runs "vertically" (meth1 enumerates all of 'a' and creates an
array; meth2 enumerates this array and creates another array; then meth3
enumerates this array and creates a final array)

By running "horizontally" I mean that the first element of 'a' is
processed all the way across; then the second element of 'a'; and so on.
No intermediate arrays are created.

> I'd strongly go for #1. #2 would require not only massive code
> overhaul, but change of habits, and lots of to_a is unsightly (and you
> wouldn't necessarily want to have to hard-code the class of what you
> want back, anyway [i.e., the 'a' in 'to_a']).

I don't understand the last sentence - a whole bunch of Enumerable
methods like #map, #select etc are already hard-coded to return an
Array, so you currently have no choice.

With what I propose you can use any method you like to build the result.
If you don't want an array, then don't use to_a. You could use 'inject'
or 'each' or 'each_with_object' to gather the resulting elements in
whatever way you like: e.g.

  a.meth1 { ... }.meth2 { ...}.each { |x,y| h[x] = y }

Or even, as the original example I posted shows, you can just use the
results as they arrive and then discard them:

  a.meth1 { ... }.meth2 { ...}.each { |x| puts x }

Now that Enumerators exist, the fact that Enumerable methods are
hard-coded always to create an Array seems very limiting. There are so
many other things you might want to create instead.
45196398e9685000d195ec626d477f0e?d=identicon&s=25 Thomas Sawyer (7rans)
on 2008-11-02 20:15
(Received via mailing list)
On Nov 2, 11:41 am, Brian Candler <b.cand...@pobox.com> wrote:
>     end
>   end

This doesn't work. But I think I understand what you are after.

Wouldn't it have to be something like:

   def lmap(&blk)
     this = self
     Functor.new |op, &blk|  # &blk in 1.9 only :(
       this.map(&blk).__send__(op, &blk)
     end
   end

T.
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-02 22:01
Thomas Sawyer wrote:
> On Nov 2, 11:41�am, Brian Candler <b.cand...@pobox.com> wrote:
>> � � end
>> � end
>
> This doesn't work.

Sorry, which bit doesn't work?

$ ruby19 -v
ruby 1.9.1 (2008-10-28 revision 19983) [i686-linux]
$ ruby19 lazy.rb
113
115
117
119
121
123
125
127
129
131
1
2
3
10
11
12
17
18
19
20
$
45196398e9685000d195ec626d477f0e?d=identicon&s=25 Thomas Sawyer (7rans)
on 2008-11-02 22:15
(Received via mailing list)
On Nov 2, 4:00 pm, Brian Candler <b.cand...@pobox.com> wrote:
> ruby 1.9.1 (2008-10-28 revision 19983) [i686-linux]
Interesting. Clearly there is something substantially different in
1.9's version. I added

  require 'enumerator'

and using 1.8.6, I expected it would work the same. But get

:13:in `initialize': wrong number of arguments (0 for 1)
(ArgumentError)
  from t.rb:13:in `new'
  from t.rb:13:in `lselect'
  from t.rb:44

Enumerator.new() wants an enumerable object as an argument --what is
1.9 doing without it?

T.
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-02 22:26
Thomas Sawyer wrote:
> Enumerator.new() wants an enumerable object as an argument --what is
> 1.9 doing without it?

1.9 supports two completely different types of Enumerator objects. The
first  is the simple 1.8.6 one which just maps method :each to arbitrary
method :foo on the upstream object. The second is the clever one.

-------------------------------------------------------- Enumerator::new
     Enumerator.new(obj, method = :each, *args)
     Enumerator.new { |y| ... }

     From Ruby 1.9.1
------------------------------------------------------------------------
     Creates a new Enumerator object, which is to be used as an
     Enumerable object iterating in a given way.

     In the first form, a generated Enumerator iterates over the given
     object using the given method with the given arguments passed. Use
     of this form is discouraged. Use Kernel#enum_for(), alias to_enum,
     instead.

       e = Enumerator.new(ObjectSpace, :each_object)
           #-> ObjectSpace.enum_for(:each_object)

       e.select { |obj| obj.is_a?(Class) }  #=> array of all classes

     In the second form, iteration is defined by the given block, in
     which a "yielder" object given as block parameter can be used to
     yield a value by calling the +yield+ method, alias +<<+.

       fib = Enumerator.new { |y|
         a = b = 1
         loop {
           y << a
           a, b = b, a + b
         }
       }

       p fib.take(10) #=> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
F53b05cdbdf561cfe141f69b421244f3?d=identicon&s=25 David A. Black (Guest)
on 2008-11-02 23:05
(Received via mailing list)
Hi --

On Mon, 3 Nov 2008, Brian Candler wrote:

> that certain methods in Enumerable which used to return an array,
> instead return an Enumerator, so that they can be chained horizontally.

This is the feature that disappeared -- definitely not identical to
what you're doing, but kind of germane:

   a = [1,2,3,4].enum_for(:map, &lambda {|x| x * 10})
   a.select {|x| x > 20 }   # [30,40]

> No intermediate arrays are created.
Right; that was clear from what you said. We're talking in the same
problem domain (chainable, lazy enumerators), in somewhat different
terms.

>> I'd strongly go for #1. #2 would require not only massive code
>> overhaul, but change of habits, and lots of to_a is unsightly (and you
>> wouldn't necessarily want to have to hard-code the class of what you
>> want back, anyway [i.e., the 'a' in 'to_a']).
>
> I don't understand the last sentence - a whole bunch of Enumerable
> methods like #map, #select etc are already hard-coded to return an
> Array, so you currently have no choice.

True, but that doesn't take newly-written enumerable classes, and
there are a few overrides here and there (like Hash#select returning a
hash, in 1.9).

I wonder what role what I have dubbed the "un-overriding" of
enumerable methods would play. Here's an example:

>> hash = { 1 => 2 }
=> {1=>2}
>> hash.select { true }
=> {1=>2}
>> hash.each.select { true }
=> [[1, 2]]

Since Enumerator doesn't override select (which of course it can't, in
a general way), calling select on an enumerator for the hash has a
different effect from calling select on the hash. I haven't puzzled
through how this would play out with your construct, but it seems like
it might crop up (though maybe no more than with enumerators in
general).

>  a.meth1 { ... }.meth2 { ...}.each { |x| puts x }
>
> Now that Enumerators exist, the fact that Enumerable methods are
> hard-coded always to create an Array seems very limiting. There are so
> many other things you might want to create instead.

I'm still not sure I'd like to have to do a kind of
enumerator-resolving last call to every map, select, inject, etc. It's
different with each, because it exists only for its block side-effects
in the first place. The ones that return result sets that you care
about would all have to be "capped" with to_a or something, unless the
last method in the chain somehow knew not to return an enumerator.


David
F53b05cdbdf561cfe141f69b421244f3?d=identicon&s=25 David A. Black (Guest)
on 2008-11-03 04:10
(Received via mailing list)
Hi --

On Mon, 3 Nov 2008, Brian Candler wrote:

> Thomas Sawyer wrote:
>> Enumerator.new() wants an enumerable object as an argument --what is
>> 1.9 doing without it?
>
> 1.9 supports two completely different types of Enumerator objects. The
> first  is the simple 1.8.6 one which just maps method :each to arbitrary
> method :foo on the upstream object. The second is the clever one.

Although... they're really the same if you think of it as an
enumerable object that has everything it needs except the knowledge of
what to iterate over (i.e., the "each" intelligence), and the block
and the method-attachment are just two different ways of teaching it
how it should iterate.

At least, that's how I'm planning to present it in my book, so I
thought I'd try it out here first :-)


David
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-03 09:57
David A. Black wrote:
> This is the feature that disappeared -- definitely not identical to
> what you're doing, but kind of germane:
>
>    a = [1,2,3,4].enum_for(:map, &lambda {|x| x * 10})
>    a.select {|x| x > 20 }   # [30,40]

OK, I think I see what you're saying now, except I'm not sure exactly
how ':map' interacts with this.

If this code calls the existing 'map' method, then does that mean that
'map' also behaved differently when invoked via an Enumerator? (Because
otherwise, 'map' normally builds the entire result array up-front). How
did 'map' know whether it was being called in this context or not?

Using current ruby-1.9 you would write something like

  a = [1,2,3,4]
  Enumerator.new do |y|
    a.each { |e| y << e * 10 }
  end.select { |x| x > 20 }

but you can see I've had to write the 'map' logic in-line.

> True, but that doesn't take newly-written enumerable classes, and
> there are a few overrides here and there (like Hash#select returning a
> hash, in 1.9).

Ah, well that's a bit messy. Taking that mess a bit further, I guess you
could arrange for Array#select similarly to be hardcoded always to
return an Array? Hmm.

> I'm still not sure I'd like to have to do a kind of
> enumerator-resolving last call to every map, select, inject, etc. It's
> different with each, because it exists only for its block side-effects
> in the first place. The ones that return result sets that you care
> about would all have to be "capped" with to_a or something, unless the
> last method in the chain somehow knew not to return an enumerator.

Maybe we're actually discussing my option #3 - which was to change
Enumerable#select to return an Enumerator.

My option #2 was that Enumerator#select return an Enumerator, in the
same way that Hash#select returns a Hash as you just described.

That is:

  class Enumerator
    def select
      ... etc
    end
  end

Then foo.select { ... } would always return an Array (via
Enumerable#select), unless foo was an Enumerator (in which case it would
return an Enumerator), or a Hash (in which case it would return a Hash)

The 'capping' would only be needed if you create an Enumerator in the
chain.

> I wonder what role what I have dubbed the "un-overriding" of
> enumerable methods would play. Here's an example:
>
>>> hash = { 1 => 2 }
> => {1=>2}
>>> hash.select { true }
> => {1=>2}
>>> hash.each.select { true }
> => [[1, 2]]
>
> Since Enumerator doesn't override select (which of course it can't, in
> a general way), calling select on an enumerator for the hash has a
> different effect from calling select on the hash.

Yes, I see what you're saying.

But actually I'm proposing that hash.each.select return another
Enumerator. So you can cap it with a 'to_hash' method at the end to get
the same result:

require 'lazy'
class Enumerator
  def to_hash
    each_with_object({}) { |(k,v),o| o[k] = v }
  end
end
hash = {1=>2}
p hash.lselect { true }            #=> #<Enumerator:0x8321f60>
p hash.lselect { true }.to_hash    #=> {1=>2}

As you say, arguably it's tedious to have to stick .to_a or .to_hash at
the end, but once you have an Enumerator, you have (a) lost the memory
of what it is you created it from, and (b) quite probably don't want to
create the same thing again anyway - for example, sticking each { ... }
at the end of the chain so as to consume the items rather than build a
new object.

So I think I'm warming to the middle ground: Enumerator#foo returns an
Enumerator, so Enumerators chain together, and if at the end of it you
really want an array (or a hash or whatever) rather than an Enumerator,
then you add .to_a or .to_hash

But equally, if you call Array#select or Hash#select directly, without
first "priming" the chain with an Enumerator object, then you get back
another Array or Hash immediately.

It makes sense when you understand it, although someone who didn't
understand this might have difficulty distinguishing

  [10,20,30].select { ... }.map { ... }.each { ... }
from
  [10,20,30].each.select { ... }.map { ... }.each { ... }

I think that 'each without a block' is not necessarily a good way of
flagging 'make me an enumerator'; writing to_enum would be clearer.

Actually, I have to say that I can't really see the purpose of 'map
without a block', 'select without a block'. I saw a posting showing how
you could chain

   foo.select.with_index { |x,i| ... }

but surely that would be clearer as

   foo.with_index.select { |x,i| ... }

(where 'with_index' creates an Enumerator which adds the index for you)

Unless there's a good use for this which I've overlooked, I'd be happy
for Ruby to go back to 1.8.6 behaviour of each/map/select *requiring* a
block. In that case, my earlier example would have to be written

  [10,20,30].to_enum.select { ... }.map { ... }.each { ... }

which is pretty clear (to me :-)

Regards,

Brian.
This topic is locked and can not be replied to.