I’ve been having a play with Enumerators in ruby 1.9, in particular
adding ‘lazy’ versions of those methods in Enumerable which normally
return arrays. Here’s a proof-of-concept implementation:
module Enumerable
def lmap(&blk)
Enumerator.new do |y|
each do |e|
y << blk[e]
end
end
end
def lselect(&blk)
Enumerator.new do |y|
each do |e|
y << e if blk[e]
end
end
end
def ltake(n)
Enumerator.new do |y|
count = 0
each do |e|
break if n <= count
y << e
count += 1
end
end
end
def lskip(n)
Enumerator.new do |y|
count = 0
each do |e|
y << e unless count <= n
count += 1
end
end
end
end
if FILE == $0
big = (1…1_000_000_000_000)
big.lselect { |i| i % 2 == 1 }.lmap { |i| i + 100 }.
lskip(5).ltake(10).each { |i| puts i }
end
So instead of generating an array containing all processed elements,
they return an Enumerator which processes the elements on demand. As a
result, you can easily chain them together, as shown in the example; you
can handle large sequences without creating large intermediate arrays;
and also handle infinite lists.
Ruby hides the internals of this very nicely, I presume using Fibers to
maintain the state inside each Enumerator.
My first question is, does ruby 1.9 have methods like this already, and
I’ve just overlooked them? If so, where should I be looking?
If not, then is there value in adding something like this to the
language? For example,
-
Put this stuff into an external library, Facets-like, with new
methods in Enumerable as above (the method names given are just
examples, there are probably better ones) -
Redefine Enumerator#map, Enumerator#select etc, so that they return
Enumerators instead of arrays. It seems reasonable to me that once you
have an Enumerator at the base of the chain, you will likely want to
keep chaining them together. You can always collapse the result to a
real array using ‘to_a’.
You can always forcibly create an Enumerator using ‘each’ without a
block, e.g.
(1…10).map { … }.select { … } # normal evaluation
(1…10).each.map { … }.select { … } # this would be ‘lazy’
-
A more radical language change would be to have Enumerable#map,
#select etc return Enumerators always. Some existing code would need a
‘to_a’ stuck on the end. Also, it may not be as efficient:a1 = (1…10).map { |i| i2 } # map generates an array
immediately
a2 = (1…10).lmap { |i| i2 }.to_a # Fibers are involved -
If you accept that we should have Enumerators rather than Arrays as
intermediate elements in the chain, then I guess we have to consider
composing Enumerators (indeed, composing anything Enumerable):
module Enumerable
def +(other)
Enumerator.new do |y|
each { |e| y << e }
other.each { |e| y << e }
end
end
end
a = (1…3) + (10…12) + (17…20)
a.each { |i| puts i }
I’m not sure where following this path would end up. & and | might still
have to build intermediate arrays. (However, if the source Enumerables
were known to be in sorted order, there would be interesting and
efficient ways to merge and join on them)
Any comments?
Regards,
Brian.