Forum: Ruby Iterator objects and lazy evaluation

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
08636cb750fa6161f128d9b5b0835427?d=identicon&s=25 Yuh-Ruey Chen (Guest)
on 2008-11-01 10:00
(Received via mailing list)
Two questions:

1) Are there equivalents for iteration/enumeration functions like map
that return iterator/enumeration objects (in a Python sense)?

An example:

def read_files(files)
  files.each {|file|
    # blah
  }
end
read_file(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
# map_iter would return an iterator object immediately instead of
opening every file and storing them into an array

I know that I could do this instead:

['file1', 'file2', 'file3'].each {|x|
  open(x) {
    # blah
  }
}

but sometimes it's inconvenient to do that if there's already a
function that accepts an Enumerable such as read_files above.

2) Is there some lazy evaluation library that can recalculate lazy
expression when values change? Something with the following semantics
(or something like it):

x = lazy {a + b}
a = 1
b = 2
p x  # 3 (calculates a + b)
p x  # 3 (memoized value)
a = 3
p x  # 5 (recalculates a + b)
a = [1,2]
b = [3,4]
p x  # [1,2,3,4] (recalculates a + b)
p x  # [1,2,3,4] (memoized value)
a[1] = '.'
p x # [1,'.',3,4] (recalculates a + b)

I know there's a library called lazy.rb, but it's not exactly what I'm
looking for as it doesn't implement this functionality.

Thanks.
A246f7c0ce5f2909483d358bd9e83e4e?d=identicon&s=25 Mike Gold (mikegold)
on 2008-11-01 11:45
Yuh-Ruey Chen wrote:
>
> I know that I could do this instead:
>
> ['file1', 'file2', 'file3'].each {|x|
>   open(x) {
>     # blah
>   }
> }
>
> but sometimes it's inconvenient to do that if there's already a
> function that accepts an Enumerable such as read_files above.

some_file_operation = lambda { |file|
  puts "doing stuff with #{file}"
}

%w(file1 file2 file3).each(&some_file_operation)
# =>
# doing stuff with file1
# doing stuff with file2
# doing stuff with file3

def some_file_operation2(file)
  puts "doing stuff with #{file}"
end

%w(file1 file2 file3).each(&method(:some_file_operation2))
# =>
# doing stuff with file1
# doing stuff with file2
# doing stuff with file3

In Ruby, a.x calls the method named 'x', whereas Python requires
parens a.x().  As a consequence, the syntax for getting the method in
Ruby is a.method(:x).


>
> 2) Is there some lazy evaluation library that can recalculate lazy
> expression when values change? Something with the following semantics
> (or something like it):
>
> x = lazy {a + b}
> a = 1
> b = 2
> p x  # 3 (calculates a + b)
> p x  # 3 (memoized value)
> a = 3
> p x  # 5 (recalculates a + b)
> a = [1,2]
> b = [3,4]
> p x  # [1,2,3,4] (recalculates a + b)
> p x  # [1,2,3,4] (memoized value)
> a[1] = '.'
> p x # [1,'.',3,4] (recalculates a + b)
>
> I know there's a library called lazy.rb, but it's not exactly what I'm
> looking for as it doesn't implement this functionality.

If you google for ruby memoization, you'll find several options
available.  However none can use the syntax in your example.  Ruby
determines at compile time that "p x" is a local variable lookup.
You'll have to use "p lazy.x" or some such.
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2008-11-01 12:42
(Received via mailing list)
On 01.11.2008 09:58, Yuh-Ruey Chen wrote:
>   }
> end
> read_file(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
> # map_iter would return an iterator object immediately instead of
> opening every file and storing them into an array

There is one issue with this design: the IO object is not closed
properly.

> I know that I could do this instead:
>
> ['file1', 'file2', 'file3'].each {|x|
>   open(x) {
>     # blah
>   }
> }
>
> but sometimes it's inconvenient to do that if there's already a
> function that accepts an Enumerable such as read_files above.

So read_files expects an enumeration of opened IO objects, I guess.  And
you want to make sure that files are only opened on demand, correct?
You could do something like this

module Enumerable
   FileIter = Struct.new :enum do
     include Enumerable
     def each(&b)
       enum.each do |file_name|
         File.open(file_name, &b)
       end
       self
     end
   end

   def file_iter
     FileIter.new self
   end
end

Robert@babelfish2 ~
$ echo 1 > a

Robert@babelfish2 ~
$ echo 2 > b

irb(main):016:0> def read_files(files)
irb(main):017:1> files.each {|io| p io.read}
irb(main):018:1> end
=> nil
irb(main):019:0> read_files %w{a b}.file_iter
"1\n"
"2\n"
=> #<struct Enumerable::FileIter enum=["a", "b"]>
irb(main):020:0>

Note that there is ARGF which basically does a similar thing: it opens
and closes all files named in ARGV and presents them as a single IO
object.

Your approach feels a bit off the usual Ruby way and I am suspecting
that you are trying to force foreign patterns into the language.

I'd also say there is a design issue with your version of read_files:
basically it iterates over a collection of file handles and does
something to each one of them.  Given the elegance and shortness of
Ruby's iteration idiom the function read_files would rather be called
read_file and process a single IO only.

> 2) Is there some lazy evaluation library that can recalculate lazy
> expression when values change? Something with the following semantics
> (or something like it):
>
> x = lazy {a + b}
> a = 1
> b = 2
> p x  # 3 (calculates a + b)

This cannot work IMHO since local variables a and b are defined *after*
the block.

> I know there's a library called lazy.rb, but it's not exactly what I'm
> looking for as it doesn't implement this functionality.

IMHI you better explicitly provide arguments to whatever you use to
lazily calculate values.  One way is memoize and another option is to
use a Hash for this if there is just one argument:

irb(main):023:0> square = Hash.new {|h,k| puts "called"; h[k] = k * k}
=> {}
irb(main):024:0> square[1000000]
called
=> 1000000000000
irb(main):025:0> square[1000000]
=> 1000000000000
irb(main):026:0> square[1000000]
=> 1000000000000
irb(main):027:0>

If you have more arguments, you can do

def lazy(&b)
   cache = {}
   lambda {|*a| cache[a] ||= b[*a]}
end

irb(main):006:0> plus2 = lazy {|a,b| puts "called #{a} + #{b}"; a+b}
=> #<Proc:0x7ff8c2d8@(irb):3>
irb(main):007:0> plus2[1,2]
called 1 + 2
=> 3
irb(main):008:0> plus2[1,2]
=> 3
irb(main):009:0> plus2[2,1]
called 2 + 1
=> 3
irb(main):010:0> plus2[2,1]
=> 3
irb(main):011:0>

Kind regards

  robert
4dea430d31b993abaf41cd9b54f8128d?d=identicon&s=25 Avdi Grimm (avdi)
on 2008-11-01 15:46
(Received via mailing list)
On Sat, Nov 1, 2008 at 4:58 AM, Yuh-Ruey Chen <maian330@gmail.com>
wrote:
> 1) Are there equivalents for iteration/enumeration functions like map
> that return iterator/enumeration objects (in a Python sense)?

Have a look at the Enumerator standard library:
http://ruby-doc.org/stdlib/libdoc/enumerator/rdoc/index.html

--
Avdi

Home: http://avdi.org
Developer Blog: http://avdi.org/devblog/
Twitter: http://twitter.com/avdi
Journal: http://avdi.livejournal.com
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-01 19:17
Yuh-Ruey Chen wrote:
> 1) Are there equivalents for iteration/enumeration functions like map
> that return iterator/enumeration objects (in a Python sense)?

You can generate Enumerator objects, and in 1.9 (and I think 1.8.7), an
Enumberable method which hasn't been given a block returns an
Enumerator.

irb(main):001:0> a = (1..10).each
=> #<Enumerator:0x83b687c>
irb(main):002:0> a.next
=> 1
irb(main):003:0> a.next
=> 2
irb(main):004:0> a.next
=> 3

However I don't think you can use these for chaining. It might be nice
if you could do something like

   (1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }.map2 { |x| puts
x }

which wouldn't create any intermediate array. I guess you'd need each
element to call 'next' on the previous one, and you'd need something at
the end to trigger the whole process off.
F53b05cdbdf561cfe141f69b421244f3?d=identicon&s=25 David A. Black (Guest)
on 2008-11-01 20:31
(Received via mailing list)
Hi --

On Sun, 2 Nov 2008, Brian Candler wrote:

> irb(main):002:0> a.next
> x }
>
> which wouldn't create any intermediate array. I guess you'd need each
> element to call 'next' on the previous one, and you'd need something at
> the end to trigger the whole process off.

It used to be that you could associate a block with an enumerator
lazily:

>> e = [1,2,3].enum_for(:map,&lambda {|x| x * 10 })
=> #<Enumerable::Enumerator:0x55b11c>
>> e.next
=> 10

and some chaining was possible. But that's gone now. Enumerators are
still able to remember regular arguments, though, so you can do:

> e = [1,2,3,4,5].each_cons(2)
=> #<Enumerator:0x42548c>
>> e.select {|a,b| b > 3 }
=> [[3, 4], [4, 5]]

(The contrast here is between a relatively old 1.9 version and 1.9.1
preview 1.)


David
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-01 22:18
David A. Black wrote:
> It used to be that you could associate a block with an enumerator
> lazily:
>
>>> e = [1,2,3].enum_for(:map,&lambda {|x| x * 10 })
> => #<Enumerable::Enumerator:0x55b11c>
>>> e.next
> => 10
>
> and some chaining was possible. But that's gone now.

I see. I was wondering about turning the whole thing backwards: pull
instead of push. I did some doodling, and after refactoring here's what
it boiled down to (ruby1.9.1-preview1):

class Enumerator
  def nmap(&blk)
    Enumerator.new do |y|
      begin
        y << blk[self.next] while true
      rescue StopIteration
      end
    end
  end

  def nselect(&blk)
    Enumerator.new do |y|
      begin
        while true
          val = self.next
          y << val if blk[val]
        end
      rescue StopIteration
      end
    end
  end
end

# Two variations of the same theme
(1..10).each.nselect { |i| i%3 == 2 }.nmap { |i| i+100 }.each { |i| puts
i }
puts (1..10).each.nselect { |i| i%3 == 2 }.nmap { |i| i+100 }.to_a

Now, the intention was as follows: the final method in the chain (each
or to_a) repeatedly calls the 'next' method on the item before it, which
calls 'next' on the item before it, and so on.

It seems to work. I guess Enumerators must use Fibers internally to
pause the enumerator block as required?
08636cb750fa6161f128d9b5b0835427?d=identicon&s=25 Yuh-Ruey Chen (Guest)
on 2008-11-02 00:06
(Received via mailing list)
On Nov 1, 1:16 pm, Brian Candler <b.cand...@pobox.com> wrote:
> irb(main):002:0> a.next
> x }
>
> which wouldn't create any intermediate array. I guess you'd need each
> element to call 'next' on the previous one, and you'd need something at
> the end to trigger the whole process off.
> --
> Posted viahttp://www.ruby-forum.com/.

Yes, this is exactly what I'm looking for - the ability to chain
iterators like this without creating intermediate arrays, hopefully
using less memory.
08636cb750fa6161f128d9b5b0835427?d=identicon&s=25 Yuh-Ruey Chen (Guest)
on 2008-11-02 00:25
(Received via mailing list)
On Nov 1, 6:36 am, Robert Klemme <shortcut...@googlemail.com> wrote:
> >    files.each {|file|
>
> you want to make sure that files are only opened on demand, correct?
That was really just a random example that I thought up quickly to
explain my question. Others have already posted better examples.

> Your approach feels a bit off the usual Ruby way and I am suspecting
> that you are trying to force foreign patterns into the language.

Per my previous reply, I'm trying to find a way to chain iterators
without nesting blocks (so they can be passed freely into other
functions expecting enumerables) and without intermediate arrays.

> the block.
>
> called
>    cache = {}
> irb(main):009:0> plus2[2,1]
> called 2 + 1
> => 3
> irb(main):010:0> plus2[2,1]
> => 3
> irb(main):011:0>
>
> Kind regards
>
>         robert

Hmm, what I'm looking for is not just a simple memoization technique.
Suppose I have a function that computes the union of two arrays
whenever each array is changed. Using a perhaps more possible syntax:

attr_accessor :array_a

def foo
   lazy_union = lazy_expr { lazy_depends(:array_a) +
lazy_depends(:array_b) }
   @array_a = [1,2]
   @array_b = [3,4]
   p lazy_union # [1,2,3,4]
   @array_a[1] = 5
   p lazy_union # [1,5,3,4]
end

This is a case your memoization technique doesn't address. Yet I'm
pretty sure this is a common use case, so I was thinking that there
should be some library out there that provides this functionality.
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2008-11-02 11:40
(Received via mailing list)
On 02.11.2008 00:22, Yuh-Ruey Chen wrote:
> On Nov 1, 6:36 am, Robert Klemme <shortcut...@googlemail.com> wrote:

> Per my previous reply, I'm trying to find a way to chain iterators
> without nesting blocks (so they can be passed freely into other
> functions expecting enumerables) and without intermediate arrays.

I see you have been referred to Enumertor and to_enum already.

>    @array_b = [3,4]
>    p lazy_union # [1,2,3,4]
>    @array_a[1] = 5
>    p lazy_union # [1,5,3,4]
> end
>
> This is a case your memoization technique doesn't address. Yet I'm
> pretty sure this is a common use case, so I was thinking that there
> should be some library out there that provides this functionality.

Well, for this I'd rather create a custom class.  In your case of Array
you could do something like this

# ALT: require 'md5'

class MemoFunc
   def initialize(*args, &b)
     @args = args
     @hsh  = nil
     @fun  = b
     @result = self
   end

   def call
     h = arg_hash
     if @result == self || h != @hsh
       @result = @fun[*@args]
       @hsh    = h
     end
     @result
   end

   def invalidate
     @result = self
   end

   def rearg(*a)
     @args = a
     self
   end

private
   def arg_hash
     @args.hash
     # ALT: MD5.digest(Marshal.dump(@args))
   end
end

irb(main):024:0> a = [1,2]
=> [1, 2]
irb(main):025:0> b = [3,4]
=> [3, 4]
irb(main):026:0> f = MemoFunc.new(a,b) {|x,y| puts "eval"; x | y}
=> #<MemoFunc:0x7ffa63cc @hsh=nil, @args=[[1, 2], [3, 4]],
@result=#<MemoFunc:0x7ffa63cc ...>, @fun=#<Proc:0x7ffa646c@(irb):26>>
irb(main):027:0> f.call
eval
=> [1, 2, 3, 4]
irb(main):028:0> f.call
=> [1, 2, 3, 4]
irb(main):029:0> a[1] = 5
=> 5
irb(main):030:0> f.call
eval
=> [1, 5, 3, 4]
irb(main):031:0> f.call
=> [1, 5, 3, 4]
irb(main):032:0> f.call
=> [1, 5, 3, 4]
irb(main):033:0>


The difficult part is to get detection of changes of the arguments right
(meaning safe and fast).  There is no general solution that fits well to
all situations.  My implementation with Array#hash is too weak for a
general solution.  I am thinking that creating an MD5 of the marshalled
stream of the argument Array is more robust, but you may pay a
performance penalty.  Which might be ok depending on the complexity of
the operation.

Kind regards

  robert
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-02 13:36
Yuh-Ruey Chen wrote:
> Suppose I have a function that computes the union of two arrays
> whenever each array is changed. Using a perhaps more possible syntax:
>
> attr_accessor :array_a
>
> def foo
>    lazy_union = lazy_expr { lazy_depends(:array_a) +
> lazy_depends(:array_b) }
>    @array_a = [1,2]
>    @array_b = [3,4]
>    p lazy_union # [1,2,3,4]
>    @array_a[1] = 5
>    p lazy_union # [1,5,3,4]
> end

There is an Observable pattern (observer.rb); however it doesn't
automatically notice when an instance variable has been changed, and I
don't know of a hook which detects that.

In any case, it's possible for the array to remain unchanged (i.e. it
contains references to the same objects) whilst those objects still
mutate:

  @array_a = [1, "foo"]
  ...
  @array[1] << "bar"

Actually, your "lazy_union" could be fine with this, since the union
object would also contain the mutated object, but a more general case
like lazy_concat which depends on the *values* of the objects would have
serious problems.

So, do you want your lazy dependency system to notice this? What about
objects held an arbitary number of levels deep?

I'm not sure that Ruby really suits itself well to this sort of pattern,
because of the mutable nature of objects. Perhaps you want to look at
Erlang instead.
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-02 22:21
Yuh-Ruey Chen wrote:
> Yes, this is exactly what I'm looking for - the ability to chain
> iterators like this without creating intermediate arrays, hopefully
> using less memory.

I've researched this a bit more, and it seems that ruby 1.9 *does*
support this model in a very smart way. Unfortunately the built-in
Enumerable methods like #map, #select, #reject don't, but these are easy
enough to synthesise. I started a new thread on that topic at
http://www.ruby-forum.com/topic/169807

Anyway, if you want the following pattern

   generator --> filter --> filter ... --> consumer

you implement it as follows.

(1) The generator is any object which implements 'each' and yields the
members of the (possibly infinite) collection.

class Fib
  include Enumerable
  def initialize(a=1,b=1)
    @a, @b = a, b
  end
  def each
    a, b = @a, @b
    yield a
    while true
      yield b
      a, b = b, a+b
    end
  end
end

(by including Enumerable, you gain a whole load of useful methods which
in turn call your 'each' method)

(2) The consumer is an object which calls 'each' with a block that does
something with the data yielded to it:

   obj.each { |i| puts i }

So far, all so ruby-1.8.

(3) The clever bit is the filter, and this only works in ruby-1.9. You
create an Enumerator object with a block which processes the data.

  Enumerator.new do |y|
    obj.each do |thing|     # the filter INPUT
      ...
      y << otherthing       # the filter OUTPUT
    end
  end

The really clever bit is the "y << otherthing" line. By cunning use of
Fibers (I believe), execution of the filter stops at this point,
allowing the next filter along the chain to make use of the value as its
input. Once all the filters along the chain have finished using this
value, this filter carries on into the next iteration of its 'each'
loop, to take the next input from upstream.

So here's a complete example with an infinite source, two filters, and a
consumer. Notice that only the consumer decides when to terminate;
before this point, everything is a potentially infinite series.

module Enumerable
  def lmap(&blk)
    Enumerator.new do |y|
      each do |e|
        y << blk[e]
      end
    end
  end

  def lselect(&blk)
    Enumerator.new do |y|
      each do |e|
        y << e if blk[e]
      end
    end
  end
end

class Fib
  include Enumerable
  def initialize(a=1,b=1)
    @a, @b = a, b
  end
  def each
    a, b = @a, @b
    yield a
    while true
      yield b
      a, b = b, a+b
    end
  end
end

Fib.new.lselect { |i| i % 2 == 0 }.lmap { |i| "<#{i}>" }.
  each_with_index { |i,cnt| puts i; break if cnt >= 20 }

$ ruby19 fib.rb
<2>
<8>
<34>
<144>
<610>
<2584>
<10946>
<46368>
<196418>
<832040>
<3524578>
<14930352>
<63245986>
<267914296>
<1134903170>
<4807526976>
<20365011074>
<86267571272>
<365435296162>
<1548008755920>
<6557470319842>
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-03 16:03
And here's a wrapper for this pattern:

class Enumerator
  def filter(&blk)
    self.class.new do |y|
      each do |*input|
        blk.call(y, *input)
      end
    end
  end
end

a = (1..1_000_000_000).to_enum
a.filter { |out,inp| out << inp if inp % 2 == 0 }.
  filter { |out,inp| out << inp+100 }.
  with_index.each { |inp,c| puts inp; break if c > 10 }

$ ruby19 -v
ruby 1.9.1 (2008-10-28 revision 19983) [i686-linux]
$ ruby19 filter.rb
102
104
106
108
110
112
114
116
118
120
122
124
$
753dcb78b3a3651127665da4bed3c782?d=identicon&s=25 Brian Candler (candlerb)
on 2008-11-04 14:25
Actually, it turns out you don't even need ruby-1.9 for this. The
following runs fine in ruby-1.8.6:

  module Enumerable
    class Gen
      include Enumerable
      def initialize(&body)
        @body = body
      end
      def each(&receiver)
        @body.call(receiver)
      end
    end

    def filter(&blk)
      Gen.new do |y|
        each do |*input|
          blk.call(y, *input)
        end
      end
    end
  end

  (1..10).
    filter { |y,e| puts "Sending #{e.inspect}"; y[e]; puts "Sent" }.
    filter { |y,e| y[e] if e % 2 == 0; sleep 0.5 }.
    filter { |y,e| y[e + 100]; sleep 0.5 }.
    each { |e| puts e }

The puts and sleep calls are in there to show this is proper
'horizontal' evaluation, with no intermediate arrays. But if you want an
array of the finished product, just replace the last line with "to_a"

I wouldn't be surprised if this pattern exists in a library somewhere,
but I don't know where.
This topic is locked and can not be replied to.