Forum: Ruby-core [Bug #1711] Marshal Failing to Round-Trip Certain Recurisve Data Structures

Posted by Run Paint Run Run (Guest)
on 2009-07-01 19:30
(Received via mailing list)
Bug #1711: Marshal Failing to Round-Trip Certain Recurisve Data 
Structures
http://redmine.ruby-lang.org/issues/show/1711

Author: Run Paint Run Run
Status: Open, Priority: Normal
Category: core
ruby -v: ruby 1.9.2dev (2009-07-01 trunk 23924) [i686-linux]

I have attached a script that fails to round-trip a recursive data 
structure on 1.9, but succeeds on 1.8. IOW, it prints true on 1.8; false 
on 1.9. I haven't been able to reduce it further because I'm unfamiliar 
with Marshal.

    $ ruby86 -vw /tmp/marshal.rb
    ruby 1.8.6 (2009-06-08 patchlevel 369) [i686-linux]
    true

    $ ruby87 -vw /tmp/marshal.rb
    ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-linux]
    true

    $ ruby8 -vw /tmp/marshal.rb
    ruby 1.8.8dev (2009-06-28) [i686-linux]
    true

    $ ruby -vw /tmp/marshal.rb
    ruby 1.9.2dev (2009-07-01 trunk 23924) [i686-linux]
    false
Posted by Yuki Sonoda (Guest)
on 2009-07-16 08:45
(Received via mailing list)
Issue #1711 has been updated by Yuki Sonoda.


 class UserDefined

   class Nested
     def ==(other)
       other.kind_of? self.class
     end
   end

   attr_reader :a, :b

   def initialize
     @a = 'stuff'
     @b = @a
   end

   def _dump(depth)
     Marshal.dump [@a, @b]
   end

   def self._load(data)
     a, b = Marshal.load data

     obj = allocate
     obj.instance_variable_set :@a, a
     obj.instance_variable_set :@b, b

     obj
   end

   def ==(other)
     self.class === other and
     @a == other.a and
     @b == other.b
   end

 end

 class UserDefinedWithIvar
   attr_reader :a, :b, :c

   def initialize
     @a = 'stuff'
     @a.instance_variable_set :@foo, :UserDefinedWithIvar
     @b = 'more'
     @c = @b
   end

   def _dump(depth)
     Marshal.dump [@a, @b, @c]
   end

   def self._load(data)
     a, b, c = Marshal.load data

     obj = allocate
     obj.instance_variable_set :@a, a
     obj.instance_variable_set :@b, b
     obj.instance_variable_set :@c, c

     obj
   end

   def ==(other)
     self.class === other and
     @a == other.a and
     @b == other.b and
     @c == other.c and
     @a.instance_variable_get(:@foo) == 
other.a.instance_variable_get(:@foo)
   end
 end

 arr = []
 myproc = Proc.new { |o| arr << o }
 o1 = UserDefined.new;
 o2 = UserDefinedWithIvar.new
 obj = [o1, o2, o1, o2]
 Marshal.load Marshal.dump(obj), myproc
 p arr == [o1, o2, obj]

----------------------------------------
http://redmine.ruby-lang.org/issues/show/1711
Posted by Yukihiro Matsumoto (Guest)
on 2009-07-27 15:15
(Received via mailing list)
Hi,

In message "Re: [ruby-core:24105] [Bug #1711] Marshal Failing to 
Round-Trip Certain Recurisve Data Structures"
    on Thu, 2 Jul 2009 02:30:15 +0900, Run Paint Run Run 
<redmine@ruby-lang.org> writes:

|I have attached a script that fails to round-trip a recursive data structure on 1.9, but succeeds on 1.8. IOW, it prints true on 1.8; false on 1.9. I haven't been able to reduce it further because I'm unfamiliar with Marshal.

1.9 Marshal.load with proc is incompatible with 1.8, as following:

  * 1.8 load ignores the value returned from proc, whereas 1.9 load
    replaces the object by the value from proc.

  * 1.9 load calls proc for recursively visited data.  1.8 dump only
    calls proc once.

That is to allow rebuilding loading data using proc.  For
the example in [ruby-core:24361], replacing

 myproc = Proc.new { |o| arr << o }

by

 myproc = Proc.new { |o| arr << o; o }

should work as expected.  The future action would be either:

 (a) the new 1.9 behavior has meaning.  since virtually no one uses
     Marshal.load with proc, it's OK to change it for a good reason.

 (b) incompatibility is bad.  should be reverted to the original 1.8
     behavior.

 (c) or probably add another optional (or keyword) argument to enable
     the new behavior, keeping the old behavior by default.

Any opinion?

              matz.
Posted by Eric Hodel (Guest)
on 2009-07-27 22:38
(Received via mailing list)
On Jul 27, 2009, at 06:14, Yukihiro Matsumoto wrote:
>
>
>
> (b) incompatibility is bad.  should be reverted to the original 1.8
>     behavior.
>
> (c) or probably add another optional (or keyword) argument to enable
>     the new behavior, keeping the old behavior by default.
>
> Any opinion?

I haven't seen Marshal.load with proc in the wild, so I prefer (a).

In the future, I could imagine using the new 1.9 behavior in RubyGems
for backwards compatibility.
Posted by Run Paint Run Run (Guest)
on 2009-08-02 09:42
(Received via mailing list)
Issue #1711 has been updated by Run Paint Run Run.


> 1.9 Marshal.load with proc is incompatible with 1.8, as following:
>
>  * 1.8 load ignores the value returned from proc, whereas 1.9 load
>    replaces the object by the value from proc.
>
>
>  * 1.9 load calls proc for recursively visited data.  1.8 dump only
>    calls proc once.

Thank you for the clarification, matz. I've updated the specs for the 
first point as I understand it and it seems reasonable. I kind of 
understand the second point, but am not sure of the implications.

The following example looks to illustrate the difference. 1.8 calls the 
proc for the first element of the Array, doesn't call it for the second 
because that's recursive, then calls it again for the entire Array.

  $ ruby86 -e 'a=[1]; a << a; Marshal.load(Marshal.dump(a), proc {|e| p 
e})'
  1
  [1, [...]]

1.9, however, also calls the proc once for the recursive element, but I 
don't claim to understand exactly what form the argument takes. In this 
case it's the data structure containing the recursive element before the 
recursive element was appended. Is this what's intended? Would this help 
or hinder rebuilding Marshal'd data?

  $ ruby -e 'a=[1]; a << a; Marshal.load(Marshal.dump(a), proc {|e| p 
e})'
  1
  [1]
  [1, [...]]
----------------------------------------
http://redmine.ruby-lang.org/issues/show/1711
Posted by _ wanabe (Guest)
on 2010-04-04 01:45
(Received via mailing list)
Issue #1711 has been updated by _ wanabe.


This issue should be moved to Feature, shouldn't it?
----------------------------------------
http://redmine.ruby-lang.org/issues/show/1711
Posted by Nobuyoshi Nakada (nobu)
on 2010-04-22 04:31
(Received via mailing list)
Hi,

At Sun, 4 Apr 2010 08:44:45 +0900,
_ wanabe wrote in [ruby-core:29242]:
> This issue should be moved to Feature, shouldn't it?

It's not what is called round-trip, therefore wrong thing is
the precondition.

I think this ticket would be rejected.
Posted by Yusuke Endoh (Guest)
on 2010-04-23 13:19
(Received via mailing list)
Issue #1711 has been updated by Yusuke Endoh.

Status changed from Open to Rejected

Hi,

In 1.9, proc is called for each node of object tree,
even if the node is recursive.

> $ ruby -e 'a=[1]; a << a; Marshal.load(Marshal.dump(a), proc {|e| p e})'
> 1          # First element
> [1]        # Second element (recursive node; array itself)
> [1, [...]] # Generated array

In 1.8, the recursive node is passed to proc just once.

> $ ruby -e 'a=[1]; a << a; Marshal.load(Marshal.dump(a), proc {|e| p e})'
> 1          # First element
>            # Second element is not called because it is recursive
> [1, [...]] # Generated array

According to matz, this is intended spec change.

There is no problem anywhere.  So I close this ticket.

--
Yusuke Endoh <mame@tsg.ne.jp>
----------------------------------------
http://redmine.ruby-lang.org/issues/show/1711
Posted by Joel VanderWerf (Guest)
on 2010-04-23 22:09
(Received via mailing list)
Yusuke Endoh wrote:
>> 1          # First element
> According to matz, this is intended spec change.
> 
> There is no problem anywhere.  So I close this ticket.

So, just to make sure I understand:

With this change in 1.9, the proc is called while the recursive object
is partially constructed, and then again of the same object when it is
fully constructed? That seems to be what I see here:

$ ruby19  -e 'a=[1]; a << a; Marshal.load(Marshal.dump(a), proc {|e| p
e; p e.object_id})'
1
3
[3]
14163576
[3, 14163576]
14163576


It seems like it might be nice if the proc were able to distinguish
these two calls (perhaps a second arg to the proc?). For example, if you
were keeping a count of the objects, you wouldn't want to double count.
Or if you were updating a GUI, you would only want to do that once.
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.