Customized serialization unreliable?

Hi,

I have a strange problem with customized serialization…
I have a large object built from several classes, and each class
has its own marshal_dump and marshal_load, like this:

class YYY
  attr_reader :data, :version

  def initialize
    @data = "different objects"
    @version = 1
  end

  def marshal_dump()
    return [@version,@data]
  end

  def marshal_load(var)
    @version = var[0]
    case @version
    when 1
      @data = var[1]
    else
      #do something else
    end
  end
end
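As a baseline, here is a minimal sketch of the YYY class above surviving a round trip through Marshal. It restates the class so it runs on its own; the `raise` in the `else` branch is an assumption standing in for the unspecified "do something else". It also checks that two references to the same YYY instance come back as one shared object.

```ruby
class YYY
  attr_reader :data, :version

  def initialize
    @data = "different objects"
    @version = 1
  end

  # Marshal.dump calls this and serializes the returned array.
  def marshal_dump
    [@version, @data]
  end

  # Marshal.load allocates a blank YYY (initialize is NOT called)
  # and passes it the array that marshal_dump returned.
  def marshal_load(var)
    @version = var[0]
    case @version
    when 1
      @data = var[1]
    else
      # Assumption: fail loudly instead of the original "do something else".
      raise ArgumentError, "unknown YYY version #{@version.inspect}"
    end
  end
end

original = YYY.new
copy = Marshal.load(Marshal.dump(original))
puts copy.version # prints 1
puts copy.data    # prints different objects

# Two references to one object come back as one shared object.
pair = Marshal.load(Marshal.dump([original, original]))
puts pair[0].equal?(pair[1]) # prints true
```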

The problem I have found is that the serialization works randomly: sometimes
the object is serialized correctly and sometimes not (I am not able to
deserialize it, and some of the object's attributes become nil). I also
found that the size of the serialized file is different each time. When I
put the serialized data in memory instead, its size is also different each
time, like this:

              dumpStr = Marshal.dump(mainResults)
              puts dumpStr.length.to_s()
              mainResults2 = Marshal.load(dumpStr)

Does anyone know much about customized serialization? There is
not a lot of documentation about it…

Thank you very much

Sayoyo

sayoyo Sayoyo wrote:

Hi,

I have a strange problem with customized serialization…
I have a large object built from several classes, and each class
has its own marshal_dump and marshal_load, like this:

Another thing to try: call marshal_dump on your top-level object. You
should get a nested structure of arrays and so on. Does this look ok?

Now, feed that data structure back into your top-level marshal_load
method. Does this cause the problem too?
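That two-step check could look like the sketch below. `Report` is a hypothetical stand-in for the top-level object. Calling `Report.allocate` and then `marshal_load` approximates what `Marshal.load` does (it creates the instance without running `initialize`), so if this manual round trip already loses attributes, that points at the custom methods rather than at Marshal itself.

```ruby
# Hypothetical stand-in for the real top-level object.
class Report
  attr_reader :version, :rows

  def initialize(rows)
    @version = 1
    @rows = rows
  end

  def marshal_dump
    [@version, @rows]
  end

  def marshal_load(state)
    @version, @rows = state
  end
end

report = Report.new([["a", 1], ["b", 2]])

# Step 1: look at the nested structure marshal_dump produces.
state = report.marshal_dump
p state # prints [1, [["a", 1], ["b", 2]]]

# Step 2: feed it straight back into marshal_load on a blank instance,
# bypassing Marshal entirely.
fresh = Report.allocate
fresh.marshal_load(state)
p fresh.rows == report.rows # prints true
```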

Hi,

First, thank you very much for helping me. Yes, I have tried it, and
the serialization still works randomly. No idea why…

I suspect there is a problem with memory allocation somewhere when
the data is written to the “file”. Since the effect is random, I can
hardly put my finger on it.

Do you know who is responsible for this part of Ruby?

Sayoyo

On 22.05.2008 18:48, sayoyo Sayoyo wrote:

First, thank you very much for helping me. Yes, I have tried it, and
the serialization still works randomly. No idea why…

I suspect there is a problem with memory allocation somewhere when
the data is written to the “file”. Since the effect is random, I can
hardly put my finger on it.

My gut feeling points rather to an effect caused by the varying order
of objects in a Hash, or to an issue caused by a loop in your object
graph. As far as I can see, custom serialization works OK, at least
for non-complex structures:

irb(main):016:0> F = Struct.new :a, :b do
irb(main):017:1* def marshal_dump
irb(main):018:2> [a,b]
irb(main):019:2> end
irb(main):020:1>
irb(main):021:1* def marshal_load(x)
irb(main):022:2> self.a = x[0]
irb(main):023:2> self.b = x[1]
irb(main):024:2> end
irb(main):025:1> end
=> F
irb(main):026:0> x = F.new 1,2
=> #<struct F a=1, b=2>
irb(main):027:0> s = Marshal.load(Marshal.dump(x))
=> #<struct F a=1, b=2>
irb(main):028:0>

I am not sure why you need custom serialization, but here is an
alternative approach: create a method that returns a data structure,
which you then serialize, and add a class method that constructs your
in-memory structure from that state. E.g.

class Foo
  attr_accessor :name, :size

  def to_serial
    [name, size]
  end

  def self.from_serial(obj)
    f = new
    f.name = obj[0]
    f.size = obj[1]
    f
  end
end

Of course, this only works if you know what you are deserializing.
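A usage sketch for the Foo class above (restated so it runs standalone): only the plain array returned by `to_serial` goes through Marshal, and the reading side has to know to call `Foo.from_serial` on the result. The sample values are made up for the example.

```ruby
class Foo
  attr_accessor :name, :size

  # Reduce the object to plain data that Marshal handles natively.
  def to_serial
    [name, size]
  end

  # Rebuild a Foo from that plain data.
  def self.from_serial(obj)
    f = new
    f.name = obj[0]
    f.size = obj[1]
    f
  end
end

foo = Foo.new
foo.name = "report"
foo.size = 42

bytes = Marshal.dump(foo.to_serial)             # only an Array is serialized
restored = Foo.from_serial(Marshal.load(bytes)) # caller knows it is a Foo
puts restored.name # prints report
puts restored.size # prints 42
```

The stream contains an Array, not a Foo, which is exactly why this approach depends on knowing what you are deserializing.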

Another alternative is to separate “configuration state” (which is
serializable, e.g. file name) from “operation state” (which is not
serializable, e.g. file descriptor) and serialize only the configuration
state. This is probably the cleanest approach.
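A minimal sketch of that split, using a hypothetical `LogReader` class: the file name is configuration state and is the only thing `marshal_dump` returns, while the `File` handle is operation state that is reopened lazily after loading (dumping the handle itself would raise `TypeError`). `Tempfile` is used only to have a real file for the demo.

```ruby
require "tempfile"

# Hypothetical class: @path is configuration state, @io is operation state.
class LogReader
  attr_reader :path

  def initialize(path)
    @path = path
    @io = nil
  end

  # Operation state: opened on first use, never serialized.
  def io
    @io ||= File.open(@path)
  end

  def first_line
    io.gets
  end

  # Serialize only the configuration state.
  def marshal_dump
    [@path]
  end

  # Restore configuration state; the handle is reopened on demand.
  def marshal_load(state)
    @path = state[0]
    @io = nil
  end
end

tmp = Tempfile.new("log")
tmp.write("hello\n")
tmp.flush

reader = LogReader.new(tmp.path)
reader.first_line                         # opens the file handle
copy = Marshal.load(Marshal.dump(reader)) # works even though @io is open
puts copy.first_line                      # prints hello (copy reopens the file)
```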

Do you know who is responsible for this part of Ruby?

You can probably find out by looking at the sources.

Kind regards

robert

  def marshal_dump()
    return [@version,@data]
  end

If there are actually classes built on instance methods marshal_dump and
marshal_load, then Marshal would be completely ignoring those methods.
Assuming you meant “_dump” which is the instance method to overload for
customized serialization, and either “self._load” or klassname._load
which is the class method for the same, then this would be happening:

Array.to_s will be called in Marshal.dump because it expects a string.
For example, with the YYY class:
@version = 1
@data = ["e", "i"]
_dump returns [1, ["e", "i"]]
The Marshaling process will form "1ei"

Your Loading process with "1ei" will give
@version = 1
@data = "e" # instead of ["e", "i"]

If for whatever reason you are Marshaling the results of your
marshal_dump methods – Marshal.dump(my_yyy.marshal_dump) --, then you
have neglected to properly handle object graph cycles, and shared
references.

Further, from the ruby-doc Marshal class:
“Some objects cannot be dumped: if the objects to be dumped include
bindings, procedure or method objects, instances of class IO, or
singleton objects, a TypeError will be raised.”
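A quick way to see that documented behaviour, as a sketch: each object below falls into one of the listed categories, including an IO nested inside a Struct (the `Holder` name is made up for the example). Only the exception class is relied on; the exact messages differ between Ruby versions.

```ruby
# Hypothetical named Struct, so the failure comes from the nested IO
# rather than from dumping an anonymous class.
Holder = Struct.new(:out)

singleton = Object.new
def singleton.greet; "hi"; end # gives the object a singleton method

candidates = [
  ["proc", proc { 1 }],
  ["method", 1.method(:+)],
  ["IO", $stdout],
  ["struct holding an IO", Holder.new($stdout)],
  ["object with singleton method", singleton],
]

candidates.each do |label, obj|
  begin
    Marshal.dump(obj)
    puts "#{label}: dumped"
  rescue TypeError => e
    puts "#{label}: TypeError (#{e.message})"
  end
end
```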

On 23.05.2008 02:29, Stefan R. wrote:

  def initialize
    @data = "different objects"
    @version = 1
  end

  def marshal_dump()
    return [@version,@data]
  end

Your problems most likely are caused by you returning an array instead
of a String. I’m surprised Marshal doesn’t complain.

I don’t think a String must be returned? Apparently the standard lib
also does not think so:

irb(main):009:0> require 'ostruct'
=> true
irb(main):010:0> o=OpenStruct.new
=> #<OpenStruct>
irb(main):011:0> o.foo=123
=> 123
irb(main):012:0> o.marshal_dump
=> {:foo=>123}
irb(main):013:0> o.marshal_dump.class
=> Hash
irb(main):014:0>

This works here quite well without any random outtakes. At least I
haven’t yet hit one.

I’ll show you one, but it’s not random. :-) Your approach with the
String works well only for simple cases. The downside is that it
does not handle loops in object graphs properly:

robert@fussel /cygdrive/c/Temp
$ cat marsh.rb
F = Struct.new :x, :y
a = F.new
b = F.new
c = F.new a,b
a.x = c
b.x = c

t1 = Marshal.load(Marshal.dump(c))
p t1.equal?(t1.x.x)

class F
  def marshal_dump
    [x,y]
  end

  def marshal_load(dat)
    self.x, self.y = dat
  end
end

t2 = Marshal.load(Marshal.dump(c))
p t2.equal?(t2.x.x)

class F
  def marshal_dump
    Marshal.dump([x,y])
  end

  def marshal_load(dat)
    self.x, self.y = Marshal.load(dat)
  end
end

t3 = Marshal.load(Marshal.dump(c))
p t3.equal?(t3.x.x)

robert@fussel /cygdrive/c/Temp
$ ruby marsh.rb
true
true
marsh.rb:26:in `marshal_dump': stack level too deep (SystemStackError)
        from marsh.rb:26:in `dump'
        from marsh.rb:26:in `marshal_dump'
        from marsh.rb:26:in `dump'
        from marsh.rb:26:in `marshal_dump'
        from marsh.rb:26:in `dump'
        from marsh.rb:26:in `marshal_dump'
        from marsh.rb:26:in `dump'
        from marsh.rb:26:in `marshal_dump'
         ... 15600 levels...
        from marsh.rb:26:in `dump'
        from marsh.rb:26:in `marshal_dump'
        from marsh.rb:34:in `dump'
        from marsh.rb:34

robert@fussel /cygdrive/c/Temp
$

Kind regards

robert

On 23.05.2008 10:33, Andrew M. wrote:

  def marshal_dump()
    return [@version,@data]
  end

If there are actually classes built on instance methods marshal_dump and
marshal_load, then Marshal would be completely ignoring those methods.

I don’t think so, at least not in 1.8.6:

robert@fussel /cygdrive/c/Temp
$ irb
irb(main):001:0> class F
irb(main):002:1> def marshal_dump
irb(main):003:2> puts "dump"
irb(main):004:2> "123"
irb(main):005:2> end
irb(main):006:1> def marshal_load(x)
irb(main):007:2> puts "load #{x}"
irb(main):008:2> end
irb(main):009:1> end
=> nil
irb(main):010:0> Marshal.load(Marshal.dump(F.new))
dump
load 123
=> #<F:0x7ff7be4c>
irb(main):011:0>

Assuming you meant “_dump” which is the instance method to overload for
customized serialization, and either “self._load” or klassname._load
which is the class method for the same, then this would be happening:

I believe this is yet another mechanism to control custom marshalling.

Array.to_s will be called in Marshal.dump because it expects a string.

I don’t think so. The mechanism is different. It would be too fragile
to depend on #to_s to return something that can be used to deserialize.
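For comparison, a sketch of the `_dump`/`_load` mechanism with a hypothetical `Point` class. `Marshal.dump` calls the instance method `_dump` with a depth-limit argument and stores the String it returns; `Marshal.load` passes that String to the class method `_load`. A non-String result is not converted with `to_s`; `Marshal.dump` raises `TypeError` instead, which the second half of the sketch checks using an equally hypothetical `BadPoint`.

```ruby
# Hypothetical class using the String-based _dump/_load protocol.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_depth_limit)
    "#{x},#{y}"
  end

  # Called by Marshal.load with the String produced by _dump.
  def self._load(str)
    x, y = str.split(",").map { |s| Integer(s) }
    new(x, y)
  end
end

pt = Marshal.load(Marshal.dump(Point.new(3, 4)))
puts "#{pt.x} #{pt.y}" # prints 3 4

# Hypothetical class whose _dump returns an Array instead of a String.
class BadPoint
  def _dump(_depth_limit)
    [1, 2]
  end

  def self._load(str)
    new
  end
end

begin
  Marshal.dump(BadPoint.new)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```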

If for whatever reason you are Marshaling the results of your
marshal_dump methods – Marshal.dump(my_yyy.marshal_dump) --, then you
have neglected to properly handle object graph cycles, and shared
references.

Correct. See my other posting for a nice example. :-)

All in all I believe, if one wants to exclude some fields from
serialization (like with “transient” in Java) the best way is to
implement #marshal_dump to just return an array of the fields that need
to be serialized and deserialized and implement #marshal_load(ar)
accordingly. That way Marshal can properly handle loops in object
graphs etc.
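A sketch of that recommendation, using a hypothetical tree `Node` whose parent links create cycles and whose `@cache` plays the role of a Java `transient` field. `marshal_dump` returns an array of plain references, so Marshal itself tracks the shared objects and the cycles survive the round trip, while the cache is simply left out.

```ruby
# Hypothetical tree node: parent/children links form cycles,
# @cache is derived data we do not want to serialize.
class Node
  attr_reader :name, :parent, :children

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []
    @cache = nil
    parent.children << self if parent
  end

  # Derived, recomputable value (the "transient" field).
  def path
    @cache ||= parent ? "#{parent.path}/#{name}" : name
  end

  def cached?
    !@cache.nil?
  end

  # Return only the persistent fields, as plain references,
  # so Marshal can resolve cycles and shared objects itself.
  def marshal_dump
    [@name, @parent, @children]
  end

  def marshal_load(fields)
    @name, @parent, @children = fields
    @cache = nil
  end
end

root = Node.new("root")
leaf = Node.new("leaf", root)
leaf.path # fills the cache

copy = Marshal.load(Marshal.dump(root))
copy_leaf = copy.children.first
puts copy_leaf.parent.equal?(copy) # prints true (cycle preserved)
puts copy_leaf.cached?             # prints false (cache not serialized)
puts copy_leaf.path                # prints root/leaf
```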

Kind regards

robert

sayoyo Sayoyo wrote:

Hi,

I have a strange problem with customized serialization…
I have a large object built from several classes, and each class
has its own marshal_dump and marshal_load, like this:

class YYY
  attr_reader :data, :version

  def initialize
    @data = "different objects"
    @version = 1
  end

  def marshal_dump()
    return [@version,@data]
  end

Your problems most likely are caused by you returning an array instead
of a String. I’m surprised Marshal doesn’t complain.

class Foo
  def marshal_dump
    Marshal.dump([@version, @data])
  end
  def marshal_load(data)
    @version, @data = *Marshal.load(data)
  end

end

This works here quite well without any random outtakes. At least I
haven’t yet hit one.

Regards
Stefan