Marshal.load does not create new instances?

backorder · February 28, 2009, 12:57am

Marshal does not seem to instantiate the given class(es) on load. Moreover,
it will absolutely work as long as the class is defined, even if it
has no attributes. The following code snippet creates an array
filled with objects:

class Data
end

data = File.open("data.bin", "rb") { |f| Marshal.load(f) }

This absolutely works fine even if the Data class used to dump (e.g.
from another program) has many instance variables. They just won’t be
accessible directly (but still can be accessible using reflection).
Inspect would show something like <Data:0x2b24088 @name="Account"
@expanded=true …>. Furthermore, it doesn’t seem to use the defined
class if an IO is fed in Marshal.load, thus overriding Data._load just
won’t work. Is that the expected behaviour?
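For reference, the behaviour described above can be reproduced with a small sketch. It assumes a hypothetical class `Record` (the name `Data` is avoided) and an in-memory dump instead of `data.bin`. As far as I know, `Marshal.load` allocates the object and restores its instance variables without calling `initialize`, and `_load` is only consulted for objects that were dumped through `_dump`.

```ruby
# Hypothetical class standing in for the class of the dumping program.
class Record
  @init_calls = 0
  class << self
    attr_accessor :init_calls
  end

  def initialize(name)
    self.class.init_calls += 1   # count how often initialize runs
    @name = name
    @expanded = true
  end
end

bytes  = Marshal.dump(Record.new("Account"))  # initialize runs once here
loaded = Marshal.load(bytes)                  # a new object, but no initialize call

puts Record.init_calls                        # number of initialize calls so far
p loaded.instance_variables                   # ivars restored from the dump
p loaded.instance_variable_get(:@name)        # readable via reflection, no accessor needed
```

So a new instance is created, only not through `new`, which would explain why constructor logic and accessors in the loading class play no part.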

This is really annoying considering that I would like to load data onto
one (among many) versions of the Data class. The version of the class to be
used is determined and set at run-time according to the file loaded;
this is my ultimate goal. There is obviously no cast feature in Ruby.
Even using a proxy won’t cut it, if only for the fact that Marshal
doesn’t instantiate loaded data.

Any suggestions?

Ian

backorder · February 28, 2009, 4:32am

Ian T. wrote:

Marshal does not seem to instantiate given class(es) on load. Moreover,
it will absolutely work as long as the class is defined

Seems pretty standard across programming languages.

This is really annoying considering that I would like to load data onto
one (among many) versions of the Data class. The version of the class to be
used is determined and set at run-time according to the file loaded;
this is my ultimate goal.

Any suggestions?

How about something like this:

# a program that dumps an object:

class MyData
  def greet
    puts "hello"
  end
end

d = MyData.new
# binary mode, since Marshal output is binary data
File.open("def1.txt", "wb") do |f|
  Marshal.dump(d, f)
end

# program that loads the objects:

def1 = <<-def1
  class MyData
    def greet
      puts "hello"
    end
  end
def1

def2 = <<-def2
  class MyData
    def greet
      puts "goodbye"
    end
    def shout
      puts "HEY"
    end
  end
def2

def3 = <<-def3
  class MyData
    def greet
      puts "last class"
    end
    def cry
      "Wahhhh wahhh"
    end
  end
def3

data_classes = {
  "def1" => def1,
  "def2" => def2,
  "def3" => def3
}

print "Enter file name: "
fname = gets.chomp
defname = fname.split(".")[0]
# define the MyData version that matches the file name
eval(data_classes[defname])

begin
  # binary mode; the block form closes the file, so no ensure is needed
  File.open(fname, "rb") do |f|
    d = Marshal.load(f)
    d.greet
    d.shout
    d.cry
  end
rescue NoMethodError
  # this version of MyData lacks the method: do nothing
end

backorder · February 28, 2009, 5:28am

7stud – wrote:

How about something like this:

# program that loads the objects:

def1 = <<-def1
  class MyData
    def greet
      puts "hello"
    end
  end
def1

Your solution seems great (and it works). However, my problem with it is
the necessity to load and eval a class from a heredoc. It would be fine
for small classes, but it won’t cut it for lengthy and
numerous classes. I am afraid that it will make the development and testing
cycle somewhat harder, if only for the fact that I won’t have the support
of my favourite IDE, since the code is treated as text.

I was hoping to have an object-oriented solution, for example, where I
could have a proxy, forwarder/delegator, or even subclass delegation.
These actually did work as long as I don’t Marshal.load. Your neat trick
with heredoc and eval would be better used for smaller needs, I think.

Any more suggestions?

backorder · February 28, 2009, 4:21pm

2009/2/28 Pit C. [email protected]:

2009/2/28 Ian T. [email protected]:

(…)
Any more suggestions?

Ian, I’m not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

I believe he wants to evolve the class and be able to load data
written with an older (or just different) version of the class. And
now Ian hits the usual problems of schema migration.

Ian, you should be aware of one thing: class definitions are not
serialized - no programming language that I know does this. And there
are probably good reasons (security, efficiency probably).

You can use tricks as 7stud suggested although I feel wary about this.
I would probably choose a different solution based on the
requirements (which are not fully clear to me). If you just need
changing sets of attributes then these options might work:

- use OpenStruct
- use Hash
- change your class Data to store attributes in a single Hash only

There might be others, and if you provide more of your requirements we
might come up with other solutions.
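A minimal sketch of the third option, assuming a hypothetical class `Settings` (not from the thread) that keeps every attribute in one Hash. Instances written by different versions then differ only in which keys are present, and the reader supplies defaults for missing ones.

```ruby
# Hypothetical class: all attributes live in a single Hash.
class Settings
  def initialize(attrs = {})
    @attrs = attrs
  end

  def [](key)
    @attrs[key]
  end

  def []=(key, value)
    @attrs[key] = value
  end

  # Read a key, falling back to a default when older data lacks it.
  def fetch(key, default = nil)
    @attrs.fetch(key, default)
  end
end

# Two dumps standing in for files written by an older and a newer program.
older = Marshal.dump(Settings.new({ "name" => "Account" }))
newer = Marshal.dump(Settings.new({ "name" => "Account", "expanded" => true }))

a = Marshal.load(older)
b = Marshal.load(newer)

p a.fetch("expanded", false)   # key absent in the older data, default used
p b.fetch("expanded", false)   # key present in the newer data
```

This only addresses changing sets of attributes; differing method implementations still need another mechanism.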

Kind regards

robert

backorder · February 28, 2009, 2:33pm

2009/2/28 Ian T. [email protected]:

(…)
Any more suggestions?

Ian, I’m not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

Regards,
Pit

backorder · February 28, 2009, 9:52pm

Ian, I’m not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

This is actually how I use Marshal. It works fine if I have only one
version in one given Ruby program. The problem resides in loading
different files which may contain one version or another of the given
class definition. There are sometimes additional (or fewer) instance
variables and methods, different implementations of certain methods, etc.
depending on the version of the class. Mmm. collision problems,
Capitain!

My initial hope was to define the main class in such a way that it delegates
to other classes (named and implemented according to their version). In my
twisted mind, I had imagined that I could set the delegator to
a certain class before loading the data, just like any other proxy, and
then use it; or at least before using methods or accessors.

Ian, you should be aware of one thing: class definitions are not
serialized - no programming language that I know does this. And there
are probably good reasons (security, efficiency probably).

Understandably.

- use OpenStruct
- use Hash
- change your class Data to store attributes in a single Hash only

Once again a good idea! Unfortunately, it is not just about data but
also about class and instance methods and their specific implementation.
Would it mean that I could mixin the instance of OStruct with my
specific version of a class (as a module) at that point?

You can use tricks as 7stud suggested although I feel wary about this.
I would probably choose a different solution based on the
requirements (which are not fully clear to me). If you just need
changing sets of attributes then these options might work:

I have data files generated by different programs. These files are
generated according to a given class, but the implementation (accessors,
methods, etc.) is slightly different depending on the program. They
share the same name, basic functionality and data, though they have
differences according to their version. I would like to be able to load
and use them within my Ruby program, any or many of these generated
files at the same time, without collision. A further constraint is that I
do not have access to the original source of the programs, so I have to
reimplement and test each version all by myself.

We should perhaps look at the problem in its extreme form: let’s imagine
that we have multiple programs which each have a class Data, but each one is
completely different (no similar instance variables or methods, nothing
in common at all). There is no access to those programs, and yet all
the files have to be loaded within a single Ruby program. What would one do?

Thanks for your help, guys!

Regards,
Ian

backorder · February 28, 2009, 11:37pm

On Sat, Feb 28, 2009 at 8:50 PM, Ian T. [email protected]
wrote:
[snip]

We should perhaps look at the problem in its extreme form: let’s imagine
that we have multiple programs which each have a class Data, but each one is
completely different (no similar instance variables or methods, nothing
in common at all). There is no access to those programs, and yet all
the files have to be loaded within a single Ruby program. What would one do?

Well, you could dynamically extend the loaded instances with modules
that add the specific required behaviour.
Something like this:

First file represents whatever created the data in the first place:

# file1

class MyData
  attr_accessor :kind
  attr_accessor :name
  def initialize(kind, name)
    @kind = kind
    @name = name
  end
end

instance = MyData.new("Greeting", "World")
data = Marshal.dump(instance)
File.open("data.dat", "wb") do |file|
  file.write(data)
end

# end of file1

Second file shows how you could load this data and dynamically decide
how it should behave as an instance:

# file2

# these modules will be used to extend the loaded instance depending
# on its @kind

module Hello
  def run
    puts "Hello #{ @name }"
  end
end

module Goodbye
  def run
    puts "Goodbye #{ @name }"
  end
end

# You need to define this if you’re unmarshalling data that has been
# saved as MyData - no way round it as Marshal embeds the class name
# in the data

class MyData
end

# unmarshall data and extend depending on the @kind
# (read in binary mode, matching the "wb" used when writing)

data = File.open("data.dat", "rb") { |f| f.read }
instance = Marshal.load(data)

# this is shorthand for determining the nature of the data

if instance.instance_variable_defined?("@kind")
  kind = instance.instance_variable_get("@kind")
  if Object.const_defined?(kind)
    extension = Object.const_get(kind)
    instance.extend(extension)
    instance.run
  else
    puts "@kind not known: #{instance.inspect}"
  end
else
  puts "@kind not defined for: #{instance.inspect}"
end

# end of file2

I’m using @kind as shorthand to stand for something that distinguishes
between instances of your data. (BTW, you can’t use Data as a class
name in Ruby - it’s reserved for use with C extensions).

HTH,
Regards,
Sean

backorder · February 28, 2009, 11:44pm

On Feb 28, 2009, at 3:50 PM, Ian T. wrote:

We should perhaps look at the problem in its extreme form: let’s imagine
that we have multiple programs which each have a class Data, but each one is
completely different (no similar instance variables or methods, nothing
in common at all). There is no access to those programs, and yet all
the files have to be loaded within a single Ruby program. What would one do?

You are establishing ground rules that can’t be followed.

If you have two programs that want to exchange data then they’ve got
to have some pre-existing shared understanding of the structure
of the data. You can’t migrate the state of an object from one
arbitrary class to another arbitrary class without constraining the
form of that state in some way.

Ruby’s marshal has a built-in assumption that the class that loads
the object state is the same (for some reasonable definition of
“same”) as the class that dumps the object state.

It sounds to me like you need to abstract out the state into its
own class and use Marshal to serialize/deserialize that and then
devise import/export methods for the various ‘versions’ of your Data
class. Use an intermediate class to act as the adapter between
all the versions of your Data class.
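One way to read this suggestion, as a rough sketch only: the names `State`, `DataV1`, `DataV2`, `to_state`, `from_state` and `VERSIONS` are all made up for illustration. Only the neutral `State` object is ever marshalled, and each version of the class knows how to export to it and import from it.

```ruby
# Hypothetical neutral container: the only thing that is ever marshalled.
State = Struct.new(:version, :attrs)

class DataV1
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def to_state
    State.new(1, { "name" => @name })
  end

  def self.from_state(state)
    new(state.attrs.fetch("name"))
  end
end

class DataV2
  attr_reader :name, :expanded

  def initialize(name, expanded)
    @name = name
    @expanded = expanded
  end

  def to_state
    State.new(2, { "name" => @name, "expanded" => @expanded })
  end

  def self.from_state(state)
    # Tolerate state written by an older version that lacks "expanded".
    new(state.attrs.fetch("name"), state.attrs.fetch("expanded", false))
  end
end

# Map the stored version number to the class that should handle it.
VERSIONS = { 1 => DataV1, 2 => DataV2 }

bytes = Marshal.dump(DataV2.new("Account", true).to_state)
state = Marshal.load(bytes)
obj   = VERSIONS.fetch(state.version).from_state(state)
p obj.class
```

This presupposes that the writing side can be changed to dump the intermediate object, which may not hold when the producing programs are out of reach.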

Gary W.

backorder · March 1, 2009, 10:04pm

Ian T. wrote:

Ian, I’m not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

This is actually how I use Marshal. It works fine if I have only one
version in one given Ruby program. The problem resides in loading
different files which may contain one version or another of the given
class definition. There are sometimes additional (or fewer) instance
variables

Instance variables are not part of the class definition at all - even
when you’re only talking about a single version of the class. Instance
variables are dynamically set within each object instance. For example:

class Foo
  def bar
    @xyz = 123
  end
end

f = Foo.new # no instance variables set at all

g = Foo.new
g.instance_variable_set(:@baz, 999) # only @baz is set

Given this: it makes sense that serializing or deserializing an instance
of Foo only takes into account what instance variables are set in that
particular object, making no reference to the class definition.
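A short round trip illustrating the point, reusing the `Foo` class from above; the in-memory dump and the variable names `f2` and `g2` are only for illustration.

```ruby
class Foo
  def bar
    @xyz = 123
  end
end

f = Foo.new          # no instance variables set
g = Foo.new
g.bar                # sets @xyz on g only

# Each dump carries exactly the ivars that were set on that object.
f2 = Marshal.load(Marshal.dump(f))
g2 = Marshal.load(Marshal.dump(g))

p f2.instance_variables
p g2.instance_variables
```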

backorder · March 2, 2009, 3:32pm

2009/2/28 Ian T. [email protected]:

Capitain!

differences according to their version. I would like to be able to load

Thanks for your help, guys!

As Brian has hinted, you can use instance_variable_get etc. to access
variable values. You can even make it a bit more convenient (see
attached file for examples).
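The attachment is not reproduced in this thread, so the following is not Robert's code, only one possible sketch of the idea. The helper `expose_ivars` is hypothetical: it adds an `attr_accessor` to the object's class for each instance variable the loaded object carries.

```ruby
class MyData
end

# Hypothetical helper: add an attr_accessor to the object's class for every
# instance variable of the loaded object that has no reader yet.
def expose_ivars(obj)
  obj.instance_variables.each do |ivar|
    name = ivar.to_s.sub(/\A@/, "")
    obj.class.send(:attr_accessor, name) unless obj.respond_to?(name)
  end
  obj
end

# Simulate data written by another program whose MyData set @kind and @name.
source = MyData.new
source.instance_variable_set(:@kind, "Hello")
source.instance_variable_set(:@name, "World")

loaded = expose_ivars(Marshal.load(Marshal.dump(source)))
puts loaded.name
loaded.name = "Ruby"   # writers are available too
```

Because the accessors are defined on the class rather than on a single object, they apply to every instance of `MyData`, and the object can still be dumped afterwards.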

Kind regards

robert

backorder · March 1, 2009, 2:00am

Oops. That should be:

instance = MyData.new("Hello", "World")

in the first file.

backorder · March 2, 2009, 6:34pm

On Sat, Feb 28, 2009 at 8:56 AM, Ian T. [email protected]
wrote:

This absolutely works fine even if the Data class used to dump (e.g.
this is my ultimate goal. There is obviously no cast feature in Ruby.
Even using a proxy won’t cut it, if only for the fact that Marshal
doesn’t instantiate loaded data.

http://eigenclass.org/R2/writings/extprot-vs-ruby-marshal

^ manveru

backorder · March 3, 2009, 4:40am

Robert K. wrote:

I believe he wants to evolve the class and be able to load data
written with an older (or just different) version of the class. And
now Ian hits the usual problems of schema migration.

Ian, you should be aware of one thing: class definitions are not
serialized - no programming language that I know does this.

… except languages in which code and data are equivalent!

Sorry, I had to bite. This is a great example of the power of code-data
equivalence. If you store the definitions, things will just work. I
see no immediate reason not to do it, other than the language not
letting you (short of awkward contrivances like heredoc-ing all your
code).

If you expect the definition to change, you can write adapters which
examine the definition (since it’s data!) to detect new or incompatible
changes then adjust accordingly.

And there are probably good reasons (security, efficiency probably).

There are a variety of reasons for both doing it and not doing it. One
reason for not doing it is that the language you chose does not allow
you to do it. That may or may not be a good reason.

In case there was any confusion from a previous thread, I do use ruby,
as is obvious from my previous posts. I would only suggest that working
around the limitations of a language is not necessarily the best
approach, even though it is typically the default course of action. In
some cases it might be better to use a language without those
limitations.

backorder · March 3, 2009, 8:30am

On 03.03.2009 04:38, Mike G. wrote:

Sorry, I had to bite.
Ouch!

This is a great example of the power of code-data
equivalence. If you store the definitions, things will just work.

Well, certain things will just work. But you’ll trade this for
different issues. For example, all of a sudden you can have different
implementations of the same class coexist. I wouldn’t say that one or
the other solution is necessarily easier. They both do not change the
complexity of the underlying problem (evolution of code with data
artifacts belonging to different versions). Both approaches (i.e.
storing code and not storing code) make certain things easy and other
things hard.

If you expect the definition to change, you can write adapters which
examine the definition (since it’s data!) to detect new or incompatible
changes then adjust accordingly.

I’d rather say you must write adapters - otherwise chances are that
something will break uncontrollably.

In case there was any confusion from a previous thread, I do use ruby,
as is obvious from my previous posts. I would only suggest that working
around the limitations of a language is not necessarily the best
approach, even though it is typically the default course of action. In
some cases it might be better to use a language without those
limitations.

As I understand the particular situation a set of programs written in
Ruby was given and their output (marshaled data) needs to be worked
with. In this case, choosing a different language does not look like a
feasible option. But I generally agree that you should pick the right
tool for the job.

Kind regards

robert

backorder · March 3, 2009, 2:25pm

On 03.03.2009 14:00, Rick DeNatale wrote:

now Ian hits the usual problems of schema migration.
This is a great example of the power of code-data equivalence. If you
Ruby has some subtleties in this area when compared to other OO languages.
Don’t they all have?

away with two different versions of a Class which have the same instance
layout template but vary in method implementations, or even have slightly
different method repertoires.

Ruby falls into the class of languages where classes DON’T act as templates,
instead instance variables are dynamically bound to each instance with a
run-time lookup used to map instance variable names to location.

So the OP’s case shows that you can marshal Ruby objects and the ‘schema’ is
carried with each object. It’s just that accessor methods don’t go along.

Yes, this is true, and it allows one to cope with at least some migrations,
which might be enough for many practical purposes. But strictly
speaking this situation is not really better than that of other
languages: while this property of Ruby allows for successful
deserialization, you can break a class’s invariant (as manifested in the
implementation of methods) with this, rendering deserialized instances
completely unusable.

this might or might not work with many Ruby programs because the Ruby
transaction commit persists changes to class definitions. A process needs to
explicitly indicate that it wants to put class definition changes into the
state to be committed before committing.

We’ll see how this evolves.

Thank you for the abstract, Rick. This sounds interesting. Your
explanation is a nice demonstration of the complexity of the problem I
was talking about.

Kind regards

robert

backorder · March 3, 2009, 11:22pm

On Feb 28, 2009, at 12:50, Ian T. wrote:

etc.
then use it; or at least before using methods or accessors.
Easy, dump and load an Array:

class MyObject

  # …

  def marshal_dump
    [@ivar1, @ivar2, …]
  end

  def marshal_load(data)
    @ivar1 = data.shift
    @ivar2 = data.shift
    # …
  end
end

All versions of MyObject should store compatible ivars in compatible
positions in the Array. For a fancier implementation of this idea,
see Gem::Specification in the rubygems source.
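A runnable variant of the idea, with made-up instance variables (`@name`, `@kind`, `@note`). To keep it in one script, the "old" version of `MyObject` dumps first and the class is then reopened as the "new" version; in practice these would be separate programs.

```ruby
# "Version 1": dumps two fields in fixed positions.
class MyObject
  def initialize(name, kind)
    @name = name
    @kind = kind
  end

  def marshal_dump
    [@name, @kind]
  end

  def marshal_load(data)
    @name, @kind = data
  end
end

old_bytes = Marshal.dump(MyObject.new("World", "Hello"))

# "Version 2": reopened here to simulate a newer program. It keeps the
# first two positions and appends a third field at the end.
class MyObject
  attr_reader :name, :kind, :note

  def marshal_dump
    [@name, @kind, @note]
  end

  def marshal_load(data)
    @name, @kind, @note = data   # a missing trailing element becomes nil
    @note ||= "none"             # default for data written by version 1
  end
end

obj = Marshal.load(old_bytes)
p [obj.name, obj.kind, obj.note]
```

New fields are only ever appended, so older data stays readable and the loader fills in defaults for positions that are absent.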

PS: Data is a built-in class:

$ ruby -e 'p Data'
Data
$

backorder · March 3, 2009, 2:02pm

On Tue, Mar 3, 2009 at 2:29 AM, Robert K. [email protected] wrote:

Well, certain things will just work. But you’ll trade this for different
issues. For example, all of a sudden you can have different implementations
of the same class coexist. I wouldn’t say that one or the other solution is
necessarily easier. They both do not change the complexity of the
underlying problem (evolution of code with data artifacts belonging to
different versions). Both approaches (i.e. storing code and not storing
code) make certain things easy and other things hard.

Ruby has some subtleties in this area when compared to other OO languages.

Mike G. introduced the idea that this was a problem in dealing with schema
migration. To me this implies dealing with layout changes to the object.
This is a problem in most languages like Java, C++ and Smalltalk where
classes, along with whatever other language specific roles they play, act as
a template for understanding which instance variable goes where in a reified
instance.

This means that if you marshal an object then match it up to a class with
the same template, you run into the danger of misinterpreting the state of
the object. In systems written in these languages, you might be able to get
away with two different versions of a Class which have the same instance
layout template but vary in method implementations, or even have slightly
different method repertoires.

Ruby falls into the class of languages where classes DON’T act as templates,
instead instance variables are dynamically bound to each instance with a
run-time lookup used to map instance variable names to location.

So the OP’s case shows that you can marshal Ruby objects and the ‘schema’ is
carried with each object. It’s just that accessor methods don’t go along.

As it turns out, the MagLev project is trying to figure out how to deal with
a similar problem right now. In Gemstone Smalltalk, which is the code base
on which MagLev is being built, classes and instances are all held in a
shared persistent store. When a process changes a class, and commits a
transaction, other processes see the change when the results of the
transaction become visible to them (i.e. when they start up, or commit or
abort a transaction of their own).

Now, this was apparently the same model they were planning to follow for
MagLev. However, we had some discussions in the beta-testers forum about how
this might or might not work with many Ruby programs because the Ruby
execution model builds up classes at run time from a known initial state,
and classes change as the code executes, either through ‘normal’ class and
method definition (both of which are execution time events in Ruby) or
through various levels of metaprogramming sophistication.

Because Ruby classes get built incrementally at run-time, the order of
execution can be important, so starting with a persisted initial set of
class definitions can be problematic at times.

So currently MagLev allows independent control over whether or not a
transaction commit persists changes to class definitions. A process needs to
explicitly indicate that it wants to put class definition changes into the
state to be committed before committing.

We’ll see how this evolves.

–
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

backorder · March 6, 2009, 10:47am

Robert K. wrote:

2009/2/28 Ian T. [email protected]:

Capitain!

differences according to their version. I would like to be able to load

Thanks for your help, guys!

As Brian has hinted, you can use instance_variable_get etc. to access
variable values. You can even make it a bit more convenient (see
attached file for examples).

Kind regards

robert

Hello, Robert!

I was actually trying something along this line. My extended modules
would however redefine attribute writers (see example below) for every
and each instance variable, in order to be able to change value in the
loaded class instances. This would allow me to change value and dump
back the loaded data. Your example clearly and conveniently does not
require to redefine manually every ivar but it won’t let me dump
anymore, reporting the error “singleton can’t be dumped (TypeError)”.

def myvar=(data)
  self.instance_variable_set(:@myvar, data)
end
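As far as I can tell, the TypeError appears when methods are defined directly on a single object (its singleton class), whereas an object that was merely extended with a named module can still be dumped, with the module recorded by name. A sketch under that assumption, using a hypothetical module `MyvarWriter` and a stand-in `MyData`:

```ruby
class MyData
  def initialize
    @myvar = 1   # stands in for state written by the original program
  end
end

# The writer lives in a named module, not on the object's singleton class.
module MyvarWriter
  def myvar=(data)
    instance_variable_set(:@myvar, data)
  end
end

loaded = Marshal.load(Marshal.dump(MyData.new))
loaded.extend(MyvarWriter)
loaded.myvar = 42
bytes = Marshal.dump(loaded)   # dumps; the extended module is stored by name

# For contrast: a method defined directly on one object blocks dumping.
other = Marshal.load(Marshal.dump(MyData.new))
def other.myvar=(data)
  @myvar = data
end

error = nil
begin
  Marshal.dump(other)
rescue TypeError => e
  error = e
  puts e.message
end
```

Note that whoever loads `bytes` later needs `MyvarWriter` defined as well, since the dump refers to the module by name.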
