Forum: Ruby Marshal.load does not create new instances?

Ian T. (Guest)
on 2009-02-28 01:57
Marshal does not seem to instantiate the given class(es) on load. Moreover,
it works as long as the class is defined, even if the class has no
attributes. The following code snippet loads an array filled with
objects:

class Data
end

data = File.open("data.bin", "rb") { |f| Marshal.load(f) }

This works fine even if the Data class used to dump (e.g.
from another program) has many instance variables. They just won't be
accessible directly (but are still accessible using reflection).
Inspect would show something like <Data:0x2b24088 @name="Account"
@expanded=true ..>. Furthermore, it doesn't seem to use the defined
class if an IO is fed to Marshal.load, so overriding Data._load just
won't work. Is that the expected behaviour?
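
A minimal sketch of the behaviour described above. The class name `Account` and the in-process `remove_const` step (which merely stands in for "a different program loads the file") are assumptions made for illustration. It suggests that Marshal.load looks the class up by name, allocates the object without calling `initialize`, and restores whatever instance variables are in the stream:

```ruby
# "Writer" side: the full class, as another program might define it.
class Account
  attr_reader :name, :expanded

  def initialize(name)
    @name = name
    @expanded = true
  end
end

bytes = Marshal.dump(Account.new("Account"))

# "Reader" side: drop the full class and define a stub with the same name,
# as if a different program were loading the data.
Object.send(:remove_const, :Account)

class Account
  def initialize
    raise "never reached: Marshal.load does not call initialize"
  end
end

loaded = Marshal.load(bytes)

p loaded.class                          # the stub Account class
p loaded.instance_variables             # the ivars stored in the stream
p loaded.respond_to?(:name)             # false: the stub has no accessor
p loaded.instance_variable_get(:@name)  # "Account", via reflection
```

As for `_load`: it is only consulted for objects that were written through a matching `_dump`, so defining it on the loading side has no effect on a plain object dump.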

This is really annoying considering that I would like to load data into
one (among many) versions of the Data class. The version of the class to be
used is determined and set at run-time according to the file loaded;
this is my ultimate goal. There is obviously no cast feature in Ruby.
Even using a proxy won't cut it, if only because Marshal
doesn't instantiate loaded data.

Any suggestions?

Ian
7stud -. (Guest)
on 2009-02-28 05:32
Ian T. wrote:
> Marshal does not seem to instantiate given class(es) on load. Moreover,
> it will absolutely work as long as the class is define
>

Seems pretty standard across programming languages.

>
> This is really annoying considering that I would like to load data onto
> one (among many) version of Data class. The version of the class to be
> used is determined and set at run-time according to the file loaded;
> this is my ultimate goal.
>
> Any suggestions?
>

How about something like this:

-----------
#a program that dumps an object:

class MyData
  def greet
    puts "hello"
  end
end

d = MyData.new
File.open("def1.txt", "wb") do |f|
  Marshal.dump(d, f)
end
------------

#program that loads the objects:

def1 = <<-def1
  class MyData
    def greet
      puts "hello"
    end
  end
def1

def2  = <<-def2
  class MyData
    def greet
      puts "goodbye"
    end
    def shout
      puts "HEY"
    end
  end
def2

def3 = <<-def3
  class MyData
    def greet
      puts "last class"
    end
    def cry
      "Wahhhh wahhh"
    end
  end
def3

data_classes = {
  "def1" => def1,
  "def2" => def2,
  "def3" => def3
}

print "Enter file name: "
fname = gets.chomp
defname = fname.split(".")[0]
eval(data_classes[defname])

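begin
  # Marshal data is binary; the block form closes the file automatically.
  File.open(fname, "rb") do |f|
    d = Marshal.load(f)
    d.greet
    d.shout
    d.cry
  end
rescue NoMethodError
  # The loaded definition lacks this method; skip the remaining calls.
end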
Ian T. (Guest)
on 2009-02-28 06:28
7stud -- wrote:

> How about something like this:

> #program that loads the objects:
>
> def1 = <<-def1
>   class MyData
>     def greet
>       puts "hello"
>     end
>   end
> def1

Your solution seems great (and it works). However, my problem with it is
the need to load and eval a class from a heredoc. That would be fine
for small classes, but it won't cut it for long and
numerous classes. I am afraid that it will make the development and testing
cycle harder, if only because I lose the support
of my favourite IDE when the code is treated as text.

I was hoping for an object-oriented solution, for example one where I
could use a proxy, a forwarder/delegator, or even subclass delegation.
These actually worked as long as I didn't use Marshal.load. Your neat trick
with heredoc and eval would be better suited to smaller needs, I think.

Any more suggestions?
Pit C. (Guest)
on 2009-02-28 15:33
(Received via mailing list)
2009/2/28 Ian T. <removed_email_address@domain.invalid>:
> (...)
> Any more suggestion?

Ian, I'm not sure I understand what you want. AFAIK Marshal only works
if you have the same class definitions on both sides. Why is this a
problem for you?

Regards,
Pit
Robert K. (Guest)
on 2009-02-28 17:21
(Received via mailing list)
2009/2/28 Pit C. <removed_email_address@domain.invalid>:
> 2009/2/28 Ian T. <removed_email_address@domain.invalid>:
>> (...)
>> Any more suggestion?
>
> Ian, I'm not sure I understand what you want. AFAIK Marshal only works
> if you have the same class definitions on both sides. Why is this a
> problem for you?

I believe he wants to evolve the class and be able to load data
written with an older (or just different) version of the class.  And
now Ian hits the usual problems of schema migration.

Ian, you should be aware of one thing: class definitions are not
serialized - no programming language that I know does this.  And there
are probably good reasons (security, efficiency probably).

You can use tricks as 7stud suggested although I feel wary about this.
 I would probably choose a different solution based on the
requirements (which are not fully clear to me).  If you just need
changing sets of attributes then these options might work:

1. use OpenStruct
2. use Hash
3. change your class Data to store attributes in a single Hash only

There might be others, and if you provide more of your requirements we
might come up with other solutions.
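
A rough sketch of option 3, assuming a hypothetical `Record` class (the name and its small API are made up for illustration). All state lives in one Hash, so writers of different versions can add or drop attributes without the loading class having to change:

```ruby
# Hypothetical record class: every attribute is kept in a single Hash.
class Record
  def initialize(attrs = {})
    @attrs = attrs
  end

  def [](key)
    @attrs[key]
  end

  def []=(key, value)
    @attrs[key] = value
  end

  def key?(key)
    @attrs.key?(key)
  end
end

# Two "versions" of the data: the newer one carries an extra attribute.
old_bytes = Marshal.dump(Record.new({ name: "Account" }))
new_bytes = Marshal.dump(Record.new({ name: "Account", expanded: true }))

old_rec = Marshal.load(old_bytes)
new_rec = Marshal.load(new_bytes)

p old_rec[:name]           # "Account"
p old_rec.key?(:expanded)  # false: this writer never stored it
p new_rec[:expanded]       # true
```

This only covers varying sets of attributes; it does nothing for differing method implementations.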

Kind regards

robert
Ian T. (Guest)
on 2009-02-28 22:52
> Ian, I'm not sure I understand what you want. AFAIK Marshal only works
> if you have the same class definitions on both sides. Why is this a
> problem for you?

This is actually how I use Marshal. It works fine if I have only one
version in a given Ruby program. The problem lies in loading
different files which may contain one version or another of the given
class definition. There are sometimes more (or fewer) instance
variables and methods, different implementations of certain methods, etc.
depending on the version of the class. Mmm. Collision problems,
Captain!

My initial hope was to define the main class in such a way that it delegates
to other classes (named and implemented according to their version). In my
twisted mind, I had imagined that I could point the delegator at
a certain class before loading the data, just like any other proxy, and
then use it; or at least do so before using methods or accessors.

> Ian, you should be aware of one thing: class definitions are not
> serialized - no programming language that I know does this.  And there
> are probably good reasons (security, efficiency probably).

Understandably. :)

> 1. use OpenStruct
> 2. use Hash
> 3. change your class Data to store attributes in a single Hash only

Once again a good idea! Unfortunately, it is not just about data but
also about class and instance methods and their specific implementations.
Would it mean that I could mix my specific version of a class (as a
module) into the OpenStruct instance at that point?


> You can use tricks as 7stud suggested although I feel wary about this.
>  I would probably choose a different solution based on the
> requirements (which are not fully clear to me).  If you just need
> changing sets of attributes then these options might work:

I have data files generated by different programs. These files are
generated from a given class, but the implementation (accessors,
methods, etc.) differs slightly from program to program. The classes
share the same name, basic functionality and data, though they
differ according to their version. I would like to be able to load
and use any or many of these generated files within my Ruby program
at the same time, without collisions. The constraint is that I do not
have access to the original source of the programs, so I have to
reimplement and test each version all by myself.

We should perhaps look at the problem in its extreme form: let's imagine
that we have multiple programs which each have a class Data, but the
classes are completely different (no similar instance variables or
methods, nothing in common at all). We have no access to those programs
and yet have to load all the files within a single Ruby program. What
would one do?

Thanks for your help, guys!

Regards,
Ian
Sean O. (Guest)
on 2009-03-01 00:37
(Received via mailing list)
On Sat, Feb 28, 2009 at 8:50 PM, Ian T. <removed_email_address@domain.invalid>
wrote:
[snip]
> We should perhaps see the problem as if it was extreme: let's imagine
> that we have multiple programs which have each a class Data but is
> completely different (no similar instance variables nor methods, nothing
> in common at all). No access to those programs and yet have to load all
> the files within a single Ruby program. What one would do?

Well, you could dynamically extend the loaded instances with modules
that add the specific required behaviour.
Something like this:

First file represents whatever created the data in the first place:

# file1
class MyData
  attr_accessor :kind
  attr_accessor :name
  def initialize(kind, name)
    @kind = kind
    @name = name
  end
end

instance = MyData.new("Greeting", "World")
data = Marshal.dump(instance)
File.open("data.dat", "wb") do |file|
  file.write(data)
end
# end of file1

Second file shows how you could load this data and dynamically decide
how it should behave as an instance:

# file2
# these modules will be used to extend the loaded instance depending
# on its @kind
module Hello
  def run
    puts "Hello #{ @name }"
  end
end

module Goodbye
  def run
    puts "Goodbye #{ @name }"
  end
end

# You need to define this if you're unmarshalling data that has been
# saved as MyData - no way round it as Marshal embeds the class name
# in the data
class MyData
end

# unmarshall data and extend depending on the @kind
data = File.read("data.dat")
instance = Marshal.load(data)
# this is shorthand for determining the nature of the data
if instance.instance_variable_defined?("@kind")
  kind = instance.instance_variable_get("@kind")
  if Object.const_defined?(kind)
    extension = Object.const_get(kind)
    instance.extend(extension)
    instance.run
  else
    puts "@kind not known: #{instance.inspect}"
  end
else
  puts "@kind not defined for: #{instance.inspect}"
end
# end of file2

I'm using @kind as shorthand to stand for something that distinguishes
between instances of your data. (BTW, you can't use Data as a class
name in Ruby - it's reserved for use with C extensions).
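
To illustrate the comment in file2 about the class name being embedded in the data, here is a small sketch. The `remove_const` call is only a stand-in for "a program that never defined MyData"; everything else follows the files above:

```ruby
class MyData
  def initialize(kind, name)
    @kind = kind
    @name = name
  end
end

bytes = Marshal.dump(MyData.new("Hello", "World"))
p bytes.include?("MyData")  # true: the class name travels with the data

# Pretend we are a program that has no MyData at all.
Object.send(:remove_const, :MyData)

failure = begin
  Marshal.load(bytes)
  nil
rescue ArgumentError => e
  e
end
p failure.class  # ArgumentError: the named class cannot be found

# An empty stub with the right name is enough for loading to succeed.
class MyData; end

instance = Marshal.load(bytes)
p instance.instance_variable_get(:@kind)  # "Hello"
```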

HTH,
Regards,
Sean
Gary W. (Guest)
on 2009-03-01 00:44
(Received via mailing list)
On Feb 28, 2009, at 3:50 PM, Ian T. wrote:
> We should perhaps see the problem as if it was extreme: let's imagine
> that we have multiple programs which have each a class Data but is
> completely different (no similar instance variables nor methods,
> nothing
> in common at all). No access to those programs and yet have to load
> all
> the files within a single Ruby program. What one would do?

You are establishing ground rules that can't be followed.

If you have two programs that want to exchange data then they've got
to have some pre-existing *shared* understanding of the structure
of the data.  You can't migrate the state of an object from one
arbitrary class to another arbitrary class without constraining the
form of that state in some way.

Ruby's marshal has a built-in assumption that the class that loads
the object state is the *same* (for some reasonable definition of
"same") as the class that dumps the object state.

It sounds to me like you need to abstract out the state into its
own class and use Marshal to serialize/deserialize that and then
devise import/export methods for the various 'versions' of your Data
class.  Use an intermediate class to act as the adapter between
all the versions of your Data class.
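
One way this adapter idea might look, as a sketch only. `AccountState`, `AccountV1`, `AccountV2` and the `export_state` / `import_state` names are all hypothetical, and the approach assumes you control the writing side, which may not hold in Ian's case:

```ruby
# Neutral state object: the only thing that is ever marshalled.
AccountState = Struct.new(:name, :expanded)

class AccountV1
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def export_state
    AccountState.new(@name, false)  # V1 has no notion of "expanded"
  end

  def self.import_state(state)
    new(state.name)
  end
end

class AccountV2
  attr_reader :name, :expanded

  def initialize(name, expanded)
    @name = name
    @expanded = expanded
  end

  def export_state
    AccountState.new(@name, @expanded)
  end

  def self.import_state(state)
    new(state.name, state.expanded)
  end
end

# Written by a V1 program, read by a V2 program.
bytes = Marshal.dump(AccountV1.new("Account").export_state)
v2 = AccountV2.import_state(Marshal.load(bytes))

p v2.name      # "Account"
p v2.expanded  # false
```

Because only `AccountState` ever reaches the stream, the versioned classes can differ freely in their methods and can coexist under different names in one program.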

Gary W.
Sean O. (Guest)
on 2009-03-01 03:00
(Received via mailing list)
Oops. That should be:

  instance = MyData.new("Hello", "World")

in the first file.
Brian C. (Guest)
on 2009-03-01 23:04
Ian T. wrote:
>> Ian, I'm not sure I understand what you want. AFAIK Marshal only works
>> if you have the same class definitions on both sides. Why is this a
>> problem for you?
>
> This is actually how I use Marshal. It works fine if I have only one
> version in one given Ruby program. The problem resides in loading
> different files which may contain one version or another of the given
> class definition. There are sometimes additional (or less) instance
> variables

Instance variables are not part of the class definition at all - even
when you're only talking about a single version of the class. Instance
variables are dynamically set within each object instance. For example:

class Foo
  def bar
    @xyz = 123
  end
end

f = Foo.new    # no instance variables set at all

g = Foo.new
g.instance_variable_set(:@baz, 999)   # only @baz is set

Given this: it makes sense that serializing or deserializing an instance
of Foo only takes into account what instance variables are set in that
particular object, making no reference to the class definition.
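
A quick round trip makes the point concrete. This sketch reuses the Foo class from above; each copy comes back with exactly the instance variables its original had at dump time:

```ruby
class Foo
  def bar
    @xyz = 123
  end
end

f = Foo.new                          # no instance variables
g = Foo.new
g.instance_variable_set(:@baz, 999)  # only @baz
h = Foo.new
h.bar                                # only @xyz

copies = [f, g, h].map { |obj| Marshal.load(Marshal.dump(obj)) }
copies.each { |copy| p copy.instance_variables }
# prints [], then [:@baz], then [:@xyz]
```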
Robert K. (Guest)
on 2009-03-02 16:32
(Received via mailing list)
Attachment: t.rb (0 Bytes)
2009/2/28 Ian T. <removed_email_address@domain.invalid>:
> Capitain!
>
>
> differences according to their version. I would like to be able to load
>
> Thanks for your help, guys!

As Brian has hinted, you can use instance_variable_get etc. to access
variable values.  You can even make it a bit more convenient (see
attached file for examples).

Kind regards

robert
Michael F. (Guest)
on 2009-03-02 19:34
(Received via mailing list)
On Sat, Feb 28, 2009 at 8:56 AM, Ian T. <removed_email_address@domain.invalid>
wrote:
> This absolutely works fine even if the Data class used to dump (e.g.
> this is my ultimate goal. There is obviously no cast feature in Ruby.
> Even using a proxy won't cut it, if only for the fact that Marshal
> doesn't instantiate loaded data.

http://eigenclass.org/R2/writings/extprot-vs-ruby-marshal

^ manveru
Mike G. (Guest)
on 2009-03-03 05:40
Robert K. wrote:
>
> I believe he wants to evolve the class and be able to load data
> written with an older (or just different) version of the class.  And
> now Ian hits the usual problems of schema migration.
>
> Ian, you should be aware of one thing: class definitions are not
> serialized - no programming language that I know does this.

... except languages in which code and data are equivalent!

Sorry, I had to bite.  This is a great example of the power of code-data
equivalence.  If you store the definitions, things will just work.  I
see no immediate reason not to do it, other than the language not
letting you (short of awkward contrivances like heredoc-ing all your
code).

If you expect the definition to change, you can write adapters which
examine the definition (since it's data!) to detect new or incompatible
changes then adjust accordingly.

> And there are probably good reasons (security, efficiency probably).

There are a variety of reasons for both doing it and not doing it.  One
reason for not doing it is that the language you chose does not allow
you to do it.  That may or may not be a good reason.

In case there was any confusion from a previous thread, I do use ruby,
as is obvious from my previous posts.  I would only suggest that working
around the limitations of a language is not necessarily the best
approach, even though it is typically the default course of action.  In
some cases it might be better to use a language without those
limitations.
Robert K. (Guest)
on 2009-03-03 09:30
(Received via mailing list)
On 03.03.2009 04:38, Mike G. wrote:
> Sorry, I had to bite.
Ouch! ;-)

>  This is a great example of the power of code-data
> equivalence.  If you store the definitions, things will just work.

Well, *certain* things will just work.  But you'll trade this for
different issues.  For example, all of a sudden you can have different
implementations of the same class coexist.  I wouldn't say that one or
the other solution is necessarily easier.  They both do not change the
complexity of the underlying problem (evolution of code with data
artifacts belonging to different versions).  Both approaches (i.e.
storing code and not storing code) make certain things easy and other
things hard.

> If you expect the definition to change, you can write adapters which
> examine the definition (since it's data!) to detect new or incompatible
> changes then adjust accordingly.

I'd rather say you _must_ write adapters - otherwise chances are that
something will break uncontrollably.

> In case there was any confusion from a previous thread, I do use ruby,
> as is obvious from my previous posts.  I would only suggest that working
> around the limitations of a language is not necessarily the best
> approach, even though it is typically the default course of action.  In
> some cases it might be better to use a language without those
> limitations.

As I understand the particular situation a set of programs written in
Ruby was given and their output (marshaled data) needs to be worked
with.  In this case, choosing a different language does not look like a
feasible option.  But I generally agree that you should pick the right
tool for the job.

Kind regards

  robert
Rick D. (Guest)
on 2009-03-03 15:02
(Received via mailing list)
On Tue, Mar 3, 2009 at 2:29 AM, Robert K.
<removed_email_address@domain.invalid>wrote:

> Well, *certain* things will just work.  But you'll trade this for different
> issues.  For example, all of a sudden you can have different implementations
> of the same class coexist.  I wouldn't say that one or the other solution is
> necessarily easier.  They both do not change the complexity of the
> underlying problem (evolution of code with data artifacts belonging to
> different versions).  Both approaches (i.e. storing code and not storing
> code) make certain things easy and other things hard.


Ruby has some subtleties in this area when compared to other OO
languages.

Mike G. introduced the idea that this was a problem in dealing with
schema migration.  To me this implies dealing with layout changes to the
object.  This is a problem in most languages like Java, C++ and Smalltalk
where classes, along with whatever other language specific roles they
play, act as a template for understanding which instance variable goes
where in a reified instance.

This means that if you marshal an object then match it up to a class
with the same template, you run into the danger of misinterpreting the
state of the object.  In systems written in these languages, you might
be able to get away with two different versions of a Class which have
the same instance layout template but vary in method implementations, or
even have slightly different method repertoires.

Ruby falls into the class of languages where classes DON'T act as
templates, instead instance variables are dynamically bound to each
instance with a run-time lookup used to map instance variable names to
location.

So the OP's case shows that you can marshal Ruby objects and the
'schema' is carried with each object. It's just that accessor methods
don't go along.

As it turns out, the MagLev project is trying to figure out how to deal
with a similar problem right now.  In Gemstone Smalltalk, which is the
code base on which MagLev is being built, classes and instances are all
held in a shared persistent store. When a process changes a class and
commits a transaction, other processes see the change when the results
of the transaction become visible to them (i.e. when they start up, or
commit or abort a transaction of their own).

Now, this was apparently the same model they were planning to follow for
MagLev. However, we had some discussions in the beta-testers forum about
how this might or might not work with many Ruby programs, because the
Ruby execution model builds up classes at run time from a known initial
state, and classes change as the code executes, either through 'normal'
class and method definition (both of which are execution-time events in
Ruby) or through various levels of metaprogramming sophistication.

Because Ruby classes get built incrementally at run-time, the order of
execution can be important, so starting with a persisted initial set of
class definitions can be problematic at times.

So currently MagLev allows independent control over whether or not a
transaction commit persists changes to class definitions. A process
needs to explicitly indicate that it wants to put class definition
changes into the state to be committed before committing.

We'll see how this evolves.

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
Robert K. (Guest)
on 2009-03-03 15:25
(Received via mailing list)
On 03.03.2009 14:00, Rick DeNatale wrote:
>>>> now Ian hits the usual problems of schema migration.
>>   This is a great example of the power of code-data equivalence.  If you
> Ruby has some subtleties in this area when compared to other OO languages.
Don't they all have? ;-)

> away with two different versions of a Class which have the same instance
> layout template but vary in method implementations, or even have slightly
> different method repertoires.
>
> Ruby falls into the class of languages where classes DON'T act as templates,
> instead instance variables are dynamically bound to each instance with a
> run-time lookup used to map instance variable names to location.
>
> So the OP's case shows that you can marshal Ruby objects and the 'schema' is
> carried with each object. It's just that accessor methods don't go along.

Yes, this is true, and it lets you cope with at least some migrations,
which might be enough for many practical purposes.  But strictly
speaking this situation is not really better than that of other
languages: while this property of Ruby allows for successful
deserialization, you can break a class's invariant (as manifested in the
implementation of methods) with this, rendering deserialized instances
completely unusable.

> this might or might not work with many Ruby programs because the Ruby
> transaction commit persists changes to class definitions. A process needs to
> explicitly indicate that it want's to put class definition changes into the
> state to be committed before committing.
>
> We'll see how this evolves.

Thank you for the abstract, Rick.  This sounds interesting.  Your
explanation is a nice demonstration of the complexity of the problem I
was talking about. :-)

Kind regards

  robert
Eric H. (Guest)
on 2009-03-04 00:22
(Received via mailing list)
On Feb 28, 2009, at 12:50, Ian T. wrote:

> etc.
> then use it; or at least before using methods or accessors.
Easy, dump and load an Array:

class MyObject
   # ...
   def marshal_dump
     [@ivar1, @ivar2]  # ... plus any further ivars, in a fixed order
   end

   def marshal_load(data)
     @ivar1 = data.shift
     @ivar2 = data.shift
     # ...
   end
end

All versions of MyObject should store compatible ivars in compatible
positions in the Array.  For a fancier implementation of this idea,
see Gem::Specification in the rubygems source.
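
A sketch of how the "compatible positions" rule could be kept across versions. This is not taken from the rubygems source; the leading version number, the `name` / `expanded` fields and the two-step class definition (which simulates an old and a new program in one script) are all assumptions for illustration:

```ruby
# Version 1 of a hypothetical class: dumps [format_version, name].
class MyObject
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def marshal_dump
    [1, @name]
  end

  def marshal_load(data)
    _version, @name = data
  end
end

old_bytes = Marshal.dump(MyObject.new("Account"))

# Version 2 reopens the class: a field is appended, earlier positions stay put.
class MyObject
  attr_reader :expanded

  def initialize(name, expanded = false)
    @name = name
    @expanded = expanded
  end

  def marshal_dump
    [2, @name, @expanded]
  end

  def marshal_load(data)
    version, @name, expanded = data
    # Streams written by version 1 have no third element; use a default.
    @expanded = version >= 2 ? expanded : false
  end
end

from_old = Marshal.load(old_bytes)
from_new = Marshal.load(Marshal.dump(MyObject.new("Ledger", true)))

p [from_old.name, from_old.expanded]  # ["Account", false]
p [from_new.name, from_new.expanded]  # ["Ledger", true]
```

Note that this only helps when every writer already uses `marshal_dump`; files produced by plain object dumps are not affected by it.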

PS: Data is a built-in class:

$ ruby -e 'p Data'
Data
$
Ian T. (Guest)
on 2009-03-06 11:47
Robert K. wrote:
> 2009/2/28 Ian T. <removed_email_address@domain.invalid>:
>> Capitain!
>>
>>
>> differences according to their version. I would like to be able to load
>>
>> Thanks for your help, guys!
>
> As Brian has hinted, you can use instance_variable_get etc. to access
> variable values.  You can even make it a bit more convenient (see
> attached file for examples).
>
> Kind regards
>
> robert

Hello, Robert!

I was actually trying something along these lines. My extension modules
would, however, redefine attribute writers (see example below) for each
and every instance variable, so that I can change values in the
loaded class instances. This would allow me to change values and dump
the loaded data back. Your example clearly and conveniently does not
require manually redefining every ivar, but it won't let me dump
anymore, reporting the error "singleton can't be dumped (TypeError)".

def myvar=(data)
  self.instance_variable_set(:@myvar, data)
end
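
A possible way around the "singleton can't be dumped" error, sketched under assumptions: `MyDataV1` and `attach_accessors` are hypothetical names. The idea is that Marshal refuses objects with methods defined directly on them, but it can record an extension by a named module, so the accessors are generated on the module instead:

```ruby
# Stub standing in for the class name embedded in the data files.
class MyData; end

# Named module (hypothetical) that will carry the generated accessors.
module MyDataV1; end

# Define a reader and writer on the module for every ivar the object
# carries, then extend the object with that module.
def attach_accessors(obj, mod)
  names = obj.instance_variables.map { |iv| iv.to_s.sub(/\A@/, "") }
  mod.send(:attr_accessor, *names)
  obj.extend(mod)
end

original = MyData.new
original.instance_variable_set(:@myvar, "old")

loaded = attach_accessors(Marshal.load(Marshal.dump(original)), MyDataV1)
loaded.myvar = "new"

reloaded = Marshal.load(Marshal.dump(loaded))  # dumps without a TypeError
p reloaded.myvar  # "new"

# For contrast: a method defined on the object itself blocks dumping.
def loaded.shout; "HEY"; end
failure = begin
  Marshal.dump(loaded)
  nil
rescue TypeError => e
  e
end
p failure.class  # TypeError

# To write a file without any trace of the extension, copy the ivars
# onto a bare instance and dump that instead.
plain = MyData.allocate
loaded.instance_variables.each do |iv|
  plain.instance_variable_set(iv, loaded.instance_variable_get(iv))
end
plain_bytes = Marshal.dump(plain)
```

The dump of an extended object records the module name, so whoever loads it needs `MyDataV1` defined. If the file has to stay readable by the original program, the bare copy at the end is probably the safer thing to write out.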