Tagged unions and instantiating objects


#1

I frequently work on binary data files that contain data structures
with nested tagged unions: sf2, mp3, smf, CAN, J1939. I can parse the
files ad hoc and look at data bytes to instantiate the correct final
object, but it seems there should be a better, class-based way where the
parsing occurs hierarchically as the inheriting classes become more
specialized.

Say a file contains a collection of shapes. As you read bytes you
discover the shapes are squares and triangles, but you don’t know that
until you’ve read some of the data that belongs to the shape
superclass. You then have to pass that data in when you instantiate
the new square object, where it is assigned to the instance
variables of the superclass. It would be nicer to be able to
instantiate a generic superclass object, read its data, then, based on
the values of the tags, specialize the object to become a square or
triangle object; the specialized object can then read its own data to
become more specific, and so on. Each child class could parse the bits
it knows about, and the object becomes more specialized as it reads
more.

In CLOS there are ways around this, such as those described in Peter
Seibel’s book.

Does anyone know of any Ruby techniques that work well when reading
data containing tagged unions?

Thanks! Bob


#2

On 3/27/06, removed_email_address@domain.invalid wrote:

Does anyone know of any Ruby techniques that work well when reading
data containing tagged unions?

Thanks! Bob

Hmm, maybe I did not understand your request completely, but the
following seems a reasonable approach to me:

Allow an object of the superclass to be passed into the constructor of
the subclass.

-------------------------------------9<-----------------------------
class Shape
  attr_reader :nodes, :info

  def initialize( info, nodes = nil )
    @info  = info
    @nodes = nodes
  end

  def size
    # ...
  end
end

class Rectangle < Shape
  attr_accessor :width, :height

  # aShape is the generic object built while reading the header
  def initialize( aShape, width, height )
    nodes = comp( width, height )   # comp is not shown here
    super( "rectangle", nodes )
    @width  = width
    @height = height
  end
end

# now you start reading
myShape = Shape.new( :bla )
# discovery is done
myShape = Rectangle.new( myShape, 42, 27 )
------------------------------->6------------------------------

but maybe my understanding of the problem is too superficial.

Robert


Two things are infinite: the universe and human stupidity; and as for
the universe, I have not acquired absolute certainty.

  - Albert Einstein

#3

Oh, I see, so module almost serves purpose of an abstract class. Can
instance variables be assigned in self.new which are then carried into
TypeB without explicitly passing them? I think that may be done in
Lisp with-slots of the abstract class. mmap is convenient and acts as
a global variable to permit access to previously read numbers while
deep in the parser. Wouldn’t it look nicer to simply fill the slots in
the abstract class with the numbers in the header while you have them,
more like

module Type
  def self.new buf
    @version = parse_byte buf # save some header info, alas, invisible to TypeA
    byte = parse_byte buf
    case byte
    when 0
      TypeA.new buf
    when 1
      TypeB.new buf
    end
  end
end

And have TypeA or TypeB inherit those slots that are already filled.
Alas, the final instance can’t see @version. Sure you can pass the
data into the instantiation or use mmap, but it seems cleaner if you
could simply bind it in the abstract header class where it naturally
belongs.
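
One way to get close to this, as a rough sketch only: let the factory allocate the subclass instance itself, fill in the header slot, and only then run the subclass initializer. Everything below beyond the Type/TypeA/TypeB names is an assumption for illustration: the Base class, the payload and name fields, the byte layout, and a parse_byte helper that reads from a StringIO.

```ruby
require "stringio"

module Type
  # Hypothetical helper: read one byte from an IO-like buffer as an Integer.
  def self.parse_byte(buf)
    buf.readbyte
  end

  def self.new(buf)
    version = parse_byte(buf)        # header info read before the tag
    klass = case parse_byte(buf)     # the tag selects the concrete class
            when 0 then TypeA
            when 1 then TypeB
            else raise ArgumentError, "unknown tag"
            end
    obj = klass.allocate             # create the object without running initialize
    obj.instance_variable_set(:@version, version)  # fill the header slot first
    obj.send(:initialize, buf)       # the subclass parser can now see @version
    obj
  end

  # Hypothetical common superclass holding the header slot.
  class Base
    attr_reader :version
  end

  class TypeA < Base
    attr_reader :payload

    def initialize(buf)
      # @version was already set by Type.new
      @payload = buf.readbyte * @version
    end
  end

  class TypeB < Base
    attr_reader :name

    def initialize(buf)
      @name = buf.read(3)
    end
  end
end

# Made-up input: version 2, tag 0, one payload byte.
obj = Type.new(StringIO.new([2, 0, 21].pack("C*")))
p obj.class    # Type::TypeA
p obj.version  # 2
p obj.payload  # 42
```

The subclass initializers never receive the header values as arguments; they find them already bound on self.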

Kind regards, Bob Anderson
PS. I admire your Ruby work and writings!


#4

Passing in the parent instance to the child is a nice approach. In the
end they still seem like separate entities though. That is, the parent
is an object that the child can look at but the child hasn’t actually
become the parent. I’d like the child to become the parent including
all its previous attributes, good or otherwise.

You read a header and it says, I’m a shape at position (4,5). So you
instantiate a shape at position (4,5). Now you read further and it
says it is a square. You’d now like the shape to change its class to
become a square with its additional attributes.
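
Ruby has no built-in way to change an object's class, but the effect can be approximated by building the specialized object from the generic one and carrying its instance variables across. A minimal sketch, where the specialize method and the edge attribute are made-up names:

```ruby
class Shape
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Return a new object of class klass that carries all of self's
  # instance variables. klass must be self.class or one of its subclasses.
  def specialize(klass)
    raise ArgumentError, "#{klass} is not a #{self.class}" unless klass <= self.class
    obj = klass.allocate
    instance_variables.each do |name|
      obj.instance_variable_set(name, instance_variable_get(name))
    end
    obj
  end
end

class Square < Shape
  attr_accessor :edge
end

shape = Shape.new(4, 5)           # the header said: a shape at (4,5)
shape = shape.specialize(Square)  # reading further: it is a square
shape.edge = 3
p shape.class                     # Square
p [shape.x, shape.y, shape.edge]  # [4, 5, 3]
```

The variable is rebound to a new object, so any other references to the old Shape still point at the generic one.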

The approach I’ve always followed is to wait until later in the parsing,
when you know exactly what kind of object you want to instantiate,
and then use mmap or other means to expose previously read data. But
what if there is a lot of data in the header, and what if the unions are
nested? What about a more complex example, like animal or plant
phyla or taxa, where there are many levels? It seems there should be a
way to refine an object’s classification as more detail is discovered
about it, and it seems that each level of classification should only
need to know about the next level below it. The knowledge for
classification is contained within the class structure and
instantiation rather than in something external to the classes.
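
As a sketch of what "each level only knows the next level" could look like in Ruby: every class keeps a table of its direct subclasses keyed by tag, reads its own fields, and hands the rest of the input one level down. The class names, the register and read_fields methods, and the byte layout are all invented for this example.

```ruby
require "stringio"

class Shape
  # Each class knows only its direct subclasses, keyed by tag byte.
  def self.subtypes
    @subtypes ||= {}
  end

  def self.register(tag, klass)
    subtypes[tag] = klass
  end

  # Read this level's fields, then dispatch one level down if this
  # class has registered subtypes; otherwise build the final object.
  def self.parse(io, fields = {})
    read_fields(io, fields)
    return new(fields) if subtypes.empty?
    tag = io.readbyte
    subtypes.fetch(tag).parse(io, fields)
  end

  def self.read_fields(io, fields)
    fields[:x] = io.readbyte
    fields[:y] = io.readbyte
  end

  attr_reader :fields

  def initialize(fields)
    @fields = fields
  end
end

class Polygon < Shape
  def self.read_fields(io, fields)
    fields[:sides] = io.readbyte
  end
end

class Square < Polygon
  def self.read_fields(io, fields)
    fields[:edge] = io.readbyte
  end
end

class Circle < Shape
  def self.read_fields(io, fields)
    fields[:radius] = io.readbyte
  end
end

Shape.register(1, Polygon)
Shape.register(2, Circle)
Polygon.register(4, Square)

# x=4, y=5, tag 1 (Polygon), sides=4, tag 4 (Square), edge=10
io = StringIO.new([4, 5, 1, 4, 4, 10].pack("C*"))
shape = Shape.parse(io)
p shape.class   # Square
p shape.fields
```

Only one object is ever instantiated, at the leaf; the intermediate levels just accumulate fields, which sidesteps the class-change problem rather than solving it.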

Kind regards,
Bob Anderson


#5

On Tue, 28 Mar 2006 removed_email_address@domain.invalid wrote:

instantiation rather than in something external to the classes.

Kind regards,
Bob Anderson

use mixins:

   harp:~ > cat a.rb
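   class K
     # each module's parse looks for the next tag and, if it is found,
     # extends the object with the next, more specialized module
     module A
       def parse buf
         buf.each_line do |line|
           more =
             case buf   # the whole buffer is tested, not just the current line
               when /b/i
                 extend B
             end
           more ? parse(buf) : break
         end
       end
     end
     module B
       def parse buf
         buf.each_line do |line|
           more =
             case buf
               when /c/i
                 extend C
               else
                 false
             end
           more ? parse(buf) : break
         end
       end
     end
     module C
       def parse buf
         nil
       end
     end

     # entry point: after an extend, parse(buf) dispatches to the
     # parse method of the most recently mixed-in module
     def parse buf
       buf.each_line do |line|
         more =
           case buf
             when /a/i
               extend A
             when /b/i
               extend B
             when /c/i
               extend C
             else
               false
           end
         more ? parse(buf) : break
       end
     end
     alias_method "initialize", "parse"
   end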


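   require "yaml"

   # Kernel#y is only defined under irb in current Rubies; this stand-in
   # prints an object as YAML so the script runs on its own
   def y(obj)
     puts obj.to_yaml
   end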

   k = K.new <<-buf
     a
   buf
   y "K::A === k" => K::A === k
   y "K::B === k" => K::B === k
   y "K::C === k" => K::C === k
   puts


   k = K.new <<-buf
     a
     b
   buf
   y "K::A === k" => K::A === k
   y "K::B === k" => K::B === k
   y "K::C === k" => K::C === k
   puts

   k = K.new <<-buf
     a
     b
     c
   buf
   y "K::A === k" => K::A === k
   y "K::B === k" => K::B === k
   y "K::C === k" => K::C === k
   puts




   harp:~ > ruby a.rb
   ---
   K::A === k: true
   ---
   K::B === k: false
   ---
   K::C === k: false

   ---
   K::A === k: true
   ---
   K::B === k: true
   ---
   K::C === k: false

   ---
   K::A === k: true
   ---
   K::B === k: true
   ---
   K::C === k: true

food for thought.

-a


#6

On Tue, 28 Mar 2006 removed_email_address@domain.invalid wrote:

superclass, so then you have to pass that data in when you instantiate

Does anyone know of any Ruby techniques that work well when reading
data containing tagged unions?

Thanks! Bob

you want to use something like this pattern:

module Type
  def self.new buf
    byte = parse_byte buf
    case byte
    when 0
      TypeA.new buf
    when 1
      TypeB.new buf
    end
  end

  class TypeA
    ...
    class TypeB
      ...
    end
  end

  class TypeB
    ...
  end
end

so Type.new(buf) returns an object of the correct type. each class in
the hierarchy should be a nested class (though this isn’t required).

so

obj = Type.new(buf)
p obj.class # Type::TypeA::TypeB for example

i’d highly recommend using mmap to read/parse the data structure instead
of reading the file - this way child classes can have access to the
entirety of any previous context required for parsing. using mmap is
just like using a string so you can do something like

@mmap = Mmap::new path, "rw", Mmap::MAP_SHARED

byte = @mmap[0,1]

and later, in subclasses

if @mmap[12345 ... 12346] == SOME_VALUE
if @mmap[0] == SOME_OTHER_VALUE

so it saves from doing any explicit io at all and gives easy context to
a parser. of course it’s also very easy on memory and makes it totally
simple to update specific bytes of a binary file without complex
seek/set operations.

set a value

 @mmap[0,4] = [42].pack 'N'
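
The same slicing and pack/unpack idiom can be tried out on an ordinary binary String, with no mapped file involved. This is only a sketch: the layout (one version byte, one tag byte, then a 32-bit big-endian length) is made up, and the String merely stands in for the mapped region.

```ruby
# Hypothetical layout: byte 0 = version, byte 1 = tag,
# bytes 2..5 = 32-bit big-endian length.
data = [3, 1].pack("CC") + [7].pack("N")

version = data.getbyte(0)          # Integer value of the byte at offset 0
tag     = data.getbyte(1)
length  = data[2, 4].unpack1("N")  # slice 4 bytes, decode as big-endian uint32
p [version, tag, length]           # [3, 1, 7]

# Overwrite the length field in place; the replacement has the same width,
# so the rest of the buffer does not move.
data[2, 4] = [42].pack("N")
p data[2, 4].unpack1("N")          # 42
p data.bytesize                    # 6
```

In current Ruby, indexing a String with a single integer returns a one-character String rather than a number, so getbyte is the safer way to compare against a numeric tag.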

regards.

-a


#7

Thanks for all the suggestions. Although we can extend an object
freely and dynamically by adding singletons, delegates, mixins,
instance variables, etc., its fundamental object.class is always as it
was when it was born with Class.new. We can still use modules or our own
designs to implement a more customized system for organizing objects
that are not as easily expressed directly. Sometimes simple languages
like C seem appealing again ;).
Bob
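
A small sketch of that distinction, with made-up Shape and Square names: extend changes what the object responds to and what it matches in is_a? and ===, while obj.class keeps reporting the class it was created with.

```ruby
module Square
  def area
    @edge * @edge
  end
end

class Shape
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

shape = Shape.new(4, 5)
shape.extend(Square)                    # mix Square into this one object
shape.instance_variable_set(:@edge, 3)  # add the extra slot after the fact

p shape.class          # Shape
p shape.is_a?(Square)  # true
p Square === shape     # true
p shape.area           # 9
```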


#8

On 3/27/06, removed_email_address@domain.invalid wrote:

Passing in the parent instance to the child is a nice approach. In the
end they still seem like separate entities though. That is, the parent
is an object that the child can look at but the child hasn’t actually
become the parent. I’d like the child to become the parent including
all its previous attributes, good or otherwise.

I am afraid that is not possible; remember that you have only a
reference to an object. Changing the class of an existing object does
not seem possible to me, but maybe I am wrong. Well, I would be happy to
be wrong, BTW.

You read a header and it says, I’m a shape at position (4,5). So you
instantiate a shape at position (4,5). Now you read further and it
says it is a square. You’d now like the shape to change its class to
become a square with its additional attributes.

Maybe I am caught in the Ruby Object Model paradigm; maybe we should
forget about objects and do our own design. That seems a long shot to
me, though. I noticed only now that you did not use classes but modules.
Why not? Interesting!

Nevertheless I still see the hurdle of an object not being able to
change its class. You can, however, act on an object by defining
instance variables and methods as you read on and get more specific
information.
Too bad I do not have enough time; this seems quite interesting.

Cheers
Robert



#9

On 28 Mar 2006, at 20:43, removed_email_address@domain.invalid wrote:

Thanks for all the suggestions. Although we can extend an object
freely and dynamically by adding singletons, delegates, mixins,
instance variables, etc., its fundamental object.class is always as it
was when it was born with Class.new.

Unless you take a look at delegate :)

We can still use modules or our own
designs to implement a more customized system for organizing objects
that are not as easily expressed directly. Sometimes simple languages
like c seem appealing again ;).

ponders

No WAY! :)


#10

I don’t know delegate well but it looks like the original object and
the delegated object are separate entities, i.e. instance variables and
methods are still distinct.
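
A quick sketch with SimpleDelegator from the standard delegate library seems to confirm that reading. The Shape and Square classes here are invented for the example: the wrapper forwards unknown methods to the wrapped object, but the two stay distinct objects with their own instance variables.

```ruby
require "delegate"

class Shape
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Square wraps a Shape and adds its own state on top.
class Square < SimpleDelegator
  attr_reader :edge

  def initialize(shape, edge)
    super(shape)   # remember the wrapped object
    @edge = edge   # stored on the wrapper, not on the Shape
  end

  def area
    edge * edge
  end
end

shape  = Shape.new(4, 5)
square = Square.new(shape, 3)

p square.x                         # 4, forwarded to the wrapped Shape
p square.area                      # 9, answered by the wrapper itself
p square.class                     # Square
p square.__getobj__.equal?(shape)  # true, the original object is still there
p shape.respond_to?(:edge)         # false, the Shape knows nothing about edge
```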

I noticed that if the klass pointer could be changed then the object
would really become of a different class, but I don’t know enough about
how the methods and instance variables are indexed and if the child
inheritor would be able to access them safely.

Lisp CLOS objects can change class easily; I just hesitated to mention
Lisp, so I chose C instead :)