DANGER ! Ruby-Newbie ahead: How to access binary files

Hi,

There is a file on my HD, which was written by a C program. The C
program wrote the contents of an array of structures (each array
element was made from the same structure) to that file.

Since accessing that file looks like a very low-level and
"procedure-based" thing to me, I would be very interested in how this job
can be done in the most Ruby-like, object-oriented way.

Thank you very much for any help in advance!
Don't worry, use ruby!
mcc

On 16/01/06, Meino Christian C. wrote:

There is a file on my HD, which was written by a C program. The C
program wrote the contents of an array of structures (each array
element was made from the same structure) to that file.

Since accessing that file looks like a very low-level and
"procedure-based" thing to me, I would be very interested in how this job
can be done in the most Ruby-like, object-oriented way.

If you’re using Windows, make sure you open the file in binary mode:

File.open(filename, "rb") …

Otherwise, look up String#unpack. There are examples of this in the
ImageInfo library that is on the RAA; I have a custom copy of it in
PDF::Writer.
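
A minimal sketch of the two points above, assuming a file that holds two native C ints followed by one float (this layout is made up for illustration, and the temporary file only stands in for a file written by a C program):

```ruby
require "tempfile"

# Write a small binary file, then read it back in binary mode and unpack it.
values = Tempfile.create("demo") do |f|
  f.binmode
  f.write([1, 2].pack("i2") + [1.5].pack("f"))   # stand-in for the C program's output
  f.flush
  # "rb" matters on Windows; it is harmless elsewhere.
  File.open(f.path, "rb") { |io| io.read.unpack("i2f") }
end

p values
```

The template letters ("i" for a native int, "f" for a native float) have to mirror the C types in the order the C program wrote them.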

-austin

Meino Christian C. wrote:

Hi,

There is a file on my HD, which was written by a C program. The C
program wrote the contents of an array of structures (each array
element was made from the same structure) to that file.

Since accessing that file looks like a very low-level and
"procedure-based" thing to me, I would be very interested in how this job
can be done in the most Ruby-like, object-oriented way.

You need to know the format of the structure and its size.

You can then use IO#read to read bytes from the file into a string and
String#unpack to extract the individual fields from the structure.

Of course, all of this should be encapsulated into a class :)
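
A rough sketch of such a class, assuming a hypothetical record of one int and one float with no padding between or after the fields (the names Record, id and weight are invented for the example):

```ruby
require "stringio"

# Hypothetical layout: struct { int id; float weight; }, assumed unpadded.
class Record
  FORMAT = "if"
  SIZE   = [0, 0.0].pack(FORMAT).bytesize

  attr_reader :id, :weight

  def initialize(id, weight)
    @id, @weight = id, weight
  end

  # Read one record from io; returns nil at end of file or on a short read.
  def self.read(io)
    bytes = io.read(SIZE)
    return nil if bytes.nil? || bytes.bytesize < SIZE
    new(*bytes.unpack(FORMAT))
  end

  # Yield every record in the stream, or return an Enumerator without a block.
  def self.each(io)
    return enum_for(:each, io) unless block_given?
    while (rec = read(io))
      yield rec
    end
  end
end

# StringIO stands in for a file opened with File.open(path, "rb").
io = StringIO.new([7, 0.5, 8, 2.0].pack("ifif"))
records = Record.each(io).to_a
records.each { |r| puts "#{r.id} #{r.weight}" }
```

With a real file the last lines would become something like File.open(path, "rb") { |f| Record.each(f).to_a }.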

[email protected] wrote:

    def bar
      @mmap[@offset + Integer::SIZEOF, Float::SIZEOF].unpack("f").first
    end
    def bar= f
      @mmap[@offset + Integer::SIZEOF, Float::SIZEOF] = [Float(f)].pack("f")
    end

Doesn’t this arithmetic assume that the C compiler is packing the fields
of the struct? What if fields are aligned on 8 byte boundaries, for
instance? I vaguely remember having some issues like this when porting
from x86 to sparc. I guess you could add __attribute__((packed)) to
the struct to be sure.
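
To make the concern concrete, here is a small sketch that computes field offsets under natural alignment. The sizes and alignments in TYPES are assumptions (typical for x86-64 with gcc), not something Ruby can ask the C compiler for, and layout is a made-up helper:

```ruby
# Assumed sizes and alignments; check them against the real compiler and ABI.
TYPES = {
  char:   { size: 1, align: 1 },
  int:    { size: 4, align: 4 },
  float:  { size: 4, align: 4 },
  double: { size: 8, align: 8 },
}

# Round n up to the next multiple of a.
def align_up(n, a)
  (n + a - 1) / a * a
end

# Returns [offsets, total_size] for an unpacked struct with naturally aligned fields.
def layout(fields)
  offset = 0
  max_align = 1
  offsets = {}
  fields.each do |name, type|
    t = TYPES.fetch(type)
    offset = align_up(offset, t[:align])
    offsets[name] = offset
    offset += t[:size]
    max_align = [max_align, t[:align]].max
  end
  [offsets, align_up(offset, max_align)]
end

simple_offsets, simple_size = layout({ foo: :int, bar: :float })
padded_offsets, padded_size = layout({ tag: :char, value: :double })

# The padding has to show up in the unpack template as skipped bytes ("x").
tag, value = [1, 2.5].pack("cx7d").unpack("cx7d")
```

Under these assumptions the int/float struct has no padding, so the simple arithmetic works, while a char followed by a double puts the double at offset 8 and makes the struct 16 bytes instead of 9.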

On Tue, 17 Jan 2006, Joel VanderWerf wrote:

Doesn’t this arithmetic assume that the C compiler is packing the fields
of the struct? What if fields are aligned on 8 byte boundaries, for
instance? I vaguely remember having some issues like this when porting
from x86 to sparc. I guess you could add __attribute__((packed)) to
the struct to be sure.

absolutely. i figured it was beyond the scope of the post to get into that -
but really the file format would need to export the shape of the struct in
some sort of header. to do this one would need to crawl the struct with a
'void *' and compute offsets from the address of the struct.

of course, this would be about the point where one should pull out xdr or some
such. in practice, however, one often needs to read binary data written by a
program beyond one's control and the unpack approach will work most of the
time - wouldn't launch rockets with it though!

cheers.

-a

On Tue, 17 Jan 2006, Meino Christian C. wrote:

There is a file on my HD, which was written by a C program. The C program
wrote the contents of an array of structures (each array element was made
from the same structure) to that file.

Since accessing that file looks like a very low-level and "procedure-based"
thing to me, I would be very interested in how this job can be done in the most
Ruby-like, object-oriented way.

this is a perfect use case to abstract the error prone method of
reading/seeking/writing that one would typically do with binary data. i use
mmap a lot for these types of tasks at work, here is a little (silly) example:

first we build a c program to output an array of struct. note that we output
the sizeof(struct) as the first part of the file - this is because we can't
know how the compiler will pad structs so we make sure the correct size is
encoded into the file:

 harp:~ > cat a.c
 #include <stdlib.h>
 #include <stdio.h>

 struct foobar { int foo; float bar; };

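 int main (void)
 {
   struct foobar a[] = { {40, 40.0}, {2, 2.0} };
   int size = sizeof(struct foobar);
   /* header: the record size, so readers need not guess the padding */
   fwrite (&size, sizeof(int), 1, stdout);
   /* body: the whole array in one write */
   fwrite (&a, sizeof(a), 1, stdout);
   return 0;
 }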

 harp:~ > gcc a.c

 harp:~ > a.out > a
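
as a plain-IO point of comparison, the size-prefixed layout can also be read without mmap. this is only a sketch: the StringIO stands in for the file "a", and the 8 byte record size is an assumption (int + float, no padding) rather than something taken from a real run:

```ruby
require "stringio"

# Stand-in for the file "a": an int holding sizeof(struct foobar), then the records.
bytes = [8].pack("i") + [40, 40.0, 2, 2.0].pack("ifif")
io = StringIO.new(bytes)

# The header tells us the stride between records, padding included.
int_size    = [0].pack("i").bytesize
record_size = io.read(int_size).unpack("i").first

records = []
while (chunk = io.read(record_size)) && chunk.bytesize == record_size
  foo, bar = chunk.unpack("if")   # any trailing padding bytes are simply ignored
  records << { "foo" => foo, "bar" => bar }
end

p records
```

reading the stride from the header only protects against padding at the end of the struct; padding between fields would still need to appear in the unpack template.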

next we write a ruby class to access the data. the access will be via mmap, so
any changes we make to the data can be transparently written to disk with no
explicit io on our part - we simply use the objects as normal:

 harp:~ > cat a.rb
 #! /usr/bin/env ruby
 require "mmap"  # ftp://moulon.inra.fr/pub/ruby/

 class Integer
   SIZEOF = [42].pack("i").size
 end
 class Float
   SIZEOF = [42.0].pack("f").size
 end
 module Foobar
   class Struct
     def initialize mmap, offset
       @mmap, @offset = mmap, offset
     end
     def foo
       @mmap[@offset, Integer::SIZEOF].unpack("i").first
     end
     def foo= i
       @mmap[@offset, Integer::SIZEOF] = [Integer(i)].pack("i")
     end
     def bar
       @mmap[@offset + Integer::SIZEOF, Float::SIZEOF].unpack("f").first
     end
     def bar= f
       @mmap[@offset + Integer::SIZEOF, Float::SIZEOF] = [Float(f)].pack("f")
     end
     def inspect
       { "foo" => foo, "bar" => bar }.inspect
     end
   end
   class List < ::Array
     def initialize mmap
       @mmap = mmap
       @sizeof = mmap[0, Integer::SIZEOF].unpack("i").first
       offset = Integer::SIZEOF
       while((offset + @sizeof) <= mmap.size)
         struct = Struct::new @mmap, offset
         self << struct
         offset += @sizeof
       end
     end
   end
   class File
     attr "path"
     attr "list"
     attr "mmap"
     def initialize path
       @path = path
       open(@path, "r+"){|f| @mmap = Mmap::new f, "rw", Mmap::MAP_SHARED}
       @list = List::new @mmap
     end
     def self::new *a, &b
       ff = super
       mmap = ff.mmap
       ::ObjectSpace::define_finalizer(ff){ mmap.msync; mmap.munmap }
       ff
     end
   end
 end

 ff = Foobar::File::new ARGV.shift
 fl = ff.list

 p fl

 fl.each{|f| f.foo = 42 and f.bar = 42.0}  # automatically written!

the first time we run the program we see the data the c program wrote:

 harp:~ > a.rb a
 [{"foo"=>40, "bar"=>40.0}, {"foo"=>2, "bar"=>2.0}]

but next time we see the data automatically written by the ruby program:

 harp:~ > a.rb a
 [{"foo"=>42, "bar"=>42.0}, {"foo"=>42, "bar"=>42.0}]

this is just a silly example, but it shows how objectification of something
like this might be done in a way that really makes working with the actual data
easier.
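
if the mmap extension is not available, the same accessor idea can be sketched over an ordinary binary String. RecordView is a made-up name, the layout is assumed to be an unpadded int + float, and unlike the mmap version nothing reaches the disk until the buffer is written back explicitly:

```ruby
# View over one record inside a binary String buffer (layout "if", assumed unpadded).
class RecordView
  INT_SIZE   = [0].pack("i").bytesize
  FLOAT_SIZE = [0.0].pack("f").bytesize
  SIZE       = INT_SIZE + FLOAT_SIZE

  def initialize(buffer, offset)
    @buffer, @offset = buffer, offset
  end

  def foo
    @buffer.byteslice(@offset, INT_SIZE).unpack("i").first
  end

  def foo=(i)
    @buffer[@offset, INT_SIZE] = [Integer(i)].pack("i")
  end

  def bar
    @buffer.byteslice(@offset + INT_SIZE, FLOAT_SIZE).unpack("f").first
  end

  def bar=(f)
    @buffer[@offset + INT_SIZE, FLOAT_SIZE] = [Float(f)].pack("f")
  end
end

# Stand-in for File.binread(path); pack returns a binary (ASCII-8BIT) string.
buffer = [40, 40.0, 2, 2.0].pack("ifif")
views  = [0, RecordView::SIZE].map { |off| RecordView.new(buffer, off) }

before = views.map { |v| [v.foo, v.bar] }
views.each { |v| v.foo = 42; v.bar = 42.0 }
after = views.map { |v| [v.foo, v.bar] }
```

persisting the change would then be one explicit call such as File.binwrite(path, buffer), which is exactly the step the mmap version makes unnecessary.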

kind regards.

-a