BinData (or alternative)

Is there any documentation out there that shows the full use of this gem
(like, for example, how to actually use the “BinData::Choice” class in a
non-trivial example)? Alternatively, is there an
easier/better-documented gem out there for readably parsing binary data
in strings?

This is my minimal example. I’m trying to map terms from a wire
protocol into native Ruby types. The general structure of terms in this
wire protocol is:

    byte:131 + byte:tag + (complex structure according to second tag type)

As an example of such a secondary tag type, here is one of the simplest
structures: the “atom”. (Conceptually, it maps to a Ruby symbol.) In
the wire protocol an atom is:

    byte:100 + word:length + length:string-data

An example would be the atom equivalent to the symbol :abcde. Wrapped
in the full term definition it would be “\203d\000\005abcde”, where
\203 is the term tag, d is the atom tag, \000\005 is the length of the
atom and abcde is the actual atom value.
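
To make that layout concrete, here is roughly how the same bytes could be picked apart by hand with String#unpack from core Ruby (no BinData involved). This is only a sketch: the helper name decode_atom is made up for illustration, and it assumes the name bytes are plain ASCII.

```ruby
# Decode one atom by hand: tag byte 100, 16-bit big-endian length, then the name.
# decode_atom is a hypothetical helper, not part of BinData or any library.
def decode_atom(bytes)
  tag, len = bytes.unpack('Cn')   # C = uint8, n = uint16 big-endian
  raise ArgumentError, "not an atom (tag #{tag})" unless tag == 100
  name = bytes.byteslice(3, len)  # the name starts after the 3 header bytes
  raise ArgumentError, 'truncated atom' if name.nil? || name.bytesize != len
  name.to_sym
end

wire = "\203d\000\005abcde".b     # \203 is 131 (term tag), "d" is 100 (atom tag)
raise ArgumentError, 'missing term tag' unless wire.getbyte(0) == 131
p decode_atom(wire.byteslice(1..-1))   # => :abcde
```

This is fine for a single flat structure, but it is exactly the kind of offset bookkeeping I would rather leave to a declarative library.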

Recognizing the atom with BinData is a joy. It’s trivial, it’s
readable, it’s generally a lot of fun. I have dozens of such sub-terms
to write, but each one would likely take me only minutes (with a few
exceptions, described below). It’s wrapping them all up into the final
term that’s causing problems.

Looking at the BinData classes, it seems to me that the Choice class is
perfect for what I need. Sadly, BinData comes with little tutorial
documentation and very terse reference documentation. One of the
gaping holes in its coverage is precisely the Choice data type. As a
result, I’m left with a huge hole in my ability to implement this wire
protocol with readable code.

Below I have replicated my minimal test case. Class Atom is the one
that tears apart the atom subtype and it works like a charm. (You will
also note how easy it is to read it.) The test string beneath it is
easily parsed and returns exactly the kind of thing I want to be able to
use. BinData saves the day! (You don’t want to know what my first
round of coding this looked like!)

Class Term, on the other hand, is my attempt to add that extra 131
wrapping. So far it has resisted every attempt to make it do anything
other than raise errors about NilClass not having some method I know
nothing about (id2class). Which makes sense, of course. But I can’t
get anything other than that message, or even weirder ones once I start
writing desperation code to figure out how it works. Reading the
source doesn’t help much; nothing there explains how the pieces fit
together well enough for me to find the root of the problem.

Now if the 131-term were the only issue facing me, I’d just hack around
it by checking the first byte of each term for 131 and then parsing the
rest based on the second byte. But it’s not that simple. (Of course.)
What I’ve removed from this example are some very complex cases. There
is, for example, the compressed payload. It has a tag, an uncompressed
length and a string of bytes that, when uncompressed, contains another
secondary term. There are also some term types that are containers for
arbitrary sequences of other terms (tuples, lists, etc.). Each of these
term types requires recursively parsing the binary string to extract
the nested terms by tag type. This can also get arbitrarily deep. It’s
not a situation that’s amenable to trivial hacks. It needs robust
binary parsing and dynamic type building.
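
To show the shape of the recursion I mean, here is a hand-rolled sketch in plain Ruby (StringIO and Zlib only, no BinData). Only the 131 and 100 tags come from the description above. The tuple and compressed tag numbers and their layouts (a uint8 element count, and a uint32 inflated size followed by zlib data that runs to the end of the input) are assumptions for illustration, as are all the helper names.

```ruby
require 'stringio'
require 'zlib'

TAG_TERM       = 131  # leading marker, as described above
TAG_ATOM       = 100  # atom tag, as described above
TAG_TUPLE      = 104  # ASSUMED: tag, uint8 count, then that many sub-terms
TAG_COMPRESSED = 80   # ASSUMED: tag, uint32 inflated size, zlib data to end of input

# Read exactly n bytes or fail loudly on truncated input.
def take(io, n)
  chunk = io.read(n)
  raise EOFError, 'truncated term' if chunk.nil? || chunk.bytesize != n
  chunk
end

# Decode one tagged sub-term from io, recursing into containers.
def decode_sub_term(io)
  tag = take(io, 1).unpack('C').first
  case tag
  when TAG_ATOM
    len = take(io, 2).unpack('n').first
    take(io, len).to_sym
  when TAG_TUPLE
    count = take(io, 1).unpack('C').first
    Array.new(count) { decode_sub_term(io) }   # each element is itself a term
  when TAG_COMPRESSED
    size = take(io, 4).unpack('N').first
    inflated = Zlib::Inflate.inflate(io.read)
    raise ArgumentError, 'inflated size mismatch' unless inflated.bytesize == size
    decode_sub_term(StringIO.new(inflated))    # the payload is another sub-term
  else
    raise ArgumentError, "unknown tag #{tag}"
  end
end

# Check the 131 marker, then decode whatever sub-term follows it.
def decode_term(bytes)
  io = StringIO.new(bytes)
  marker = take(io, 1).unpack('C').first
  raise ArgumentError, 'missing 131 marker' unless marker == TAG_TERM
  decode_sub_term(io)
end

atom = "d\000\005abcde".b
p decode_term("\203".b + atom)                 # => :abcde
p decode_term("\203h\002".b + atom + atom)     # => [:abcde, :abcde]
```

It works for these toy cases, but every new term type means another hand-written branch full of offsets and unpack strings, which is what I was hoping BinData’s Choice would let me avoid.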

So my question boils down to this: first, how can I get the simple case
below working? And if I can’t, is there any kind of binary
parsing/unpacking library out there that can handle this kind of thing?
(The ones I checked out on RubyForge didn’t seem able to handle
arbitrary, recursive extraction like what I need.)

    require 'rubygems'
    require 'bindata'
    require 'stringio'
    require 'zlib'

    TAG_MAP = {
      TAG_ATOM = 100          => :atom,
      TAG_TERM = 131          => :term
    }

    class Atom < BinData::Struct
      endian  :big

      uint8   :tag,   :value => TAG_ATOM, :check_value => TAG_ATOM
      uint16  :len,   :value => lambda { data.length }
      string  :data,  :read_length => :len
    end

    p test_string = [TAG_ATOM].pack('C') + "\0\5" + "abcde"
    p Atom.new.read(StringIO.new(test_string))

    class Term < BinData::Struct
      @choicelist = []
      TAG_MAP.each_key do |x|
        @choicelist[x] = [:uint8, {:value => TAG_MAP[x]}]
      end

      endian  :big

      uint8   :tag,   :value => TAG_TERM, :check_value => TAG_TERM
      choice  :term,  :choices => @choicelist,  :selection => :tag
    end

    p test_string = [TAG_TERM].pack('C') + test_string
    p Term.new.read(StringIO.new(test_string))