Failure cases for assignment of binary string to AR model

I’ve run into a strange bug that I need some help with. I have a model
with a binary column. I’m using Postgres 8.2.4 on the backend. For the
majority of cases, I’m able to store and retrieve binary content using
this setup with no problem. However, I’ve found two test cases that
fail. Consider the following test code, taken from one of my actions:

# The test file I write out here contains the binary data as
# expected, so I know that @incoming_blob is correct.
f = File.open("/tmp/incoming_blob", "w+")
f.write @incoming_blob
f.close

# Set the value in the AR record, but don't even save it.
@staged_extension.value = @incoming_blob

# Now write out the value from the AR record, and it is truncated to
# the first null byte.
f = File.open("/tmp/from_ar_#{params[:extension_local_ref]}", "w+")
f.write @staged_extension.value
f.close

I’ve done the same thing from the console. It seems that the act of
assigning the binary String to the ActiveRecord model truncates it to
the first null byte. The really weird thing is that it works for other
blobs (jpgs and other binary data). At first my code was saving to the
database (of course) and I thought the bug was somewhere in the
postgres driver, but it’s not – as you can see here, there is no
database activity going on at all.

There’s nothing fancy going on with the model – I’ve stripped out all
validations on this field, and I still run into this problem.

The two test cases that fail with the code above can be found here:
http://rubycloud.com/pg/1288.1024.jpg
http://rubycloud.com/pg/base.tsz

I have to think given the simplicity of this code that there is a bug
in the ActiveRecord column assignment methods. I’m looking into it,
but any ideas/insight/help you can give would be greatly appreciated.

Thanks,
Matt

I have discovered the problem. My logic in thinking that there was no
interaction with the Postgres adapter simply because there was no
actual database interaction was flawed. Turns out that setting and
retrieving values to/from the attributes hash makes calls to type_cast
which in turn makes calls to string_to_binary and binary_to_string.

The problem lies in
ActiveRecord::ConnectionAdapters::PostgreSQLColumn. The heuristic used
to determine whether a column’s value has been previously encoded by
escape_bytea is not sufficient in all cases.

In binary_to_string, it checks whether it should unescape a value by
looking for the tell-tale sequence \nnn (a backslash followed by three
digits). However, in the two source files that are causing so much
trouble, such sequences appear in the original data. So unescape_bytea
gets called on a block of data that was never escaped in the first place.

In base.tsz, the sequence \690 appears at byte offset 64,688.
In 1288.1024.jpg, the sequence \754 appears at byte offset 27,316.
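
To make the false positive concrete, here is a minimal sketch in plain Ruby. It assumes the check amounts to a regex along the lines of /\\\d{3}/; the constant and method names are made up for illustration and are not the adapter's actual code, and the sample bytes are invented rather than taken from the two files.

```ruby
# Hypothetical stand-in for the adapter's "was this escaped?" check.
# The exact pattern is an assumption made for illustration.
ESCAPED_HINT = /\\\d{3}/

def looks_escaped?(value)
  !(value =~ ESCAPED_HINT).nil?
end

# Raw binary data that was never escaped, but happens to contain a
# backslash followed by three digits (as in base.tsz).
raw_with_collision = "\x00\x01JFIF".b + "\\690".b + "\xFF\xD9".b

# Raw binary data with no such sequence.
raw_clean = "\x00\x01JFIF\xFF\xD9".b

# Text that really does look like escape_bytea output.
escaped = "\\000\\001JFIF\\377\\331"

puts looks_escaped?(raw_with_collision) # true  (false positive)
puts looks_escaped?(raw_clean)          # false
puts looks_escaped?(escaped)            # true
```

The first and third values are indistinguishable to a check like this, which is why the unescaped blob gets handed to unescape_bytea.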

It seems that we need a more reliable way to mark a data block as
previously escaped by escape_bytea. This is a somewhat tricky problem.
We could save a marker with the data and then remove it when
unescaping it, but this would mean that the persisted data would only
be usable from within Rails, or that any other code that works with
the database would have to know about this trick. Kind of ugly. Any
other ideas?
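
For discussion, a rough sketch of what the marker idea could look like. Everything here is hypothetical: MARKER, escape, unescape, store_value and load_value are names I made up, and escape/unescape are simplified pure-Ruby stand-ins for escape_bytea/unescape_bytea, not the driver's implementation.

```ruby
# Hypothetical marker prepended to every escaped value.
MARKER = "escaped:".b

# Simplified stand-in for escape_bytea: octal-escape the backslash and
# every byte outside printable ASCII.
def escape(raw)
  raw.b.gsub(/[^ -\[\]-~]/n) { |c| format("\\%03o", c.ord) }
end

# Simplified stand-in for unescape_bytea: turn \nnn back into one byte.
def unescape(escaped)
  escaped.b.gsub(/\\(\d{3})/n) { $1.to_i(8).chr }
end

# Escape and tag the value on the way in.
def store_value(raw)
  MARKER + escape(raw)
end

# Unescape only values that carry the tag; pass everything else through.
def load_value(stored)
  stored = stored.b
  return stored unless stored.start_with?(MARKER)
  unescape(stored[MARKER.bytesize..-1])
end

raw = "\x00JFIF".b + "\\690".b + "\xFF".b
puts load_value(store_value(raw)) == raw # true: round trip is lossless
puts load_value(raw) == raw              # true: untagged data is left alone
```

This still has the downside described above: anything else reading the column has to know about the prefix, and raw data that happens to start with the marker bytes would be misread.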

–Matt