I'm using Rails 2.3.8 with Ruby 1.9.1 and I'm having a problem with serialized attributes in ActiveRecord not preserving string encodings. The underlying problem is probably YAML, but I'm wondering if anyone has any good ideas on how to handle this. The app I'm working on has numerous serialized fields, some of which contain deep structures of arrays and hashes. Getting back an ASCII-8BIT string (that's actually UTF-8) deep within those structures wreaks havoc later.

This is perhaps best illustrated by example: if I save l to a serialized attribute in an ActiveRecord model, I get back l2 when reading from the database.

>> l
=> ["English", "Türkçe", "Русский"]
>> l.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>]
>> l.map(&:valid_encoding?)
=> [true, true, true]
>> l.to_yaml
=> "--- \n- English\n- !binary |\n VMO8cmvDp2U=\n\n- \"\\xD0\\xA0\\xD1\\x83\\xD1\\x81\\xD1\\x81\\xD0\\xBA\\xD0\\xB8\\xD0\\xB9\"\n"
>> l2 = YAML.load(l.to_yaml)
=> ["English", "T\xC3\xBCrk\xC3\xA7e", "Русский"]
>> l2.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>]

Does anyone know how YAML decides whether to store a string as binary or as an escaped string? Both of the last two strings above contain non-ASCII characters, but only the first of them is stored as binary.
on 2010-09-02 21:48
on 2010-09-02 22:00
From a quick scan of your question, perhaps ya2yaml (http://rubyforge.org/projects/ya2yaml/) would help? 'Ya2YAML is "yet another to_yaml". It emits YAML document with complete UTF8 support (string/binary detection, "\u" escape sequences and Unicode specific line breaks).'
on 2010-09-02 22:40
Thanks, ya2yaml is a good suggestion. I took a look at it and it does the
right thing (and would work, except I had trouble getting it to play nicely
with ActiveRecord etc.). I did come up with a different solution, which
I'm posting here in case other people run into the same issue.
Monkey patching String can force YAML to use \ escaping rather than
binary, and therefore return strings in the default encoding (UTF-8)
rather than ASCII-8BIT:
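class String
  # Treat a string as binary only when it is explicitly tagged ASCII-8BIT.
  # Returns nil for an empty string.
  def is_binary_data?
    encoding == Encoding::ASCII_8BIT unless empty?
  end
end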
Originally this routine uses a heuristic based on whether \ escaping or
binary encoding of the string would be shorter, which is why only
some of the international strings I had were having problems.
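
For anyone who would rather not patch String, another option is to repair the encodings after loading. The following is only a rough sketch, not something from Rails or the YAML library: retag_utf8 is a made-up helper name, and it assumes that any ASCII-8BIT string whose bytes form valid UTF-8 was really meant to be UTF-8. It walks nested arrays and hashes, so it should reach strings buried deep in a serialized structure.

```ruby
# Hypothetical helper (not part of Rails or YAML): walks arrays and hashes
# and re-tags ASCII-8BIT strings as UTF-8 when their bytes are valid UTF-8.
# Strings that are not valid UTF-8 are returned unchanged.
def retag_utf8(obj)
  case obj
  when String
    return obj unless obj.encoding == Encoding::ASCII_8BIT
    candidate = obj.dup.force_encoding(Encoding::UTF_8)
    candidate.valid_encoding? ? candidate : obj
  when Array
    obj.map { |item| retag_utf8(item) }
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[retag_utf8(key)] = retag_utf8(value)
    end
  else
    obj
  end
end

# Simulate what came back from YAML.load in the example above.
loaded = ["English", "T\xC3\xBCrk\xC3\xA7e".dup.force_encoding(Encoding::ASCII_8BIT)]
fixed = retag_utf8(loaded)
puts fixed.map(&:encoding).inspect
```

The trade-off is that genuinely binary data which happens to be valid UTF-8 would also be re-tagged, so this only makes sense for fields that are known to hold text.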