Psych dumping in binary

I’ve got an app where I use ASCII-8BIT as the default encoding to
avoid errors when matching strings against regexps, etc.

Unfortunately, this causes Psych to dump out my data structures
in binary format.

Would anyone be interested in a one-line patch to force dumping
in non-binary format?


--- /opt/local/lib/ruby/1.9.1/psych/visitors/yaml_tree.rb.orig 2012-05-02 16:49:41.246503805 -0400
+++ /opt/local/lib/ruby/1.9.1/psych/visitors/yaml_tree.rb 2012-05-02 16:35:21.586503943 -0400
@@ -230,7 +230,7 @@
         quote = false
         style = Nodes::Scalar::ANY
 
-        if binary?(o)
+        if binary?(o) && ! @options[:nobinary]
           str   = [o].pack('m').chomp
           tag   = '!binary' # FIXME: change to below when syck is removed
           #tag   = 'tag:yaml.org,2002:binary'

On Wed, May 2, 2012 at 1:53 PM, Jim H. wrote:

> I’ve got an app where I use ASCII-8BIT as the default encoding to
> avoid errors when matching strings against regexps, etc.
>
> Unfortunately, this causes Psych to dump out my data structures
> in binary format.

If you can, try US-ASCII encoding for 7-bit clean ASCII. Psych will
dump that as you expect.
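
A minimal sketch of that suggestion, assuming a Ruby with the bundled Psych. The variable names are illustrative only. It tags a 7-bit clean string as US-ASCII before dumping, and dumps a string containing high bytes for comparison:

```ruby
require "psych"

# A 7-bit clean string that arrives tagged as ASCII-8BIT (binary),
# then retagged as US-ASCII before dumping.
clean = "foo".b
clean.force_encoding("US-ASCII")
clean_yaml = Psych.dump(clean)

# A string with bytes above 0x7F, left as ASCII-8BIT.
dirty = "foo\xf0\xff".b
dirty_yaml = Psych.dump(dirty)

puts clean_yaml   # plain scalar
puts dirty_yaml   # tagged !binary, base64-encoded
```

Only the string with high bytes should need the !binary tag; the US-ASCII one comes out as readable text.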

On 05/02/2012 05:10 PM, Jeremy K. wrote:

> If you can, try US-ASCII encoding for 7-bit clean ASCII. Psych will
> dump that as you expect.

Well, I’d rather avoid this when reading and parsing syslog
lines:

ruby -Eus-ascii -e 'x = "foo"; x.force_encoding("US-ASCII"); puts x.encoding; x += "\xf0\xff"; x.force_encoding("US-ASCII"); puts x.match(/foo/); puts x' | m
-e:1:in `match': invalid byte sequence in US-ASCII (ArgumentError)
      from -e:1:in `match'
      from -e:1:in `<main>'
US-ASCII

You never know what’ll be in there, and I’d rather not have to run
force_encoding on every processed line.

If there’s a better way to handle strings with naughty characters
I’d be grateful for pointers.
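
The trade-off described here can be sketched as a short script rather than a one-liner. This is only an illustration with made-up variable names: the same bytes match an ASCII-only regexp when tagged ASCII-8BIT, but are not a valid string when tagged US-ASCII.

```ruby
# Bytes as they might arrive from a log file: "foo" plus two high bytes.
raw = "foo\xf0\xff".b   # String#b returns an ASCII-8BIT copy

# An ASCII-only regexp matches a binary string without complaint.
binary_match = raw.match(/foo/)

# The same bytes tagged as US-ASCII do not form a valid US-ASCII string.
mislabeled = raw.dup.force_encoding("US-ASCII")
valid = mislabeled.valid_encoding?

# Matching against the mislabeled string is where the error in the
# one-liner above comes from; capture it instead of crashing.
error = begin
  mislabeled.match(/foo/)
  nil
rescue ArgumentError => e
  e
end

puts binary_match[0]
puts valid
puts error.message if error
```

String#valid_encoding? gives a cheap way to check a line before matching, without forcing an encoding on every one.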

On Thu, May 03, 2012 at 06:41:57AM +0900, Jim H. wrote:

>       from -e:1:in `match'
>       from -e:1:in `<main>'
>
> US-ASCII
>
> You never know what’ll be in there, and I’d rather not have to run
> force_encoding on every processed line.
>
> If there’s a better way to handle strings with naughty characters
> I’d be grateful for pointers.

For now, you can use String#ascii_only?

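# Retag a string as US-ASCII only when every byte is 7-bit clean;
# anything else keeps its current encoding.
def tag(string)
  string.force_encoding("US-ASCII") if string.ascii_only?
end

x = "foo"
tag(x)              # ASCII only, so x becomes US-ASCII
puts x.encoding
x += "\xf0\xff"
tag(x)              # high bytes present, so x is left alone
puts x.match(/foo/)
puts x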

I will push this up to Psych so that ASCII-only ASCII-8BIT strings are
dumped as UTF-8.
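
Until a Psych with that change is installed, the helper can be applied just before dumping. A rough sketch, assuming lines are read as ASCII-8BIT; the sample log lines are hypothetical, and this variant of tag returns the string so it can be used in a chain:

```ruby
require "psych"

# Same idea as the tag helper above, but always returning the string.
def tag(string)
  string.force_encoding("US-ASCII") if string.ascii_only?
  string
end

# Hypothetical syslog-style lines, tagged ASCII-8BIT as in the app described.
lines = [
  "sshd[42]: session opened".b,
  "foo\xf0\xff".b
]

lines.each { |line| tag(line) }
encodings = lines.map(&:encoding)

# Only the line with high bytes should still need the !binary tag.
yaml = Psych.dump(lines)
puts yaml
```

Lines that stay ASCII-8BIT still match ASCII-only regexps, so nothing else in the processing loop has to change.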