Psych being the default - what to do with invalid yaml files?

Hello,

Syck has been removed in the latest Ruby.

For me this means that I can no longer use my (invalid) yaml files.

Example:

Psych.load_file "/settings.yml"

Result:

Psych::SyntaxError: (/settings.yml): invalid leading UTF-8 octet at line 1 column 1

I am aware that the YAML file itself is broken, because the YAML
standard requires a Unicode encoding:

“YAML streams are encoded using the set of printable Unicode characters,
either in UTF-8 or UTF-16.”
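To find out which files would trip this error before loading them, one could check the raw bytes for UTF-8 validity. This is only a rough sketch: the helper names `valid_utf8_file?` and `non_utf8_yaml_files` are made up for illustration, and the `.yml`/`.yaml` extensions are an assumption about how the files are named.

```ruby
# Hypothetical helper: true when the raw bytes of the file form valid UTF-8.
def valid_utf8_file?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: YAML files below dir whose bytes are not valid UTF-8.
# Assumes the files end in .yml or .yaml.
def non_utf8_yaml_files(dir)
  Dir.glob(File.join(dir, '**', '*.{yml,yaml}'))
     .reject { |file| valid_utf8_file?(file) }
     .sort
end
```

Pure ASCII files pass this check as well, since ASCII is a subset of UTF-8, so only files containing non-ASCII bytes in a legacy encoding would be listed.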

And it seems that since Ruby 2.0.0, UTF-8 is now more or less the
standard.

In Ruby 1.8.x, encoding was not as rigidly enforced: all the strings I
used with German umlauts, saved in a non-UTF encoding, worked nicely.
For Ruby 1.9.x, I had to add the magic comment line, because US-ASCII
was the default source encoding and my German umlauts did not work
without it. This was quite annoying, as I had to go through all my .rb
files to check whether they worked on 1.9.x. It only gave me extra work,
for zero gain for my own use here.

Now with Ruby 2.0.0, I also can no longer use my broken yaml files. This
means I, once again, have to invest time in order to remain compatible
with Ruby, for zero net gain (for my personal use case here, I am aware
that the setup brings advantages for other people).

On Ruby 1.8.x, my broken yaml files worked perfectly well, thanks to
syck.

The proper solution would be to store all my yaml files (and all my .rb
files) in UTF-8.

I can not do so, for various reasons. I do not use UTF-8 at all myself
either and have zero need to do so. It takes too long to explain in
detail why - I did so many times before and it ALWAYS leads to the same
response, with the final answer being “Switch to UTF-8.”. Enter a loop
here …

Anyway, what I could do, is this:

Take my broken YAML files, save copies of them in another location as
UTF-8, and then load those copies. I could write a Ruby script doing so,
once I know how to “properly” save those files in UTF-8 in the easiest
way possible.

Then Psych would work, and I could keep all my various YAML files (a few
hundred of them, in different projects and directories) in the same
illegal encoding as they are now (which works perfectly well for me;
only Psych rejects it).

Now my question is:

What would be the easiest (ideally, a pure ruby solution) way to save my
yaml files as UTF-8? If it can not be a pure ruby solution, using a
linux-only solution is fine too.

I use a few YAML files in my projects, and it is OK if, on upload, I can
have all those files in UTF-8. How do others save their files in a
specific encoding without using their editor for doing so? (Part of
the reason why I can not switch is that I use an uncommon editor, and I
would have to use another editor if I were to switch completely to
UTF-8. This would take even more time, which I lack.)

On Sun, 24 Feb 2013 23:40:38 +0100, Marc H. wrote:

> I can not do so, for various reasons. I do not use UTF-8 at all myself
> either and have zero need to do so. It takes too long to explain in
> detail why - I did so many times before and it ALWAYS leads to the same
> response, with the final answer being “Switch to UTF-8.”. Enter a loop
> here …

Just a heads-up: Unicode is already the standard for describing text
data, and UTF-8 is the most used encoding of Unicode right now (UTF-16
and UTF-32 are often used in memory of running applications, but almost
never for storage). Not using Unicode is obsolete, and as years pass
you’re going to have more and more issues if you do so.

> yaml files as UTF-8? If it can not be a pure ruby solution, using a
> linux-only solution is fine too.

Well…

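# coding: utf-8
# Read the raw bytes without letting Ruby guess an encoding.
binary_data = File.binread('filename.yaml')
# Replace 'whatever' with the encoding the file is actually saved in,
# then transcode the text to UTF-8.
utf8_encoded_text = binary_data.force_encoding('whatever').encode('utf-8')
File.binwrite('filename-utf8.yaml', utf8_encoded_text)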

This is of course just one of the ways to do this. You could also use
any reasonable text editor (I know that at least Notepad2 and Notepad++
for Windows and Sublime Text for all platforms can do this).
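For a few hundred files spread over several directories, the same `force_encoding`/`encode` idea could be wrapped in two small helpers: one that transcodes in memory and hands the result to the YAML parser (so the files on disk stay untouched), and one that mirrors a whole directory tree as UTF-8. This is only a sketch. The helper names are invented for illustration, and `ISO-8859-1` is an assumption, since the thread never says which encoding the files actually use.

```ruby
require 'fileutils'
require 'pathname'
require 'yaml'

# ASSUMPTION: the legacy files are ISO-8859-1. Adjust to the real encoding.
LEGACY_ENCODING = 'ISO-8859-1'

# Hypothetical helper: read raw bytes, label them with their real encoding,
# and return the text transcoded to UTF-8.
def read_as_utf8(path, from = LEGACY_ENCODING)
  File.binread(path).force_encoding(from).encode(Encoding::UTF_8)
end

# Option 1: keep the file as it is on disk and transcode only while loading.
def load_legacy_yaml(path, from = LEGACY_ENCODING)
  YAML.load(read_as_utf8(path, from))
end

# Option 2: write a UTF-8 copy of every YAML file below src_dir into dst_dir,
# keeping the relative directory layout. Returns the paths that were written.
def convert_yaml_tree(src_dir, dst_dir, from = LEGACY_ENCODING)
  src_root = Pathname.new(src_dir)
  Dir.glob(File.join(src_dir, '**', '*.{yml,yaml}')).sort.map do |src|
    relative = Pathname.new(src).relative_path_from(src_root)
    target = Pathname.new(dst_dir).join(relative)
    FileUtils.mkdir_p(target.dirname.to_s)
    File.binwrite(target.to_s, read_as_utf8(src, from))
    target.to_s
  end
end
```

With the first helper, something like `load_legacy_yaml('/settings.yml')` would stand in for `Psych.load_file`, and no second copy of the files is needed. The second helper fits the "save them in another location" plan from the question.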

> How do others save their files in a
> specific encoding without using their editor for doing so?

Well uh, how else would you do that? Reasonable editors default to UTF-8
anyway these days. You usually either can choose the encoding during
saving, or have a menu option to transcode the text.

> (Part of
> the reason why I can not switch is that I use an uncommon editor, and I
> would have to use another editor if I were to switch completely to
> UTF-8. This would take even more time, which I lack.)

I can’t fathom what editor you are using that doesn’t allow you to
choose an encoding to save files in. Even good ol’ Windows Notepad does
this.