Hello,

syck has been removed in the latest Ruby. For me this means that I can no longer use my (invalid) YAML files.

Example:

> Psych.load_file "/settings.yml"

Result:

Psych::SyntaxError: (/settings.yml): invalid leading UTF-8 octet at line 1 column 1

I am aware that the YAML file itself is broken, because the YAML standard demands Unicode.

http://en.wikipedia.org/wiki/YAML

"YAML streams are encoded using the set of printable Unicode characters, either in UTF-8 or UTF-16."

And it seems that since Ruby 2.0.0, UTF-8 is more or less the standard.

In Ruby 1.8.x, encoding was not as rigidly enforced: all the strings I used with German umlauts, saved in a non-UTF encoding, worked nicely.

For Ruby 1.9.x, I had to add the magic comment line. Since US-ASCII was the default source encoding, my German umlauts did not work out of the box. This was quite annoying, as I had to go through all my .rb files to check whether they work on 1.9.x. It only gave me extra work, for zero gain for my own use here.

Now with Ruby 2.0.0, I can also no longer use my broken YAML files. This means that I, once again, have to invest time in order to remain compatible with Ruby, for zero net gain (for my personal use case here; I am aware that the setup brings advantages for other people).

On Ruby 1.8.x, my broken YAML files worked perfectly well, thanks to syck.

The proper solution would be to store all my YAML files (and all my .rb files) in UTF-8. I cannot do so, for various reasons. I do not use UTF-8 at all myself either and have zero need to do so. It takes too long to explain in detail why. I have done so many times before and it ALWAYS leads to the same response, with the final answer being "Switch to UTF-8.". Enter a loop here ...

Anyway, what I could do is this: take my broken YAML files, save copies of them in another location as UTF-8, and then load those copies. I could write a Ruby script doing so, once I know how to "properly" save those files in UTF-8, in the easiest way possible.
Then Psych would work and I could keep all my various YAML files (a few hundred of them, in different projects and directories) in the same illegal encoding as they are now (which works perfectly well for me, it is just that Psych rejects it).

Now my question is: What would be the easiest way (ideally, a pure Ruby solution) to save my YAML files as UTF-8? If it cannot be a pure Ruby solution, a Linux-only solution is fine too.

I use a few YAML files in my projects, and it is OK if, on upload, I can have all those files in UTF-8.

How do others save their files in a specific encoding __without__ using their editor for doing so? (Part of the reason why I cannot switch is that I use an uncommon editor, and I would have to use another editor if I were to switch completely to UTF-8. This would take even more time, which I lack.)
on 2013-02-24 23:40
on 2013-02-25 19:48
On Sun, 24 Feb 2013 23:40:38 +0100, Marc Heiler <email@example.com> wrote:

> I can not do so, for various reasons. I do not use UTF-8 at all myself
> either and have zero need to do so. It takes too long to explain in
> detail why - I did so many times before and it ALWAYS leads to the same
> response, with the final answer being "Switch to UTF-8.". Enter a loop
> here ...

Just a heads-up: Unicode is already *the* standard for describing text data, and UTF-8 is the most used encoding of Unicode right now (UTF-16 and UTF-32 are often used in memory of running applications, but almost never for storage). Not using Unicode is obsolete, and as years pass you're going to have more and more issues if you stay away from it.

> yaml files as UTF-8? If it can not be a pure ruby solution, using a
> linux-only solution is fine too.

Well...

  # coding: utf-8
  # Read the raw bytes without any transcoding.
  binary_data = File.binread('filename.yaml')
  # 'whatever' is a placeholder: put the encoding the file is really in here
  # (for example 'ISO-8859-1'), otherwise force_encoding raises ArgumentError.
  utf8_encoded_text = binary_data.force_encoding('whatever').encode('utf-8')
  # Write the transcoded bytes to a new file, leaving the original untouched.
  File.binwrite('filename-utf8.yaml', utf8_encoded_text)

This is of course just one of the ways to do this. You could also use any reasonable text editor (I know that at least Notepad2 and Notepad++ for Windows and Sublime Text for all platforms can do this).

> How do others save their files in a
> specific encoding __without__ using their editor for doing so?

Well uh, how else would you do that? Reasonable editors default to UTF-8 anyway these days. You can usually either choose the encoding when saving, or use a menu option to transcode the text.

> (Part of
> a reason why I can not switch is because I use a uncommon editor, and I
> would have to use another editor if I were to switch completely to
> UTF-8. This would take even more time, which I lack.)

I can't fathom what editor you are using that doesn't allow you to choose an encoding to save files in. Even good ol' Windows Notepad does this.
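
Since the question mentions a few hundred files spread over several directories, the single-file snippet above could be extended roughly as follows. This is only a sketch under assumptions: the source encoding is guessed to be ISO-8859-1 (the thread never names it), and `read_as_utf8`, `convert_yaml_tree` and `load_legacy_yaml` are hypothetical helper names, not part of Ruby or Psych.

```ruby
require 'yaml'
require 'fileutils'
require 'pathname'

# Assumption: the legacy files are ISO-8859-1. Replace this with the
# encoding the files are really stored in.
SOURCE_ENCODING = 'ISO-8859-1'

# Read raw bytes, label them with the source encoding, transcode to UTF-8.
def read_as_utf8(path, source_encoding = SOURCE_ENCODING)
  File.binread(path).force_encoding(source_encoding).encode('UTF-8')
end

# Write a UTF-8 copy of every .yml/.yaml file below src_dir into dest_dir,
# keeping the relative directory layout. Returns the paths that were written.
def convert_yaml_tree(src_dir, dest_dir, source_encoding = SOURCE_ENCODING)
  Dir.glob(File.join(src_dir, '**', '*.{yml,yaml}')).map do |src|
    relative = Pathname.new(src).relative_path_from(Pathname.new(src_dir)).to_s
    dest = File.join(dest_dir, relative)
    FileUtils.mkdir_p(File.dirname(dest))
    File.binwrite(dest, read_as_utf8(src, source_encoding))
    dest
  end
end

# Alternative without copies: transcode in memory and hand the UTF-8
# string straight to the YAML parser.
def load_legacy_yaml(path, source_encoding = SOURCE_ENCODING)
  YAML.load(read_as_utf8(path, source_encoding))
end
```

With this, something like `convert_yaml_tree('config', 'config-utf8')` (directory names made up for illustration) would leave the originals untouched in their current encoding, while `load_legacy_yaml` avoids the second set of files entirely. If the encoding guess is wrong, the result will still be valid UTF-8 but the umlauts will come out as the wrong characters, so it is worth checking one file by eye first.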