On Mon, Jan 23, 2012 at 6:22 PM, Peter V. wrote:
3.0.3), but it raises the following exception only on production:
100%[======================================>] 50,089 --.-K/s in
require 'iconv'
Some relevant links:
http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
http://blog.grayproductions.net/articles/ruby_19s_string
http://www.ruby-doc.org/core-1.9.3/Encoding/Converter.html#method-i-convert
The code that seems to function fairly well is:
$ cat convert.rb
File.open('gistfile1.txt') do |f|
  f.readlines.each do |line|
    puts "###############################################"
    puts line.valid_encoding? # always true
    # convert from UTF-8 to ISO-8859-1, replacing characters the target lacks
    ec = Encoding::Converter.new("utf-8", "ISO-8859-1", :undef => :replace)
    ec.replacement = "UNDEFINED"
    puts ec.convert(line)
  end
end
$ ruby convert.rb > result
This code converts your entire document (line by line)
without raising exceptions.
The source text always seems to be valid UTF-8.
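To illustrate what valid_encoding? does and does not tell you, here is a small sketch. The helper name latin1_problem and the sample strings are made up for this example; they are not from your gist. It separates malformed input from well-formed text that simply has no ISO-8859-1 equivalent:

```ruby
# Hypothetical helper (not from the original script): reports why a
# string cannot be converted to ISO-8859-1, or nil if it converts fine.
def latin1_problem(text)
  text.encode("ISO-8859-1")
  nil
rescue Encoding::InvalidByteSequenceError
  :invalid_input    # the bytes are not well-formed in the source encoding
rescue Encoding::UndefinedConversionError
  :no_equivalent    # well-formed, but the target has no such character
end

good   = "caf\u00E9"                                    # well-formed UTF-8
broken = "caf\xE9 au lait".dup.force_encoding("UTF-8")  # stray Latin-1 byte tagged as UTF-8
dash   = "a \u2013 b"                                   # en dash, well-formed UTF-8

[good, broken, dash].each do |s|
  puts "#{s.valid_encoding?} #{latin1_problem(s).inspect}"
end
# prints:
# true nil
# false :invalid_input
# true :no_equivalent
```

So valid_encoding? returning true only says the bytes are well-formed UTF-8; the conversion can still fail for a character like the dash.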
But some UTF-8 characters have no equivalent in ISO-8859-1,
e.g. the long dash (U+2013) in this piece of text:
“… institucional do Grupo Zaffari alis …”
It shows up in the output as the replacement string "UNDEFINED" that I set.
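As a side note, the same kind of replacement can probably be done in one call with String#encode, which also accepts the :undef and :replace options. A minimal sketch, where to_latin1 is a made-up helper name and the sample string is invented for illustration:

```ruby
# Hypothetical helper (an assumption, not part of convert.rb): converts
# UTF-8 text to ISO-8859-1, substituting `replacement` for every
# character that ISO-8859-1 cannot represent.
def to_latin1(text, replacement = "?")
  text.encode("ISO-8859-1", "UTF-8", :undef => :replace, :replace => replacement)
end

sample    = "before \u2013 after"   # \u2013 is the en dash
converted = to_latin1(sample, "-")

puts converted.encoding          # ISO-8859-1
puts converted.encode("UTF-8")   # before - after
```

Whether a plain "-" or a marker like "UNDEFINED" is the better replacement depends on whether you want to find the affected spots afterwards.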
Without the :undef => :replace option, the conversion raised:
convert.rb:9:in `convert': U+2013 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)
That seems quite plausible, since UTF-8 can encode every Unicode code point,
while ISO-8859-1 is limited to one byte per character (so at most 256
characters), if I understand correctly.
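A quick way to see the size difference, as a rough sketch: fits_latin1? is a made-up helper, and it assumes that ISO-8859-1 covers the code points up to U+00FF.

```ruby
# Hypothetical helper: true when the character's code point is at most
# U+00FF, i.e. within the one-byte range assumed for ISO-8859-1.
def fits_latin1?(char)
  char.ord <= 0xFF
end

e_acute = "\u00E9"   # é, which ISO-8859-1 does have
dash    = "\u2013"   # en dash, the character from the error message

puts e_acute.bytesize                        # 2 (bytes in UTF-8)
puts e_acute.encode("ISO-8859-1").bytesize   # 1
puts dash.bytesize                           # 3 (bytes in UTF-8)
puts fits_latin1?(e_acute)                   # true
puts fits_latin1?(dash)                      # false
```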
I hope this can put you on the right track,
Peter