Encoding issue


#1

I’m having a problem with output gleaned from searching ISO-8859-1 web
pages.

I thought I had rectified this by opening the pages with mode “r:ISO…”,
and by using .force_encoding and .encode! on the lines. This seems to
work: according to the logger.info output, execution makes it all the
way through the controller AND the view code, so I would have thought
there was nothing left for me to do.
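
For reference, the two calls mentioned above do different things: force_encoding only changes the label on the existing bytes, while encode rewrites the bytes into the target encoding. A minimal sketch, assuming a made-up sample string (the bytes of "café" in ISO-8859-1) and hypothetical helper names, none of which come from the original code:

```ruby
# Hypothetical sample: the bytes of "café" as an ISO-8859-1 page stores them.
def sample_bytes
  "caf\xE9".force_encoding("BINARY")
end

# Relabel only: same bytes, now claimed to be UTF-8 (which they are not).
def relabel_as_utf8(bytes)
  bytes.dup.force_encoding("UTF-8")
end

# Label the bytes with their real encoding, then transcode them to UTF-8.
def transcode_to_utf8(bytes, source = "ISO-8859-1")
  bytes.dup.force_encoding(source).encode("UTF-8")
end

relabeled  = relabel_as_utf8(sample_bytes)
transcoded = transcode_to_utf8(sample_bytes)

puts relabeled.valid_encoding?   # false
puts transcoded.valid_encoding?  # true
puts transcoded.bytesize         # 5
```

A string that fails valid_encoding? is the kind that can later raise "invalid byte sequence in UTF-8" when something tries to process it as UTF-8.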

After the controller code has run I get:
Rendering docsearch/search
Then the view code runs and I get:
Completed in 1179ms (View: 11, DB: 1) | 200 OK [http://localhost/docdir/search]

THEN,
Processing ApplicationController#search
which seems strange, since
Processing DocdirController#search
already happened. I had not noticed this before; what is the difference
between these two, and what is the significance of “Processing
ApplicationController#search” AFTER the view is already complete?

Anyway, that’s when I get
ArgumentError (invalid byte sequence in UTF-8):
<internal:prelude>:8:in `synchronize'
/usr/local/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
/usr/local/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
/usr/local/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'

So I have tried setting both internal and external encodings:
Encoding.default_external = @dir.encoding
Encoding.default_internal = @dir.encoding
In the hope that this would mean everything (parsing, IO, etc.) would
be done in ISO-8859-1, but obviously this is not the case. I presume I
cannot set the encoding for the scripts that aren’t mine, i.e., the
webrick scripts that are throwing this error.
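
One possible reason the defaults do not help: they govern how IO tags and transcodes data when no encoding is given explicitly, but they do not re-tag strings that already exist. A small sketch of that behavior, assuming a made-up mislabeled string and hypothetical helper names (only default_external is switched here):

```ruby
# Hypothetical sample: ISO-8859-1 bytes wrongly tagged as UTF-8.
def mislabeled_sample
  "caf\xE9".force_encoding("UTF-8")
end

# Returns [encoding name, valid?] as observed while the default is switched.
def inspect_under_latin1_default(str)
  saved = Encoding.default_external
  Encoding.default_external = Encoding::ISO_8859_1
  [str.encoding.name, str.valid_encoding?]
ensure
  # Put the process-wide default back so nothing else is affected.
  Encoding.default_external = saved
end

p inspect_under_latin1_default(mislabeled_sample)  # ["UTF-8", false]
```

The string keeps its UTF-8 tag and stays invalid, so any later code that treats it as UTF-8 can still fail.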

What can I do? AFAICT, all my code has executed without error, every
line dealt with is output via the logger, in the controller AND in the
view, A-ok, and then this happens? Why can I not just set everything
to one encoding for a duration? Is it webrick? I also tried using
Encoding::Converter on everything read in, to no avail…Help!

–MK


#2

Well, I solved it using Encoding::Converter. The difference between
“each” (which does not allow in-place changes) and “each_with_index” had
me hung up. Ruby newbie!
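
Presumably the distinction looks something like the sketch below: assigning to the block variable inside each only rebinds a local name, whereas writing back through the index replaces the array element. The helper names and the sample line are made up for illustration; the converter options are the ones used elsewhere in this thread.

```ruby
# A fresh converter per call, ISO-8859-1 to UTF-8.
def latin1_converter
  Encoding::Converter.new("ISO-8859-1", "UTF-8",
                          :undef => :replace, :invalid => :replace)
end

# Hypothetical sample: one line of ISO-8859-1 text.
def sample_lines
  ["caf\xE9".force_encoding("ISO-8859-1")]
end

# Rebinding the block variable: the array elements are left untouched.
def convert_with_each(lines)
  conv = latin1_converter
  lines.each { |ln| ln = conv.convert(ln) }
  lines
end

# Writing back through the index: the array is updated in place.
def convert_with_index(lines)
  conv = latin1_converter
  lines.each_with_index { |ln, i| lines[i] = conv.convert(ln) }
  lines
end

puts convert_with_each(sample_lines).first.encoding   # ISO-8859-1
puts convert_with_index(sample_lines).first.encoding  # UTF-8
```

An alternative with the same effect would be lines.map! { |ln| conv.convert(ln) }.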

I would still LOVE to know why/how iconv self-destructed tho…


#3

This works in a non-rails script:

def enconvert(text)
  conv = Encoding::Converter.new("ISO-8859-1", "UTF-8",
                                 :undef => :replace, :invalid => :replace)
  text.each { |ln| ln = conv.convert(ln) }
end

but in rails I still get an error, after “converting” the same page.

SURELY SOMEONE has dealt with encoding conversion before??

ps. iconv mysteriously self-destructed on my system this morning. It
worked briefly, but now

require "iconv"

produces an error (so iconv is now unusable on my system until I rebuild
ruby). Does anyone know if Encoding::Converter uses iconv under the
hood?