RTranslate Gem (Open-URI) and Encoding

I’m using the rtranslate gem (sishen-rtranslate) to handle translating
some text. It uses open-uri to scrape Google Translate. If I attempt
to translate text from English to Spanish it outputs some garbage
characters.

Translating the word “where” returns “dónde” instead of “dónde”

Any idea why this is happening and what I can do to fix this?

Thanks!

On Thu, Feb 18, 2010 at 7:45 PM, The C. wrote:

Any idea why this is happening and what I can do to fix this?

You need to specify the encoding in your Ruby script. Ruby (1.8 at
least; I am not certain about 1.9) will use your system encoding for
strings by default.

Set this global variable in your script to make Ruby process strings as
UTF-8, independently of your machine:

$KCODE = 'u'

There is more detail here:
http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library

Note that Ruby could be processing the Google Translate response
correctly (i.e. you are doing everything above), but if you are printing
the result to the console (via puts), your machine may still interpret
the UTF-8 text according to the host system’s encoding. In my experience
this is a particularly annoying problem on Windows systems. Output to a
file instead if you are seeing this problem.

regards,
Richard.
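
One way to take the console out of the picture is to look at the raw bytes that were actually written. The following is only a sketch, assuming Ruby 2.1 or later (the thread itself concerns 1.8.7, where only the `unpack` part applies as-is). The helper name `hex_bytes` and the sample string are made up for illustration; in practice the string would come from the translation call.

```ruby
require "tempfile"

# Hypothetical helper: render a string's raw bytes as hex, so the result
# does not depend on how a terminal or editor decodes them.
def hex_bytes(str)
  str.unpack("C*").map { |b| format("%02X", b) }.join(" ")
end

translated = "d\u00F3nde" # stand-in for the text returned by the translation

dump = nil
Tempfile.create("translation") do |f|
  f.binmode            # write the bytes exactly as they are
  f.write(translated)
  f.flush
  dump = hex_bytes(File.binread(f.path))
end

puts dump # prints "64 C3 B3 6E 64 65"
```

The pair `C3 B3` is the UTF-8 encoding of "ó". A single `F3` in that position would mean the text is ISO-8859-1 instead.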

I had the $KCODE variable set. It didn’t seem to do anything in this
case. I wrote the translated text to a file to see if it was a display
issue with the console, and the text was still incorrect in the file.

Any other ideas?

Thanks.

On 2010-02-18, The C. wrote:

Translating the word “where” returns “dónde” instead of “dónde”

The amusing part is that the first one looks fine to me.

I suspect this means that you’re getting UTF8 when you expect something
in some particular encoding, so you should be specifying encodings…

-s

Indeed. The first one is properly encoded in UTF-8, the second in
ISO-8859-1.

-Jonathan N.
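
The garbled form can be reproduced by taking correctly encoded UTF-8 bytes and reading them as if they were ISO-8859-1. This is a sketch for illustration only, assuming Ruby 1.9 or later; the variable names are made up.

```ruby
correct = "d\u00F3nde" # "dónde", stored as UTF-8

# Relabel the same bytes as ISO-8859-1, then transcode to UTF-8 so the
# misreading can be displayed on a UTF-8 terminal.
garbled = correct.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts correct.length   # 5 characters
puts correct.bytesize # 6 bytes, because "ó" takes two bytes in UTF-8
puts garbled.length   # 6 characters: each byte of "ó" became its own character
puts garbled
```

The two bytes `C3 B3` that make up "ó" in UTF-8 are the characters "Ã" and "³" in ISO-8859-1, which is where output such as "dÃ³nde" comes from.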

Where would I specify the encoding to fix this problem? And yes, I just
noticed that in a reply it DOES look correct, but when I posted it, it
was not. I’m guessing somewhere (other than $KCODE) I need to set it as
UTF-8.

Thanks.

Are you using Ruby 1.8 or Ruby 1.9? In 1.9, you should call
string.force_encoding("UTF-8") or
string.force_encoding("ISO-8859-1") as needed. In Ruby 1.8, I
think it just works with the bytes you provide, and it’s your
terminal that determines what actually gets displayed.

-Jonathan N.
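
For Ruby 1.9 and later, the effect of `force_encoding` can be sketched as follows. It only changes the label on the string and leaves the bytes alone, so `valid_encoding?` is a quick check of whether the chosen label fits. The byte arrays below are made-up stand-ins for data read from the network.

```ruby
# Bytes as they might arrive from a network read, with no useful label yet.
utf8_bytes   = [0x64, 0xC3, 0xB3, 0x6E, 0x64, 0x65].pack("C*")
latin1_bytes = [0x64, 0xF3, 0x6E, 0x64, 0x65].pack("C*")

# Correct label: same bytes, now understood as five UTF-8 characters.
text = utf8_bytes.dup.force_encoding("UTF-8")
puts text.valid_encoding?           # true
puts text.length                    # 5
puts text.bytes == utf8_bytes.bytes # true, nothing was converted

# Wrong label: a lone 0xF3 is not a complete UTF-8 sequence.
mislabeled = latin1_bytes.dup.force_encoding("UTF-8")
puts mislabeled.valid_encoding?     # false
```

If the data really is ISO-8859-1, labeling it as such and then calling `encode("UTF-8")` converts the bytes, which is a different operation from relabeling.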

On 2010-02-19, The C. wrote:

Where would I specify the encoding to fix this problem?

There, I can’t help you. I don’t understand encodings at all.

-s

I’m using 1.8.7. I don’t think it’s the terminal but I’m not entirely
sure. I’m outputting the translation to a text file, but technically
I’m viewing it in a terminal app (Putty) so it may be screwing up there.

Thanks.

If you’re using 1.8, you can transcode between ISO-8859-1 and UTF-8 with
the iconv library:
http://ruby-doc.org/stdlib/libdoc/iconv/rdoc/index.html

-Jonathan N.
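
A transcoding step along these lines might look like the sketch below. The helper name `latin1_to_utf8` is hypothetical. It assumes the input really is ISO-8859-1, uses `Iconv.conv(to, from, string)` on Ruby 1.8, and falls back to `String#encode` on 1.9 and later, since iconv was dropped from the standard library in Ruby 2.0.

```ruby
# Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
def latin1_to_utf8(bytes)
  if bytes.respond_to?(:encode)
    # Ruby 1.9+: label the bytes as ISO-8859-1, then transcode.
    bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
  else
    # Ruby 1.8: strings carry no encoding, so use the iconv library.
    require "iconv"
    Iconv.conv("UTF-8", "ISO-8859-1", bytes)
  end
end

latin1 = [0x64, 0xF3, 0x6E, 0x64, 0x65].pack("C*") # "dónde" in ISO-8859-1
utf8   = latin1_to_utf8(latin1)

puts utf8.unpack("C*").map { |b| "%02X" % b }.join(" ") # 64 C3 B3 6E 64 65
```

Applying this to text that is already UTF-8 would garble it, so it is worth checking the bytes before converting.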

Eric C. wrote:

On Fri, Feb 19, 2010 at 1:29 PM, The C. wrote:

Thanks.

Make sure PuTTY is set for UTF-8.

Aha! Well that fixed the problem of being able to see the correct
output in the terminal. It should greatly help the debugging process
now. I’m then taking the encoded string and transferring it with XML via
a socket connection. I’ll have to look into the transfer to see if it’s
breaking there.

Thanks for the help.
