Codepage 850 character set

dsmorey · November 25, 2005, 10:10pm

Can anybody tell me how to read this information with Ruby? I
received a file encoded with the Codepage 850 character set. Vim
opens the file and reads the characters like this…

<90>lodie

The <90> should be the Codepage 850 character \220 (octal, i.e. byte
0x90). I am, however, able to open this file in Emacs, where the
character shows as \220 and counts as a single character. Here’s what
it looks like in Emacs…

\220lodie

So, I’m almost positive the client is sending me the file correctly,
and it’s correctly encoded in Codepage 850 character set. However,
how do I use Ruby to process this file? When I loop through the
characters of the word in question, Ruby is counting the character as
2 characters, even though Emacs shows it as one when I cursor over it.

When I open the file and read the above string in ruby using this
code…

p.each_byte do |c|
  print c, ' '
end

I get this output…

194 144 108 111 100 105 101

You’ll notice there are 7 bytes there, whereas Emacs shows only 6
characters, the \220 being one of them. I’m close to certain that the
194 and 144 together should be the first character, but I’m not sure
how to get it to come out that way.
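
For what it's worth, the byte counts can be checked directly. The following is a minimal sketch, assuming Ruby 1.9 or later (where strings carry an encoding); the variable names are only illustrative. It labels the seven bytes printed above as UTF-8, which is itself an assumption, and compares the byte count with the character count:

```ruby
# The seven bytes printed by each_byte above.
bytes = [194, 144, 108, 111, 100, 105, 101]

# Pack them into a binary string, then label it as UTF-8 (an assumption).
utf8 = bytes.pack('C*').force_encoding(Encoding::UTF_8)

puts utf8.valid_encoding?           # is this well-formed UTF-8?
puts utf8.bytesize                  # number of bytes
puts utf8.length                    # number of characters
puts format('U+%04X', utf8[0].ord)  # code point of the first character
```

In UTF-8, the pair 194 144 (0xC2 0x90) is the two-byte encoding of the single code point U+0090. That would account for 7 bytes against the 6 characters Emacs shows, and it suggests the data reaching Ruby may already have been converted to UTF-8 rather than being raw Codepage 850.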

What I need to do is read this information in (using Ruby) and insert
it into a MySQL table, but so far I’m not even sure how to read it
correctly in Ruby.
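
One possible way to read the data, sketched under two assumptions: the text reaching Ruby is UTF-8 in which each original Codepage 850 byte was stored as the code point with the same number (so byte 0x90 became U+0090), and the Ruby version is 1.9 or later. The helper name `recover_cp850` and the file name in the usage comment are hypothetical.

```ruby
# Hypothetical helper: undo a byte-for-byte widening to UTF-8, then decode
# the recovered bytes as Codepage 850 and return proper UTF-8 text.
def recover_cp850(text)
  codepoints = text.unpack('U*')        # UTF-8 string -> array of code points
  unless codepoints.all? { |cp| cp < 256 }
    raise ArgumentError, 'text contains code points that are not single bytes'
  end
  raw = codepoints.pack('C*')           # one byte per code point (binary string)
  raw.force_encoding('CP850').encode('UTF-8')
end

# The seven bytes from the each_byte output, labelled as UTF-8.
sample = [194, 144, 108, 111, 100, 105, 101].pack('C*').force_encoding('UTF-8')
fixed = recover_cp850(sample)
puts fixed          # prints "Élodie"
puts fixed.length   # prints 6

# Usage with a file (the name is made up):
#   File.foreach('clients.txt', encoding: 'UTF-8') { |line| puts recover_cp850(line.chomp) }
```

If the file on disk turns out to hold raw Codepage 850 (a single 0x90 byte rather than 194 144), opening it with `File.read(path, encoding: 'CP850:UTF-8')` should be enough and the helper would not be needed. Either way the result is an ordinary UTF-8 string that can be handed to whichever MySQL client is in use.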

Thanks for any help you can offer.