Ruby: puts with accents

Hello G.s!,

I’m beginning with Ruby, and I have a problem with a console app. I have
been searching since yesterday, but nothing has worked, so I am writing
to ask for help.

Below I paste my code and the results. I am running Windows 7 and Ruby 1.9.2.

#! /usr/bin/ruby
# encoding: UTF-8

require 'iconv'
require 'rubygems'
require "mysql"
require 'nokogiri'
require './lib/mihttp.rb'

# encoding: utf-8

`chcp 852` #change cmd encoding to unicode
puts 'será test '

utf8 = "áóíúé"
puts utf8

latin1 = utf8.encode("iso-8859-1")
puts latin1

exit

I also tried the iconv and magic_encoding gems, and adding lines that set
Encoding.default_internal and Encoding.default_external, but I get the
same results. I don’t know what’s wrong!

C:\Work\RubysApps>ruby hello.rb
será test
├í├│├ş├║├ę
ߡÝ˙Ú

C:\Work\RubysApps>ruby hello.rb
hello.rb:1: invalid multibyte char (US-ASCII)

C:\Work\RubysApps>ruby -v
ruby 1.9.2p290 (2011-07-09) [i386-mingw32]

Thanks
M
PS: Sorry for my horrible English!

On May 18, 2012, at 11:31, “Mariano José G.” wrote:

# encoding: utf-8

`chcp 852` #change cmd encoding to unicode

This runs in a sub process, so I’m guessing it only affects that
process, not the parent process. What happens if you run this yourself
before the ruby script?

On Fri, May 18, 2012 at 7:31 PM, Mariano José G. wrote:

# encoding: utf-8

`chcp 852` #change cmd encoding to unicode

Codepage 852 isn’t the Unicode codepage. It’s MS-DOS Latin-2, which
isn’t even ISO 8859-2; see “Code page 852” on Wikipedia.

According to “Code Page Identifiers” (Win32 apps, Microsoft Learn),
the codepage for UTF-8 is 65001.

You should be able to set the codepage with

chcp 65001

then just output your UTF-8 strings without having to convert them as
long as you have the

# encoding: utf-8

line near the top of your script.

I’m afraid I can’t test this at the moment as I don’t have access to a
Windows machine.

Regards,
Sean
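
Because code page 852 is a single-byte DOS code page, a console set to it reads every byte of the script's output as one CP852 character. The following is a minimal sketch of that effect, not something posted in the thread: `misread` is a hypothetical helper, and it assumes Ruby's CP852 conversion table matches the one cmd.exe uses. If it does, the two lines printed should look like the garbled second and third lines of the first run in the original post.

```ruby
# encoding: UTF-8

# Hypothetical helper: take the bytes +text+ has in the +actual+ encoding,
# pretend they were written in +assumed+ (as the console does), and
# convert the result back to UTF-8 so it can be inspected.
def misread(text, actual, assumed)
  text.encode(actual).force_encoding(assumed).encode("UTF-8")
end

utf8 = "áóíúé"

# UTF-8 bytes shown by a CP852 console: two characters per accented letter.
utf8_seen_as_cp852 = misread(utf8, "UTF-8", "CP852")

# ISO-8859-1 bytes shown by a CP852 console: one wrong character per letter.
latin1_seen_as_cp852 = misread(utf8, "ISO-8859-1", "CP852")

puts utf8_seen_as_cp852
puts latin1_seen_as_cp852
```

Neither UTF-8 nor ISO-8859-1 puts these letters at the byte values CP852 expects, which is why both `puts` calls in the original script printed the wrong characters.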

You should encode the output in the same encoding as the one used in
console (the “codepage” – that info displayed by chcp). So for
example:

# encoding: UTF-8

utf8 = "áóíúé"
puts utf8.encode('cp852')

Obviously console codepages only contain a subset of Unicode. I have
tried to get chcp 65001 to work with Ruby once, but failed
miserably; unfortunately Windows’ console is utterly broken beyond any
repair when it comes to Unicode.

– Matma R.
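
Since a console code page covers only a subset of Unicode, a plain `encode` call raises an error as soon as the text contains a character the code page lacks. Here is a minimal sketch of a more forgiving variant, assuming the console is on code page 852; `to_console` is a hypothetical helper name, and the `"?"` replacement is just one possible choice.

```ruby
# encoding: UTF-8

# Hypothetical helper: convert +text+ to the console code page, replacing
# anything the code page cannot represent with "?" instead of raising.
def to_console(text, codepage = "CP852")
  text.encode(codepage, invalid: :replace, undef: :replace, replace: "?")
end

# All five letters exist in CP852, so they come out as one byte each.
puts to_console("áóíúé")

# These characters have no CP852 equivalent and are replaced with "?".
puts to_console("日本")
```

The code page would normally be whatever `chcp` reports on the machine in question, passed as the second argument (for example `to_console(text, "CP850")`).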

Well, I tried chcp 852, 850, and 65001, and nothing changed. I also tried
changing the file encoding to ISO-8859-1 and to UTF-8.

I don’t know why, but on my home PC it works. That machine also runs
Windows 7, but the Ruby version there is 1.9.3 and chcp is set to 850.
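
One way to compare a machine where the script works with one where it does not is to print the encodings Ruby picked up on each. This is a minimal sketch rather than anything from the thread: `encoding_report` is a hypothetical helper name, and the values depend entirely on the machine and console it runs on.

```ruby
# encoding: UTF-8

# Hypothetical helper: collect the settings that decide how string
# literals are read and how text is interpreted on input and output.
def encoding_report
  {
    "ruby version"     => RUBY_VERSION,
    "source encoding"  => __ENCODING__,
    "default external" => Encoding.default_external,
    "default internal" => Encoding.default_internal,
    "locale charmap"   => Encoding.locale_charmap
  }
end

encoding_report.each do |name, value|
  puts "#{name}: #{value.inspect}"
end
```

Running it on both machines, in the same kind of cmd.exe window and after the same `chcp` command, should show whether the difference lies in the Ruby version or in the encodings Ruby detected.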

Sean O’halpin wrote in post #1061359:

On Fri, May 18, 2012 at 7:31 PM, Mariano José G. wrote:

# encoding: utf-8

`chcp 852` #change cmd encoding to unicode

Codepage 852 isn’t the Unicode codepage. It’s MS-DOS Latin-2, which
isn’t even ISO 8859-2; see “Code page 852” on Wikipedia.

According to “Code Page Identifiers” (Win32 apps, Microsoft Learn),
the codepage for UTF-8 is 65001.

You should be able to set the codepage with

chcp 65001

then just output your UTF-8 strings without having to convert them as
long as you have the

# encoding: utf-8

line near the top of your script.

I’m afraid I can’t test this at the moment as I don’t have access to a
Windows machine.

First, uninstall 1.9.2 and install a recent 1.9.3 (p125 or later) from
http://rubyinstaller.org/. If you’re using the 1.9 family on Windows,
purge every other version except 1.9.3p125 or higher.

Here’s what I get on Win7 32bit in a cmd.exe shell…

C:\Users\Jon\Documents\RubyDev\sandbox>chcp
Active code page: 437

*** encoding_1.rb file contents ***

# encoding: UTF-8

utf8 = "Some accented text áóíúé with regular text."
puts utf8

C:\Users\Jon\Documents\RubyDev\sandbox>pik ruby encoding_1.rb
jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (Java HotSpot™
Client VM 1.7.0_04) [Windows 7-x86-java]

Some accented text áóíúé with regular text.

ruby 1.8.7 (2012-02-08 patchlevel 358) [i386-mingw32]

Some accented text áóíúé with regular text.

ruby 1.9.3p125 (2012-02-16) [i386-mingw32]

Some accented text áóíúé with regular text.

ruby 1.9.3p223 (2012-05-19 revision 35717) [i386-mingw32]

Some accented text áóíúé with regular text.

tcs-ruby 1.9.3p196 (2012-04-21, TCS patched 2012-04-21) [i386-mingw32]

Some accented text áóíúé with regular text.

ruby 2.0.0dev (2012-05-21 trunk 35732) [i386-mingw32]

Some accented text áóíúé with regular text.

…and without the # encoding: UTF-8 at the top of the file:

C:\Users\Jon\Documents\RubyDev\sandbox>pik ruby encoding_1.rb
jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (Java HotSpot™
Client VM 1.7.0_04) [Windows 7-x86-java]

SyntaxError: encoding_1.rb:1: invalid multibyte char (US-ASCII)

ruby 1.8.7 (2012-02-08 patchlevel 358) [i386-mingw32]

Some accented text áóíúé with regular text.

ruby 1.9.3p125 (2012-02-16) [i386-mingw32]

encoding_1.rb:1: invalid multibyte char (US-ASCII)
encoding_1.rb:1: invalid multibyte char (US-ASCII)

ruby 1.9.3p223 (2012-05-19 revision 35717) [i386-mingw32]

encoding_1.rb:1: invalid multibyte char (US-ASCII)
encoding_1.rb:1: invalid multibyte char (US-ASCII)

tcs-ruby 1.9.3p196 (2012-04-21, TCS patched 2012-04-21) [i386-mingw32]

encoding_1.rb:1: invalid multibyte char (US-ASCII)
encoding_1.rb:1: invalid multibyte char (US-ASCII)

ruby 2.0.0dev (2012-05-21 trunk 35732) [i386-mingw32]

encoding_1.rb:1: invalid multibyte char (US-ASCII)
encoding_1.rb:1: invalid multibyte char (US-ASCII)

If you want to toy with poor old cmd.exe, try using the type command
(like cat) to list out encoding_1.rb after switching to different
codepages.

Jon