Short question about encoding

Hello everybody at the Ruby Forums!

After getting more into Ruby I stumbled across a little problem: how
would I go about making a string with a “square root” sign (√) inside
it, so that it gets encoded in UTF-8 and so that I can later save it in
a file?

Also, I read that Ruby does have encoding-related issues, so, if that’s
the case, I can probably live without the “square root” sign.

puts "√".encode("UTF-8")          # prints a "v" instead
puts "√"                          # same result
puts "√".force_encoding("UTF-8")  # same result

Any ideas, please?

On Nov 10, 2010, at 5:34 PM, Ammar A. wrote:

One possibility is using the hex codes for that code point, like:

"\xE2\x88\x9A"

There is also a Unicode escape, \u (for example "\u221A"). For more
information, take a look at:

http://ruby.runpaint.org/strings#escapes-summary

HTH,
Ammar
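
A minimal sketch of the two escape styles mentioned above, assuming Ruby 1.9 or later. The variable names are only for illustration and are not from the original posts. Both literals should describe the same one-character string:

```ruby
# Illustrative names; both literals describe U+221A (SQUARE ROOT).
from_codepoint = "\u221A"                                    # Unicode escape
from_bytes     = "\xE2\x88\x9A".dup.force_encoding("UTF-8")  # raw UTF-8 bytes, tagged as UTF-8

puts from_codepoint == from_bytes   # compares the two strings
p from_codepoint.bytes.to_a         # the UTF-8 byte values
p from_codepoint.unpack("U*")       # the code point as an integer
```

The \u form is usually easier to read, since it names the code point rather than its UTF-8 byte sequence.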

It seems to be fine with Ruby 1.9.2-p0.

irb> sqrt = "√"
=> "√"
irb> sqrt.encoding
=> #<Encoding:UTF-8>
irb> sqrt.bytes
=> #<Enumerator: "√":bytes>
irb> sqrt.bytes.to_a
=> [226, 136, 154]
irb> sqrt.chars.to_a
=> ["√"]
irb> puts sqrt

=> nil
irb> puts sqrt.encode("US-ASCII")
Encoding::UndefinedConversionError: U+221A from UTF-8 to US-ASCII
        from (irb):10:in `encode'
        from (irb):10
        from /Users/rab/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'
irb> puts sqrt.force_encoding("US-ASCII")

=> nil
irb> sqrt.force_encoding("US-ASCII").chars.to_a
=> ["\xE2", "\x88", "\x9A"]

Are you perhaps sending the output to a device that does not
understand UTF-8?
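
The session above mixes two different operations, so here is a small sketch of the difference, assuming Ruby 1.9 or later. `encode` transcodes the characters and fails when the target has no equivalent, while `force_encoding` only relabels the same bytes. The `:undef`/`:replace` options are one way to degrade gracefully if the square root sign really has to go to an ASCII-only device:

```ruby
sqrt = "\u221A"

# Transcoding fails because US-ASCII has no square root sign.
begin
  sqrt.encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  puts "cannot transcode: #{e.message}"
end

# Transcoding with a substitute character for anything undefined.
replaced = sqrt.encode("US-ASCII", :undef => :replace, :replace => "?")
p replaced

# Relabeling keeps the three bytes but the result is not valid US-ASCII.
relabeled = sqrt.dup.force_encoding("US-ASCII")
p relabeled.valid_encoding?
p relabeled.bytes.to_a == sqrt.bytes.to_a
```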

-Rob

Rob B.
http://AgileConsultingLLC.com/
http://GaslightSoftware.com/

On Thu, Nov 11, 2010 at 12:01 AM, Gabriel L. wrote:

puts "√".encode("UTF-8")          # prints a "v" instead
puts "√"                          # same result
puts "√".force_encoding("UTF-8")  # same result

Any ideas, please?

One possibility is using the hex codes for that code point, like:

"\xE2\x88\x9A"

There is also a Unicode escape, \u (for example "\u221A"). For more
information, take a look at:

http://ruby.runpaint.org/strings#escapes-summary

HTH,
Ammar

Are you perhaps sending the output to a device that does not understand UTF-8?

I’m guessing that’s the issue.

I’m on Windows 7, though. In the Command Prompt I can copy/paste the “√”
character and it shows up fine in the Courier New font, but once I run
Ruby (1.9.2p0) or irb I can’t paste that character anymore. When I save
this in a file and run it:

s = "√"
p s
puts s

it outputs:

"\u221A"
√

But then again when I just do:

File.open("test.txt", "w") { |f| f << "√" }

and run it, it makes the test.txt file and saves it without any problems
and with the actual square root character in the file.
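
One way to confirm what actually lands on disk is to read the file back as raw bytes. This is only a sketch, assuming Ruby 1.9 or later. `utf8_round_trip` is a hypothetical helper name, and it writes to a temporary directory rather than to test.txt:

```ruby
require "tmpdir"

# Hypothetical helper: writes text to a temporary file as UTF-8 and
# returns the raw bytes on disk plus the text read back as UTF-8.
def utf8_round_trip(text)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "test.txt")
    File.open(path, "w:UTF-8") { |f| f << text }
    raw = File.binread(path).bytes.to_a
    read_back = File.open(path, "r:UTF-8") { |f| f.read }
    [raw, read_back]
  end
end

raw, read_back = utf8_round_trip("\u221A")
p raw                        # byte values stored in the file
puts read_back == "\u221A"   # whether the text survived the round trip
```

If the bytes are 226, 136, 154, the file is correct UTF-8 and the remaining problem is on the display side.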

Any idea what I’m doing wrong or why it won’t appear in the console?

EDIT: Forgot to say ‘thanks’ for the replies :P
Sorry about that, and: Thanks!

I tried the chcp thing too:

system "chcp 65001"
s = "√"
p s
puts s

outputs:

Active code page: 65001
"ΓêÜ"
√

I’m still doing something wrong. Any more ideas?

Gabriel L. wrote:

Any idea what I’m doing wrong or why it won’t appear in the console?

Because your console doesn’t use UTF-8. Windows consoles are 8-bit, and
the 8-bit encoding they use is determined by your code page. UTF-8 is
code page 65001, but its support is sketchy. You can discover your code
page by typing “chcp”.
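
To illustrate the code page effect, here is a sketch that reinterprets the UTF-8 bytes of the square root sign as if they were code page 437 text. The choice of code page 437 is an assumption (the thread never states the actual code page), but it is consistent with the three characters reported earlier:

```ruby
# The three UTF-8 bytes of U+221A as a plain binary string: E2 88 9A.
utf8_bytes = "\u221A".bytes.to_a.pack("C*")

# Assumption: the console treats every byte as one code page 437 character.
as_cp437 = utf8_bytes.force_encoding("IBM437")

# Transcode that misreading back to UTF-8 to see which characters it names.
mojibake = as_cp437.encode("UTF-8")

p mojibake.unpack("U*").map { |cp| "U+%04X" % cp }
puts mojibake == "\u0393\u00EA\u00DC"   # Greek capital gamma, e with circumflex, U with diaeresis
```

In other words, the bytes Ruby writes can be correct while the console decodes them one byte at a time.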

On Sat, Nov 13, 2010 at 11:52 AM, Gabriel L. wrote:

"ΓêÜ"
√

I’m still doing something wrong. Any more ideas?

If you have Vista or Win 7 installed, try your script in PowerShell.
Otherwise, install PowerShell, and try your script.

PowerShell is a .NET-based (almost) drop-in replacement for cmd.exe,
and, AFAIK, fully Unicode-aware.


Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

Yup I’m on Windows 7.

About PowerShell: I tried the classic PowerShell.exe, which printed out
the same weird characters, so I tried “type .\test.rb”, and that gave an
error. I Googled PowerShell and Unicode and read that the PowerShell ISE
supports Unicode, so I tried that one: “type .\test.rb” works fine and
displays the character well, but “ruby .\test.rb” (1.9.2p0) prints the
weird characters again.

Any more suggestions, please?

PowerShell ISE log:


PS C:\Users\Ye Olde Poopsmith\Desktop> ruby -v
ruby 1.9.2p0 (2010-08-18) [i386-mingw32]


PS C:\Users\Ye Olde Poopsmith\Desktop> ruby .\test.rb
"\u221A"
#<Encoding:UTF-8>
√


PS C:\Users\Ye Olde Poopsmith\Desktop> type .\test.rb
# system "chcp 65001"
s = "√"
p s
p s.encoding
puts s.to_s
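
A diagnostic along these lines might help narrow down where Ruby's idea of the encoding differs from the console's. It is only a sketch, assuming Ruby 1.9 or later. It reads settings and prints them, and the labels in the hash are made up for illustration:

```ruby
# Diagnostic sketch; it only reads encoding settings and prints them.
report = {
  "source encoding"  => __ENCODING__,
  "default external" => Encoding.default_external,
  "default internal" => Encoding.default_internal,
  "STDOUT external"  => STDOUT.external_encoding,
  "locale"           => Encoding.find("locale"),
}

report.each { |label, value| puts "#{label}: #{value.inspect}" }
```

On Windows the default external and locale encodings typically follow the console code page in effect when Ruby started, so comparing them with the output of chcp can show whether the two disagree.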