This is probably obvious in the docs and I’m just missing it, but here
goes: So, I see there is str.each_codepoint, which I want to use in a
function to convert Unicode Strings to a list of Unicode code points.
But what can I do if I have a list of Unicode code points and want to
convert them back into a String?
Hi,
On 30.04.2011 06:12, Terry M. wrote:
> This is probably obvious in the docs and I’m just missing it, but here
> goes: So, I see there is str.each_codepoint, which I want to use in a
> function to convert Unicode Strings to a list of Unicode code points.
> But what can I do if I have a list of Unicode code points and want to
> convert them back into a String?
I think you can use Array#pack for that:
$ irb
ruby-1.9.2-p180 :001 > "f뀀oöbß".each_codepoint.to_a
=> [102, 45056, 111, 246, 98, 223]
ruby-1.9.2-p180 :002 > "f뀀oöbß".each_codepoint.to_a.pack("U*")
=> "f뀀oöbß"
Cheers
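For reference, here is a minimal sketch of the full round trip built on the pack("U*") suggestion above. The helper names to_codepoints and from_codepoints are made up for illustration and are not part of Ruby or of this thread; String#unpack("U*") is shown as an alternative to each_codepoint.to_a.

```ruby
# Hypothetical helper names, chosen for illustration only.
def to_codepoints(str)
  str.each_codepoint.to_a
end

def from_codepoints(codes)
  # "U*" packs every integer in the array as one UTF-8 encoded character.
  codes.pack("U*")
end

# Same string as in the irb session, written with \u escapes so the
# source file stays plain ASCII.
original = "f\uB000o\u00F6b\u00DF"
codes    = to_codepoints(original)
restored = from_codepoints(codes)

p codes
puts restored == original          # round trip preserves the string
puts restored.encoding             # encoding tag of the packed string
p original.unpack("U*") == codes   # unpack("U*") yields the same list
```

Because pack("U*") tags its result as UTF-8, the restored string should compare equal to a UTF-8 original without any further force_encoding call.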
Terry M. wrote in post #995906:
> This is probably obvious in the docs and I’m just missing it,
You will never learn Ruby's Unicode handling by reading the docs. Head
over to James Edward G. II’s website for some lessons:
http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings
> but here
> goes: So, I see there is str.each_codepoint, which I want to use in a
> function to convert Unicode Strings to a list of Unicode code points.
> But what can I do if I have a list of Unicode code points and want to
> convert them back into a String?
# encoding: UTF-8
# That comment tells Ruby to treat string literals in my
# source code, like the one below, as UTF-8 encoded.
str = "\xE2\x82\xAC\xE2\x82\xAC"
codes = str.each_codepoint.to_a
p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join
--output:--
[8364, 8364]
€€
(You should see two euro symbols as the last line of output.)
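As a side note, the chr-and-join approach and the pack("U*") approach from the earlier reply should produce the same string. The sketch below compares them; the method names via_chr and via_pack are invented for this comparison. It also probes what happens when the encoding argument to Integer#chr is left out, which typically raises RangeError for values above 255 unless Encoding.default_internal has been set.

```ruby
# Hypothetical method names, used only to compare the two approaches.
def via_chr(codes)
  # Integer#chr with an explicit encoding builds a one-character string.
  codes.map { |code| code.chr(Encoding::UTF_8) }.join
end

def via_pack(codes)
  codes.pack("U*")
end

codes = [8364, 8364]
puts via_chr(codes) == via_pack(codes)
puts via_chr(codes).encoding

# Without the encoding argument, chr has no multibyte encoding to target.
# The rescue branch runs only if Ruby actually raises here.
begin
  8364.chr
rescue RangeError => e
  puts "RangeError: #{e.message}"
end
```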
I don’t know where you are getting your string from, but if it arrives
tagged with the wrong encoding you can always retag it with
force_encoding:
# No magic comment in this file, so Ruby 1.9 tags the literal ASCII-8BIT.
str = "\xE2\x82\xAC\xE2\x82\xAC"
puts str.encoding
str.force_encoding("UTF-8")
puts str.encoding
codes = str.each_codepoint.to_a
p codes
puts codes.map {|code| code.chr(Encoding::UTF_8) }.join
--output:--
ASCII-8BIT
UTF-8
[8364, 8364]
€€
(You should see two euro symbols as the last line of output.)
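To make the force_encoding step above more concrete: it only changes the encoding tag on the string and leaves the bytes alone, so it is worth checking valid_encoding? afterwards. The following is a small sketch under that reading; retag_as_utf8 is a made-up helper name, and the input is explicitly tagged ASCII-8BIT so the result does not depend on the Ruby version's default source encoding.

```ruby
# Hypothetical helper: returns a UTF-8-tagged copy, bytes untouched.
def retag_as_utf8(str)
  str.dup.force_encoding("UTF-8")
end

# Start from raw bytes explicitly tagged as binary.
raw  = "\xE2\x82\xAC\xE2\x82\xAC".dup.force_encoding("ASCII-8BIT")
utf8 = retag_as_utf8(raw)

p raw.bytes.to_a == utf8.bytes.to_a  # the underlying bytes are identical
p raw.length                         # length under the binary tag (bytes)
p utf8.length                        # length under the UTF-8 tag (characters)
p utf8.valid_encoding?               # are the bytes well-formed UTF-8?
p utf8.each_codepoint.to_a           # code points as seen through UTF-8
```

If valid_encoding? comes back false, the bytes were probably not UTF-8 in the first place, and each_codepoint would raise rather than return code points.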
7stud -- wrote in post #996022:
> Terry M. wrote in post #995906:
>> This is probably obvious in the docs and I’m just missing it,
> You will never learn Ruby's Unicode handling by reading the docs. Head
> over to James Edward G. II’s website for some lessons:
> http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings
Someone else blogged in great detail about the intricacies of Ruby's
Unicode handling and its problems, but I can’t find the link now.
Maybe each_char() will work for you? Take a look at the following code.
str = "\xE2\x82\xAC\xE2\x82\xAC"
puts str.encoding
str.force_encoding("UTF-8")
puts str.encoding
chars = str.each_char.to_a
p chars
puts chars[0].encoding
puts chars.join
--output:--
ASCII-8BIT
UTF-8
["\u20AC", "\u20AC"]
UTF-8
€€
(You should see two euro symbols as the last line of output.)
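If you go the each_char route but still need the numeric code points at some stage, String#ord on each character should give the same numbers as each_codepoint. A quick sketch, with the made-up helper name chars_to_codes:

```ruby
# Hypothetical helper: code points derived from one-character strings.
def chars_to_codes(str)
  str.each_char.map { |ch| ch.ord }
end

str   = "\u20AC\u20AC"
chars = str.each_char.to_a
codes = str.each_codepoint.to_a

p chars_to_codes(str) == codes                              # ord matches each_codepoint
p codes.map { |code| code.chr(Encoding::UTF_8) } == chars   # and chr maps back
```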
The "\u20AC" in the inspected output suggests that a string literal
written with \u escapes is given UTF-8 encoding by default. And that
seems to be the case:
str = "\u20AC\u20AC"
puts str.encoding
--output:--
UTF-8
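One way to double-check that observation is to compare the \u literal with the \x byte string used earlier in the thread, once the latter has been tagged as UTF-8. This is only a sketch; the variable names are arbitrary.

```ruby
escaped = "\u20AC\u20AC"
# Same euro signs spelled as raw bytes, then explicitly tagged as UTF-8.
from_bytes = "\xE2\x82\xAC\xE2\x82\xAC".dup.force_encoding("UTF-8")

puts escaped.encoding         # encoding Ruby assigned to the \u literal
p escaped.bytes.to_a          # the bytes behind the two characters
puts escaped == from_bytes    # same bytes and same tag compare equal
```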