Forum: Rails I18n converting UTF-8 to entities like &#x525B;

Jian Lin (jianlin)
on 2009-05-09 14:03
I was trying to convert UTF-8 content into a series of numeric entities
like &#x525B; so that the characters display correctly whatever the page
encoding is.

So I used something like this:
<%
begin
  t = ''
  s = Iconv.conv("UTF-32", "UTF-8", some_utf8_string)

  s.scan(/(.)(.)(.)(.)/) do |b1, b2, b3, b4|
    t +=   ("&#x" + "%02X" % b3.ord) + ("%02X" % b4.ord) + ";"
  end
rescue => details
  t = "exception " + details.message
end
%>

<%= t %>

But some characters get converted and some don't.  Is it true that
/(.)(.)(.)(.)/ will not necessarily match 4 bytes at a time?

At first, I was going to use

s = Iconv.conv("UTF-16", "UTF-8", some_utf8_string)

but then I found that UTF-16 is also variable-length, so I used UTF-32
instead, which is fixed-length.  The UTF-8 string I have contains only
Basic Multilingual Plane characters, so every code point should be in
the range 0x0000 to 0xFFFF in Unicode.
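
On the regex question: yes, that can happen. A plain "." does not match a
newline, so any UTF-32 unit that contains the byte 0x0A (for example
U+4E0A) breaks the 4-byte alignment. In addition, if the regex runs in
UTF-8 mode (which Rails on Ruby 1.8 appears to enable through $KCODE),
"." matches characters rather than bytes. The sketch below only
illustrates the newline case, under some assumptions that are not in the
original post: it uses String#encode (Ruby 1.9+) instead of Iconv, it
asks for "UTF-32BE" rather than "UTF-32" so that no BOM or byte-order
question arises, and the sample string is made up.

```ruby
# Illustration only: a sample whose first code point contains the byte 0x0A.
sample = "\u4E0A\u525B"                 # U+4E0A and U+525B

# Big-endian UTF-32 without a BOM, viewed as raw bytes:
# 00 00 4E 0A 00 00 52 5B
utf32 = sample.encode("UTF-32BE").b

# Without /m, "." refuses to match the 0x0A byte, so the first unit is lost.
without_m = utf32.scan(/(.)(.)(.)(.)/n)

# With /m, "." matches any byte, so every 4-byte unit is captured.
with_m = utf32.scan(/(.)(.)(.)(.)/mn)

puts without_m.length   # 1
puts with_m.length      # 2
```

So adding the /m (multiline) flag and forcing byte-wise matching with /n
should let the scan-based version see every 4-byte group.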
Jian Lin (jianlin)
on 2009-05-09 14:16
By the way, this works, but I am sure there are more elegant solutions:

<%
begin
  t = ''
  s = Iconv.conv("UTF-32", "UTF-8", some_utf8_string)

  (s.length / 4).times do |i|
    b3 = s[i*4 + 2]
    b4 = s[i*4 + 3]
    t += ("&#x" + "%02X" % b3) + ("%02X" % b4) + ";"
  end
rescue => details
  t = "exception " + details.message
end
%>

<%= t %>
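
One possibly more elegant route is to skip the UTF-32 round trip and let
String#unpack decode the code points directly. This is a sketch rather
than anything from the posts above: to_hex_entities is a hypothetical
helper name, and it assumes the input is valid UTF-8.

```ruby
# Hypothetical helper; the name is not from Rails or the original post.
def to_hex_entities(utf8_string)
  # unpack("U*") decodes UTF-8 into an array of integer code points.
  # Each code point is then formatted as a hexadecimal character reference.
  utf8_string.unpack("U*").map { |cp| format("&#x%04X;", cp) }.join
end

puts to_hex_entities("\u525B")   # prints &#x525B;
```

In the view this would reduce to <%= to_hex_entities(some_utf8_string) %>.
Unlike the version that keeps only the third and fourth byte of each
UTF-32 unit, this also handles code points above 0xFFFF, since the whole
code point is formatted.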