Forum: Ruby-core Consider the ICU Library for Improving and Expanding Unicode Support

mame (Yusuke Endoh) (Guest)
on 2012-11-20 12:51
(Received via mailing list)
Issue #2034 has been updated by mame (Yusuke Endoh).

Target version set to next minor

Feature #2034: Consider the ICU Library for Improving and Expanding
Unicode Support

Author: runpaint (Run Paint Run Run)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Target version: next minor

 Has consideration been given recently to employing the ICU library in
Ruby? The bindings are in C and the library is mature. My ignorance of
the Ruby source notwithstanding, this would allow existing String
methods, among others, to support non-ASCII characters in an
incremental manner.

 For a trivial example, consider String#to_i. It currently understands
only ASCII characters which represent digits. ICU provides a
u_charDigitValue(code_point) function which returns the integer
corresponding to the given Unicode codepoint. Were String#to_i to use
this, it would work with non-ASCII counting systems, thus removing at
least one of the "as long as it's ASCII" caveats currently associated
with String methods.
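
 To make the String#to_i example concrete, here is a rough sketch in
plain Ruby of what a digit-value lookup could enable. The names
DIGIT_ZEROS, char_digit_value and unicode_to_i are hypothetical, and the
small table is only a stand-in for ICU's u_charDigitValue, which covers
every Unicode decimal digit. Treat it as an illustration under those
assumptions, not as a proposed patch.

```ruby
# Hypothetical stand-in for ICU's u_charDigitValue: code points of the
# digit zero for a few scripts only (assumption: each script's digits
# 0..9 occupy ten consecutive code points starting at its zero).
DIGIT_ZEROS = [
  0x0030, # ASCII
  0x0660, # Arabic-Indic
  0x06F0, # Extended Arabic-Indic
  0x0966, # Devanagari
  0xFF10  # Fullwidth
].freeze

# Returns the decimal value of a digit code point, or -1 when the code
# point is not a digit in the table (u_charDigitValue also uses -1).
def char_digit_value(code_point)
  zero = DIGIT_ZEROS.find { |z| code_point.between?(z, z + 9) }
  zero ? code_point - zero : -1
end

# Hypothetical Unicode-aware counterpart to String#to_i. It reads
# leading digits and stops at the first non-digit; sign and leading
# whitespace handling are deliberately left out of this sketch.
def unicode_to_i(str)
  result = 0
  str.each_codepoint do |cp|
    value = char_digit_value(cp)
    break if value < 0
    result = result * 10 + value
  end
  result
end

arabic_indic = "\u0661\u0662\u0663" # Arabic-Indic digits one, two, three
puts arabic_indic.to_i          # String#to_i sees no ASCII digits
puts unicode_to_i(arabic_indic) # the sketch reads them as a number
```

 With ICU available, the lookup table would presumably be replaced by a
single call to u_charDigitValue per code point.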

 More generally, if it's desirable for String methods to properly
support Unicode, and if the principal barrier is the difficulty of the
implementation, then might there be at least a partial solution in
marrying Ruby with ICU?

 If ICU is unfeasible, I'd appreciate understanding why. There are
multiple approaches to what I term the second phase of Unicode support
in Ruby, and it will be easier to choose between them if I understand
the constraints. :-) (Of course, if a direction has already been
determined, and work on it is underway, I will gladly bow out ;-)).