UnicodeUtils 1.4.0 - case conversion, normalization and more

aris · September 30, 2012, 11:23pm

New in this release:

Updated to Unicode 6.2.0.
UnicodeUtils.debug now accepts a single Integer code point.
New method UnicodeUtils.white_space_char?

Usage

Ruby 1.9.1 or higher is required.

$ gem install unicode_utils

require "unicode_utils/display_width"
UnicodeUtils.display_width("にっき") # => 6

$ irb -r unicode_utils/u

irb(main):001:0> U.debug 0x20ba
 Char | Ordinal | Sid               | General Category | UTF-8
------+---------+-------------------+------------------+----------
 "₺"  | 20BA    | TURKISH LIRA SIGN | Currency_Symbol  | E2 82 BA

irb(main):003:0> U.casefold("Straße") == U.casefold("STRASSE")
=> true

irb(main):004:0> U.titlecase "willkommen österreich"
=> "Willkommen Österreich"

irb(main):005:0> U.nfkc "ﬁnland"
=> "finland"

Documentation & Source

http://unicode-utils.rubyforge.org
GitHub - lang/unicode_utils: Unicode algorithms for Ruby 1.9

Issues

It should work on every implementation of Ruby 1.9.1 or higher,
independent of operating system. If it does not, please report
it on Issues · lang/unicode_utils · GitHub

All tests pass with jruby-1.7.0.RC1. Not all tests pass with
MRI 1.9.3p194 due to unexpected behaviour of String#<< with
UTF-16 strings. As long as you use only UTF-8, there’s no problem.
(Bug #7090: UTF-16LE String#<< append 0x0 for certain codepoints - Ruby master - Ruby Issue Tracking System).
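
If your input may arrive as UTF-16, one way to stay on the safe path is to transcode it to UTF-8 before handing it to the library. The following is only a minimal sketch using core Ruby's String#encode; the helper name to_utf8 is hypothetical and not part of UnicodeUtils.

```ruby
# Hypothetical helper (not part of UnicodeUtils): transcode a string to
# UTF-8 with core Ruby before passing it to any UnicodeUtils method.
def to_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  str.encode(Encoding::UTF_8)
end

# Simulate input that arrives as UTF-16LE ("\u00DF" is the sharp s in "Straße").
utf16 = "Stra\u00DFe".encode(Encoding::UTF_16LE)
utf8  = to_utf8(utf16)

puts utf8.encoding            # prints "UTF-8"
puts utf8 == "Stra\u00DFe"    # prints "true"
```

The result of to_utf8 can then be passed to methods such as U.casefold or U.nfkc as in the irb session above.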