Adding i18n support to Rails' transliterator

Hi all,

The other day I sent a patch to Rails to improve
ActiveSupport::Inflector’s support for transliterating UTF-8 Latin
characters to ASCII.

In the discussion of the patch, I mentioned the idea of making
Inflector.transliterate use Rails’ i18n facilities, and Jeremy K.
and Yaroslav Markin encouraged me to pursue the idea.

http://github.com/rails/rails/commit/dceef0828a23e8298dd9a9aab1a33c49e84f17d6#activesupport/lib/active_support/inflector/transliterate.rb-P50

I’ve been working on support for this and would like to ask for some
feedback before sending a patch to Rails. My work can be found at:

http://github.com/norman/rails/tree/translit

Basically the new code lets you do things like this:

I18n.backend.store_translations(:de, :support =>
  {:transliterations => {"ü" => "ue", "ö" => "oe"}})
assert_equal "Juergen Koehler",
  ActiveSupport::Inflector.transliterate("Jürgen Köhler")

Yaroslav, if you look in the tests you’ll see I also basically stole
your Russian transliterator verbatim. :)

One of my main concerns has been keeping reasonably good performance
in a thread-safe manner. The new code memoizes the available
transliterators in a class variable, which I’m a little hesitant about
doing, but I think will be ok. If someone more experienced with
thread-safety issues could advise me on that I’d be grateful.
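
To make the question concrete, here is a minimal sketch of memoizing one transliterator per locale behind a Mutex. It is not the code in the branch: the TransliteratorCache module, its fetch and clear methods, and the block that builds the table are hypothetical names used only for illustration, and it uses a module-level instance variable rather than a class variable.

```ruby
# Hypothetical sketch: memoize one transliterator per locale behind a Mutex.
# None of these names come from Rails or the i18n gem.
module TransliteratorCache
  @cache = {}
  @lock  = Mutex.new

  class << self
    # Returns the memoized transliterator for +locale+, building it with the
    # given block the first time that locale is requested.
    def fetch(locale)
      @lock.synchronize do
        @cache[locale] ||= yield(locale)
      end
    end

    # Drops all memoized transliterators, e.g. after translations are reloaded.
    def clear
      @lock.synchronize { @cache.clear }
    end
  end
end

# Eight threads ask for the same locale; the build block runs inside the
# lock, so the table is only built by the first thread to get there.
builds = 0
threads = 8.times.map do
  Thread.new do
    TransliteratorCache.fetch(:de) do |_locale|
      builds += 1
      { "ü" => "ue", "ö" => "oe" }
    end
  end
end
threads.each(&:join)
```

Holding the lock while the table is built keeps the sketch simple at the cost of serializing first-time lookups; whether that trade-off is acceptable is exactly the kind of thing I would like feedback on.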

Regards,

Norman

Hi Norman,

great work!

As just said on rails-core, I’d like to see this functionality in I18n
if you were cool with that.

I wouldn’t be too concerned about performance for starters, but rather
about pluggability/extensibility. Figure out a good API and hook in a
simple, initial implementation that works for most cases. I would guess
yours does. As long as people can plug in their own implementations and
improvements while Rails still ships with a basic but stable version,
everybody will be happy :)
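
As a rough illustration of what "pluggable" could mean here, the sketch below accepts either a Hash of rules or any object that responds to #transliterate. All names in it (PluggableTransliteration, HashTransliterator, register) are hypothetical and chosen for this example; it is not a description of the I18n API.

```ruby
# Hypothetical sketch of a pluggable transliteration hook; these names are
# illustrative and are not the i18n gem's API.
module PluggableTransliteration
  # Simple default implementation: replace each non-ASCII character using a
  # Hash of rules, falling back to a replacement string.
  class HashTransliterator
    def initialize(rules)
      @rules = rules
    end

    def transliterate(string, replacement = "?")
      string.gsub(/[^\x00-\x7f]/u) { |char| @rules.fetch(char, replacement) }
    end
  end

  @transliterators = {}

  class << self
    # Accepts a Hash of rules or any object responding to #transliterate.
    def register(locale, rules_or_object)
      @transliterators[locale] =
        if rules_or_object.respond_to?(:transliterate)
          rules_or_object
        else
          HashTransliterator.new(rules_or_object)
        end
    end

    # Delegates to whatever was registered for +locale+.
    def transliterate(locale, string)
      @transliterators.fetch(locale).transliterate(string)
    end
  end
end

PluggableTransliteration.register(:de, { "ü" => "ue", "ö" => "oe" })
result = PluggableTransliteration.transliterate(:de, "Jürgen Köhler")
```

The basic Hash-backed version covers the common case, while anyone with special needs can register their own object without touching the default.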

Oh yeah, this would be really great.
I am just facing a problem where this feature would come in very, very
handy ;)
So great to see someone has already worked on it!

Max

On Wed, Apr 14, 2010 at 19:32, Sven F. wrote:

> As just said on rails-core, I’d like to see this functionality in I18n if you were cool with that.
>
> I wouldn’t be too concerned about performance for starters but rather pluggability/extensibility. Figure out a good api and hook in a simple, initial implementation that works for most cases. I would guess yours does. As long as people can plug in their own implementation and improvements while Rails still ships with a basic but stable version everybody will be happy :)

Sounds good. I’ll work first on a patch for i18n then, and once
there’s something decent in place I’ll patch it into Rails. That
should allow the changes to Rails to be very small.

I’ll be working on it in a fork on Github today.

On Wed, Apr 14, 2010 at 19:32, Sven F. wrote:

> As just said on rails-core, I’d like to see this functionality in I18n if you were cool with that.

One more thing. An important assumption for my code is that the string
be valid, composed UTF-8. This isn’t a problem with Rails because I
can rely on ActiveSupport::Multibyte::Chars#normalize and #tidy_bytes
to prepare the string before passing it off to i18n for
transliteration.

However, people using i18n outside of Rails may have problems if their
strings contain UTF-8 with combining characters. Do you think it’s ok
to just document the issue and leave it up to users to resolve in
their own code?
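
To show what the issue looks like in practice, here is a small sketch that assumes a Ruby providing String#unicode_normalize (2.2 or later), which is not something the current code relies on. The ComposedTransliteration module and its method are hypothetical; the point is only that rules keyed on composed characters miss decomposed input unless the string is normalized to NFC first.

```ruby
# Hypothetical sketch: rules are keyed on the composed (NFC) form, so input
# is normalized before the rules are applied.
module ComposedTransliteration
  RULES = { "\u00FC" => "ue", "\u00F6" => "oe" }.freeze # "ü" and "ö", composed

  # Composes the string (NFC), then replaces each non-ASCII character using
  # RULES, falling back to "?" for characters without a rule.
  def self.transliterate(string)
    string.unicode_normalize(:nfc).gsub(/[^\x00-\x7f]/u) do |char|
      RULES.fetch(char, "?")
    end
  end
end

composed   = "J\u00FCrgen"   # "ü" as a single code point
decomposed = "Ju\u0308rgen"  # "u" followed by COMBINING DIAERESIS

# Without normalization the rules only see the lone combining mark.
naive = decomposed.gsub(/[^\x00-\x7f]/u) { |char| ComposedTransliteration::RULES.fetch(char, "?") }

from_composed   = ComposedTransliteration.transliterate(composed)
from_decomposed = ComposedTransliteration.transliterate(decomposed)
```

Here naive ends up as "Ju?rgen", while both normalized calls give "Juergen". Users outside Rails would need an equivalent normalization step in their own code if the issue is only documented.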