[ANN] DiacriticsFu 1.0.1 released

Hi,

DiacriticsFu is a gem that relies on ActiveSupport to remove accents
and other diacritics from a string. I use it when I need to generate
URLs for non-English CMSs or blogs.

Release 1.0.1 brings support for Rails 2.2+ (patch courtesy of Nicolas
Fouché).

=== installation ===

gem sources -a http://gems.github.com

sudo gem install thbar-diacritics_fu

(alternatively, clone thbar/diacritics_fu from GitHub)

=== compatibility ===

DiacriticsFu has been tested against the following versions of
ActiveSupport: 2.2.2, 2.1.2, 2.1.1, 2.1.0, 2.0.5, 2.0.2, 2.0.1, 1.4.4,
1.4.2.

=== examples ===

DiacriticsFu::escape("éphémère")
=> "ephemere"

DiacriticsFu::escape("räksmörgås")
=> "raksmorgas"
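
For readers curious about the general technique, here is a minimal sketch of accent stripping in plain Ruby (2.2 or later), without ActiveSupport. It is not DiacriticsFu's implementation: strip_diacritics and slugify are hypothetical helper names, and the approach (decompose with String#unicode_normalize, then drop nonspacing marks) is only one reasonable way to get results like the examples above.

```ruby
# Hypothetical helpers, not part of DiacriticsFu.
# Requires Ruby 2.2+ for String#unicode_normalize.

# Decompose each accented character into base letter + combining mark(s)
# (NFD), then delete the combining marks (Unicode category Mn).
def strip_diacritics(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Build a URL-friendly slug: strip accents, downcase, collapse every run
# of characters outside a-z / 0-9 into one hyphen, trim stray hyphens.
def slugify(str)
  strip_diacritics(str)
    .downcase
    .gsub(/[^a-z0-9]+/, '-')
    .gsub(/\A-+|-+\z/, '')
end

puts strip_diacritics("éphémère")
puts slugify("Räksmörgås à la carte")
```

Letters that have no canonical decomposition, such as ø or ß, are left untouched by strip_diacritics, so a real slug generator would probably need an extra mapping table for them.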

=== feedback? ===

If you run into any issue, drop me a mail ([email protected]).

cheers,

Thibaut Barrère

http://blog.logeek.fr
http://evolvingworker.com

I was very excited when I saw this, but I can’t get it to work on my
system – though I admit to having no knowledge of multibyte
characters and I would guess my problem is either an environment
issue, or a change that is happening in Rails 2.3, but perhaps you
could point me in the right direction.

Here’s a sample of my output:

ruby script/console
Loading development environment (Rails 2.3.0)
* config.breakpoint_server has been deprecated and has no effect. *
DiacriticsFu::escape('Réne')
=> "Réne"

ActiveSupport::VERSION::STRING
=> "2.3.0"

str = 'Réne'
=> "Réne"

a = ActiveSupport::Multibyte::Chars.new(str)
=> #<ActiveSupport::Multibyte::Chars:0x7f4165ec47c8
@wrapped_string="Réne">

a.normalize(:d)
=> #<ActiveSupport::Multibyte::Chars:0x7f4165ec17d0
@wrapped_string="Réne">

b = a.normalize(:d)
=> #<ActiveSupport::Multibyte::Chars:0x7f4165ebced8
@wrapped_string="Réne">

c = b.split(//u)
=> [#<ActiveSupport::Multibyte::Chars:0x7f4165eb91c0
@wrapped_string="R">, #<ActiveSupport::Multibyte::Chars:0x7f4165eb9148
@wrapped_string="e">, #<ActiveSupport::Multibyte::Chars:0x7f4165eb9080
@wrapped_string="́">, #<ActiveSupport::Multibyte::Chars:0x7f4165eb8fb8
@wrapped_string="n">, #<ActiveSupport::Multibyte::Chars:0x7f4165eb8ef0
@wrapped_string="e">]

c.map{|ch|ch.length}
=> [1, 1, 1, 1, 1]

So, on my system, where I would expect a 4-element array whose second
element has length > 1, I get a 5-element array in which every element
has length 1.
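
For what it’s worth, a five-element result after normalize(:d) is consistent with how canonical decomposition works in general: "é" becomes a plain "e" followed by a separate combining acute accent (U+0301). The sketch below shows this with Ruby’s built-in String#unicode_normalize (Ruby 2.2+) rather than ActiveSupport::Multibyte::Chars, so it illustrates the Unicode behaviour only, not what any particular ActiveSupport version does.

```ruby
# Plain-Ruby illustration (Ruby 2.2+), not ActiveSupport code.
composed   = "R\u00e9ne"                       # "é" as one code point (U+00E9)
decomposed = composed.unicode_normalize(:nfd)  # "e" followed by U+0301

composed_hex   = composed.codepoints.map { |cp| format('U+%04X', cp) }
decomposed_hex = decomposed.codepoints.map { |cp| format('U+%04X', cp) }

puts "composed:   #{composed.length} code points #{composed_hex.inspect}"
puts "decomposed: #{decomposed.length} code points #{decomposed_hex.inspect}"

# Dropping the combining mark leaves the unaccented base letters.
stripped = decomposed.gsub(/\p{Mn}/, '')
puts stripped
```

Seen this way, the split itself looks fine; the open question is why the combining mark is not removed afterwards.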

And just for kicks:

d = a.split(//u)
=> [#<ActiveSupport::Multibyte::Chars:0x7f4165ead618
@wrapped_string="R">, #<ActiveSupport::Multibyte::Chars:0x7f4165ead5a0
@wrapped_string="é">, #<ActiveSupport::Multibyte::Chars:0x7f4165ead4d8
@wrapped_string="n">, #<ActiveSupport::Multibyte::Chars:0x7f4165ead410
@wrapped_string="e">]

d.map{|ch|ch.length}
=> [1, 1, 1, 1]

e = b.split(//u)
=> [#<ActiveSupport::Multibyte::Chars:0x7f4165ea1458
@wrapped_string="R">, #<ActiveSupport::Multibyte::Chars:0x7f4165ea13e0
@wrapped_string="e">, #<ActiveSupport::Multibyte::Chars:0x7f4165ea1318
@wrapped_string="́">, #<ActiveSupport::Multibyte::Chars:0x7f4165ea1250
@wrapped_string="n">, #<ActiveSupport::Multibyte::Chars:0x7f4165ea1188
@wrapped_string="e">]

e.map{|ch|ch.length}
=> [1, 1, 1, 1, 1]

If you see something that could help me out, I’d appreciate it. Like I
said, this could come in very handy for me.

Thanks,

John Devine
[email protected]

Hello John,

> I was very excited when I saw this, but I can’t get it to work on my
> system – though I admit to having no knowledge of multibyte
> characters and I would guess my problem is either an environment
> issue, or a change that is happening in Rails 2.3, but perhaps you
> could point me in the right direction.

Thanks for your feedback - I’m happy that the library could be useful
to some of us :)

My first reaction was: has Rails edge introduced a refactoring that
would make this fail? So I cloned edge from GitHub and added this to
the gem Rakefile:

task :escape_with_rails_edge do
  # grab edge, cloned from github
  $LOAD_PATH << File.dirname(__FILE__) + "/gems/rails/activesupport/lib"
  require 'active_support'
  require 'active_support/version'
  puts ActiveSupport::VERSION::STRING
  require File.dirname(__FILE__) + '/lib/diacritics_fu'
  puts DiacriticsFu::escape("Réne")
end

It seems to work, so no issue with Rails:

Macintosh:diacritics_fu thbar$ rake escape_with_rails_edge
(in /Users/thbar/git/diacritics_fu)
2.3.0
Rene

At this point I think it’s probably an encoding issue.

(check, check, wait…)

Hmm, it seems I do have an issue. In my spec, $KCODE is set to nil.
If I set it to "UTF8", as Rails does by default, the spec doesn’t
pass (you get Réne back instead of Rene).

I guess I’ll have to investigate more to see why it works in my other
setup, and not in this one. This is probably some environmental issue
like you said.

I’ll keep you posted - if you find anything on your side, please contact
me!

Thanks for your feedback,

– Thibaut

I investigated a bit more:

  • with a blank Rails 2.0.5 app, works as expected
  • with a blank Rails 2.2.2 app, doesn’t work

So obviously the >= Rails 2.2.2 fix (see new_escaper.rb) does not work
properly.
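
One way to keep separate code paths per ActiveSupport release is to compare version numbers before choosing an escaper. A minimal sketch of such a check follows. Apart from the new_escaper.rb file name mentioned above, everything in it is an assumption: the escaper_for helper, the :old_escaper / :new_escaper labels and the 2.2.0 threshold are illustrative only and not taken from the gem’s source.

```ruby
require 'rubygems' # provides Gem::Version

# Hypothetical helper: choose an escaper label from a version string.
# The 2.2.0 threshold is an assumption based on "Rails 2.2+" in this thread.
def escaper_for(version_string)
  if Gem::Version.new(version_string) >= Gem::Version.new('2.2.0')
    :new_escaper
  else
    :old_escaper
  end
end

%w[2.0.5 2.2.2 2.3.0].each do |version|
  puts "#{version} -> #{escaper_for(version)}"
end
```

In a real setup the string would come from ActiveSupport::VERSION::STRING. Comparing through Gem::Version avoids the pitfall of comparing version strings lexically, where "2.10.0" would sort before "2.2.0".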

I’ll keep you posted.

cheers

– Thibaut

Hey there,

I just released DiacriticsFu 1.0.2, which should work with Rails >= 2.2
and $KCODE set to "UTF8" (the default Rails setup).

It’s still a hack but I hope it will work for you :)

cheers

Thibaut

http://blog.logeek.fr
http://evolvingworker.com