Convert accented chars to their base chars

Is there a good way to convert “special” accented chars to their base
chars?
As an example, I want “àéìòù” => “aeiou”.
I’m using several gsub calls now:
"àèìòù".gsub("à","a").gsub("è","e").gsub("ì","i")…
It works, but I wonder if there is something better than this.
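A minimal sketch of one common Ruby alternative to chaining gsub calls, assuming Ruby 2.0 or later (where source files default to UTF-8): keep the mapping in a hash and let a single gsub with a regexp do every replacement in one pass. ACCENT_MAP, ACCENT_RE and replace_accents are hypothetical names, and the table only covers the letters from this example.

```ruby
# Hypothetical lookup table: only the letters from this example are covered,
# so extend it with whatever else your text contains.
ACCENT_MAP = {
  "à" => "a", "è" => "e", "é" => "e",
  "ì" => "i", "ò" => "o", "ù" => "u"
}.freeze

# One regexp built from the table's keys, so only the table needs editing.
ACCENT_RE = Regexp.union(ACCENT_MAP.keys)

def replace_accents(str)
  # When gsub gets a Hash, each match is replaced by the hash value for it.
  str.gsub(ACCENT_RE, ACCENT_MAP)
end

puts replace_accents("àèìòù")  # prints aeiou
```

The table still has to list every character by hand, but adding one only means adding one hash entry.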

Well, something like

char_from = "àéìòù"
char_to = "aeiou"

x = "àéìòù".gsub(char_from, char_to)
puts x

would at least make the code more maintainable.
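For a character-by-character mapping like char_from/char_to, Ruby's String#tr is arguably a closer fit than gsub: tr replaces each character of its first argument with the character at the same position in its second, wherever it occurs in the string. A minimal sketch, assuming Ruby 1.9 or later (where tr works on characters rather than bytes); the extra è/é pair in the table is an assumption added so that “caffè” is covered.

```ruby
# Each character in char_from maps to the character at the same index in
# char_to, so both strings must line up position by position.
char_from = "àèéìòù"
char_to   = "aeeiou"

puts "città".tr(char_from, char_to)  # prints citta
puts "caffè".tr(char_from, char_to)  # prints caffe
```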

Your code converts only that exact sequence of characters. What I need is to
convert those accented chars in every word,
so “città” => “citta”, “caffè” => “caffe” and so on.
Maybe some regexp?

On 10 May, 12:50, Peter H. wrote:

Mmm

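Just noticed another problem:

char_from = "àéìòù"
char_to = "aeiou"

puts char_from.size # => 10
puts char_to.size # => 5

At least on my Mac. The problem here is encoding: the accented string is
apparently being measured in bytes (each of these letters takes two bytes in
UTF-8), not in characters.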
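A quick way to see the byte/character split in Ruby 1.9 or later, where every string carries an encoding: size counts characters and bytesize counts bytes. Ruby 1.8's size counted bytes, which would explain the 10 above (an inference, not something the thread confirms). The string is built from \u escapes so the example does not depend on how the source file is saved.

```ruby
# U+00E0 U+00E9 U+00EC U+00F2 U+00F9, i.e. the same five letters as "àéìòù".
s = "\u00e0\u00e9\u00ec\u00f2\u00f9"

puts s.encoding   # prints UTF-8
puts s.size       # prints 5  (characters)
puts s.bytesize   # prints 10 (bytes: each letter is 2 bytes in UTF-8)
```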

Looks trickier than I first thought. It would be a cinch if this were
Unicode and we were using Java :) Just decompose the Unicode characters and
drop the accent characters.
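The decompose-and-drop idea does not actually need Java: since Ruby 2.2, String#unicode_normalize can produce normalization form D, after which each accent is a separate nonspacing mark (Unicode category Mn) that a regexp can delete. A minimal sketch; strip_accents is a hypothetical helper name.

```ruby
# Requires Ruby 2.2+ for String#unicode_normalize.
def strip_accents(str)
  # NFD splits "à" into "a" followed by U+0300 COMBINING GRAVE ACCENT;
  # \p{Mn} then matches and removes those nonspacing marks.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("città caffè")  # prints citta caffe
```

Unlike a hand-written table, this covers any precomposed accented letter, not just the ones listed.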

Ignore everything I have said, and let’s hope someone who knows about this
can suggest a solution. I am intrigued by this.

I’m having the very same problem: String.upcase() is not
uppercasing accented characters.
It seems that the problem is the encoding again.
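Whether upcase touches accented letters depends on the Ruby version: up to Ruby 2.3, String#upcase only changed ASCII letters, while Ruby 2.4 and later apply full Unicode case mapping and accept an :ascii option to get the old behaviour back. A small check, assuming Ruby 2.4 or later:

```ruby
word = "citt\u00e0"   # "città", built from an escape to avoid file-encoding issues

puts word.upcase          # prints CITTÀ (Unicode case mapping, Ruby 2.4+)
puts word.upcase(:ascii)  # prints CITTà (ASCII letters only)
```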

regards

2010/5/10 Peter H. wrote:



You received this message because you are subscribed to the Google Groups
“Ruby on Rails: Talk” group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.

Oliver Hernàndez Valls

http://codit.wikidot.com
http://wiki.tramuntanal.cat

eugenio wrote:

Your code converts only that exact sequence of characters. What I need is to
convert those accented chars in every word,
so “città” => “citta”, “caffè” => “caffe” and so on.
Maybe some regexp?

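You’ll probably have to convert your text to Unicode normalization form D or KD
(NFD or NFKD), which splits each accented character into a base character
followed by combining marks, then filter out the combining marks.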
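Those two steps can be sketched in Ruby 2.2+ with String#unicode_normalize. NFKD goes further than NFD by also replacing compatibility characters, such as the “ﬁ” ligature, with plain letters; \p{M} matches every combining-mark category (Mn, Mc and Me). to_base_chars is a hypothetical helper name.

```ruby
# Normalize (NFKD by default), then drop all combining marks.
def to_base_chars(str, form = :nfkd)
  str.unicode_normalize(form).gsub(/\p{M}/, "")
end

# "caffè ﬁne", where U+FB01 is LATIN SMALL LIGATURE FI.
text = "caff\u00e8 \ufb01ne"

puts to_base_chars(text, :nfd)  # prints caffe ﬁne (the ligature has no canonical decomposition)
puts to_base_chars(text)        # prints caffe fine
```

Letters that have no decomposition at all, such as “ø” or “ł”, pass through unchanged, so a small lookup table may still be needed for those.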

Best,

Marnen Laibow-Koser
http://www.marnen.org