Forum: Ruby remove non-ASCII characters in a string

Levin Alexander (Guest)
on 2006-01-22 19:42
(Received via mailing list)
Hi,

I needed a method to convert a piece of text to plain ASCII and
replace all non-ASCII chars with a placeholder.  I could not find
anything in the stdlib, so I wrote one.

I'd love to hear your comments (or pointers to existing libraries for
this task).

-Levin


#!/usr/bin/ruby

require 'iconv'

class String

  # removes all characters which are not part of ascii
  # and replaces them with +replacement+
  #
  # +replacement+ is supposed to be the same encoding as +source+
  #
  def asciify(replacement = "?", target = "ASCII", source = "UTF-8")
    intermediate = "UCS-4"
    pack_format = "N*"
    i = Iconv.new(intermediate, source)

    ucs4 = i.iconv(self)   # UCS-4: four bytes per code point
    repl = i.iconv(replacement).unpack(pack_format)

    s = ucs4.unpack(pack_format).collect { |codepoint|
      codepoint < 128 ? codepoint : repl
    }.flatten.pack(pack_format)

    return Iconv.new(target, intermediate).iconv(s)
  end
end

if __FILE__ == $0
  require 'test/unit'

  class TestAsciify < Test::Unit::TestCase
    def test_asciify
      # assert_equal takes the expected value first, then the actual one
      assert_equal "I?t?rn?ti?n?liz?ti?n",
                   "Iñtërnâtiônàlizætiøn".asciify
      assert_equal "M(removed)torhead", "Mötorhead".asciify("(removed)")
    end
  end
end
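
Note for readers on current Ruby: Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in 2.0, so the code above no longer runs as posted. A minimal sketch of the same idea using the core String#encode, assuming the input is UTF-8. The method name asciify_compat is made up for this example and is not part of any library:

```ruby
# Hypothetical helper (not from the original post): replace every
# non-ASCII character with +replacement+, using only core String methods.
def asciify_compat(text, replacement = "?")
  # :undef covers characters that have no US-ASCII equivalent,
  # :invalid covers malformed byte sequences in the input.
  text.encode("US-ASCII", invalid: :replace, undef: :replace, replace: replacement)
end

puts asciify_compat("Iñtërnâtiônàlizætiøn")    # I?t?rn?ti?n?liz?ti?n
puts asciify_compat("Mötorhead", "(removed)")  # M(removed)torhead
```

Because the conversion works on characters rather than bytes, each accented letter yields exactly one placeholder, and tabs and newlines pass through untouched.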
Ezra Zygmuntowicz (Guest)
on 2006-01-22 23:22
(Received via mailing list)
I have a need for something like this as well, but I need to
replace the chars with a plain-ASCII equivalent rather than a placeholder.
Any ideas how to do that?

I ended up finding the escape codes for all the chars, like "\322"
and friends, so I could replace, say, curly quotes with standard quotes
and stuff like that.

I will play with your code a bit and see if I can make it do what I
want. Thanks for sharing it, though.

Cheers-
-Ezra




-Ezra Zygmuntowicz
WebMaster
Yakima Herald-Republic Newspaper
http://yakimaherald.com
ezra@yakima-herald.com
blog: http://brainspl.at
ruby talk (Guest)
on 2006-01-22 23:46
(Received via mailing list)
What is considered an ASCII char? I used the
http://www.lookuptables.com/ chart from ! to ~, i.e. codes 33..126
(the code below also lets 127 through).

class String
  def remove_nonascii(replacement)
    n = self.split("")
    self.slice!(0..self.size)
    n.each { |b|
      if b[0].to_i < 33 || b[0].to_i > 127 then
        self.concat(replacement)
      else
        self.concat(b)
      end
    }
    self.to_s
  end
end

require 'test/unit'

class TestAsciify < Test::Unit::TestCase
  def test_asciify
    assert_equal "Iñtërnâtiônàlizætiøn".remove_nonascii("?"),
                 "I?t?rn?ti?n?liz?ti?n"
    assert_equal "Mötorhead".remove_nonascii("(removed)"),
                 "M(removed)torhead"
  end
end
Levin Alexander (Guest)
on 2006-01-23 00:13
(Received via mailing list)
On 1/22/06, ruby talk <rubytalk@gmail.com> wrote:
> What is considered an ascii char? i used the
> http://www.lookuptables.com/ chart from ! to ~ or 33 .. 127

I keep everything below 128 because I want to preserve tabs and linebreaks

> def remove_nonascii(replacement)

This does not work if the source text is UTF-8 encoded.  On my machine:

  str = "ö"  #=> "\303\266"
  str.remove_nonascii("?") #=> "??"

-Levin
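
The "??" result comes from treating the string as bytes: in UTF-8, "ö" is one character stored as two bytes, and both are above 127. A small sketch for Ruby 1.9 and later, where strings carry an encoding and the two views can be compared directly (the variable names are only for illustration):

```ruby
str = "ö"

p str.bytes        # [195, 182], i.e. "\303\266" in octal
p str.chars.size   # 1
p str.bytesize     # 2

# Replacing byte by byte produces one placeholder per byte ...
per_byte = str.bytes.map { |b| b < 128 ? b.chr : "?" }.join
# ... while replacing character by character produces one per character.
per_char = str.chars.map { |c| c.ascii_only? ? c : "?" }.join

p per_byte   # "??"
p per_char   # "?"
```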
Dave Burt (Guest)
on 2006-01-24 17:31
(Received via mailing list)
ruby talk wrote:
> What is considered an ascii char? i used the
> http://www.lookuptables.com/ chart from ! to ~ or 33 .. 127

Your range excludes ASCII character 32, space.
" ".remove_nonascii("?") #=> "?"

You also likely want to include characters like tabs and newlines, which
are in the 0-31 control range.

Levin's original version treats the original text as UTF-8. Is that part
of the requirements?

My version might look like this:

NON_ASCII = /[\x80-\xff]/
"Iñtërnâtiônàlizætiøn".gsub(NON_ASCII, "?") #=> "I?t?rn?ti?n?liz?ti?n"


Cheers,
Dave
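
On the UTF-8 question: a byte-range pattern like the one above replaces every byte above 127, so on UTF-8 input each accented character would turn into two or more placeholders. A possible variant for Ruby 1.9 and later, where regular expressions match whole characters in an encoded string, is sketched here. The constant name NON_ASCII_CHAR is an assumption for this example:

```ruby
# Matches any single character outside the 7-bit ASCII range.
# Tabs, newlines and spaces are ASCII, so they are left alone.
NON_ASCII_CHAR = /[^\x00-\x7F]/

puts "Iñtërnâtiônàlizætiøn".gsub(NON_ASCII_CHAR, "?")   # I?t?rn?ti?n?liz?ti?n
p "tab\tand ö\n".gsub(NON_ASCII_CHAR, "?")               # "tab\tand ?\n"
```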
unknown (Guest)
on 2006-01-24 17:37
(Received via mailing list)
Dave Burt <dave@burt.id.au> wrote:

> My version might look like this:
>
> NON_ASCII = /[\x80-\xff]/
> "Iñtërnâtiônàlizætiøn".gsub(NON_ASCII, "?") #=> "I?t?rn?ti?n?liz?ti?n"

I'd like not to remove non-ASCII chars but to replace all accented chars
(in a UTF-8 string) with their unaccented counterparts:

è => e
ä => a
ç => c

[...]

What is the best way to do that in Ruby?
Dave Burt (Guest)
on 2006-01-24 17:40
(Received via mailing list)
"Une bévue" wrote:
> è => e
> ä => a
> ç => c
>
> [...]
>
> what is the best way to do that in Ruby?

Try the code below, translated from
http://stuffofinterest.com/misc/utf8-about.html

There may be a potential problem matching over character boundaries, but
I think UTF-8's unique starting bytes avoid the issue. So this should
work. For long strings, it could be slow. If I wanted speed, I'd probably
do the same thing in C and make it an extension.

Cheers,
Dave

class String
# Translate accented utf8 characters over to non-accented
def utf8_trans_unaccent
  tranmap = {
    "\xC3\x80" => "A", "\xC3\x81" => "A", "\xC3\x82" => "A", "\xC3\x83" => "A",
    "\xC3\x84" => "A", "\xC3\x85" => "A", "\xC3\x86" => "AE", "\xC3\x87" => "C",
    "\xC3\x88" => "E", "\xC3\x89" => "E", "\xC3\x8A" => "E", "\xC3\x8B" => "E",
    "\xC3\x8C" => "I", "\xC3\x8D" => "I", "\xC3\x8E" => "I", "\xC3\x8F" => "I",
    "\xC3\x90" => "D", "\xC3\x91" => "N", "\xC3\x92" => "O", "\xC3\x93" => "O",
    "\xC3\x94" => "O", "\xC3\x95" => "O", "\xC3\x96" => "O", "\xC3\x98" => "O",
    "\xC3\x99" => "U", "\xC3\x9A" => "U", "\xC3\x9B" => "U", "\xC3\x9C" => "U",
    "\xC3\x9D" => "Y", "\xC3\x9E" => "P", "\xC3\x9F" => "ss",
    "\xC3\xA0" => "a", "\xC3\xA1" => "a", "\xC3\xA2" => "a", "\xC3\xA3" => "a",
    "\xC3\xA4" => "a", "\xC3\xA5" => "a", "\xC3\xA6" => "ae", "\xC3\xA7" => "c",
    "\xC3\xA8" => "e", "\xC3\xA9" => "e", "\xC3\xAA" => "e", "\xC3\xAB" => "e",
    "\xC3\xAC" => "i", "\xC3\xAD" => "i", "\xC3\xAE" => "i", "\xC3\xAF" => "i",
    "\xC3\xB0" => "o", "\xC3\xB1" => "n", "\xC3\xB2" => "o", "\xC3\xB3" => "o",
    "\xC3\xB4" => "o", "\xC3\xB5" => "o", "\xC3\xB6" => "o", "\xC3\xB8" => "o",
    "\xC3\xB9" => "u", "\xC3\xBA" => "u", "\xC3\xBB" => "u", "\xC3\xBC" => "u",
    "\xC3\xBD" => "y", "\xC3\xBE" => "p", "\xC3\xBF" => "y",
    "\xC4\x80" => "A", "\xC4\x81" => "a", "\xC4\x82" => "A", "\xC4\x83" => "a",
    "\xC4\x84" => "A", "\xC4\x85" => "a", "\xC4\x86" => "C", "\xC4\x87" => "c",
    "\xC4\x88" => "C", "\xC4\x89" => "c", "\xC4\x8A" => "C", "\xC4\x8B" => "c",
    "\xC4\x8C" => "C", "\xC4\x8D" => "c", "\xC4\x8E" => "D", "\xC4\x8F" => "d",
    "\xC4\x90" => "D", "\xC4\x91" => "d", "\xC4\x92" => "E", "\xC4\x93" => "e",
    "\xC4\x94" => "E", "\xC4\x95" => "e", "\xC4\x96" => "E", "\xC4\x97" => "e",
    "\xC4\x98" => "E", "\xC4\x99" => "e", "\xC4\x9A" => "E", "\xC4\x9B" => "e",
    "\xC4\x9C" => "G", "\xC4\x9D" => "g", "\xC4\x9E" => "G", "\xC4\x9F" => "g",
    "\xC4\xA0" => "G", "\xC4\xA1" => "g", "\xC4\xA2" => "G", "\xC4\xA3" => "g",
    "\xC4\xA4" => "H", "\xC4\xA5" => "h", "\xC4\xA6" => "H", "\xC4\xA7" => "h",
    "\xC4\xA8" => "I", "\xC4\xA9" => "i", "\xC4\xAA" => "I", "\xC4\xAB" => "i",
    "\xC4\xAC" => "I", "\xC4\xAD" => "i", "\xC4\xAE" => "I", "\xC4\xAF" => "i",
    "\xC4\xB0" => "I", "\xC4\xB1" => "i", "\xC4\xB2" => "IJ", "\xC4\xB3" => "ij",
    "\xC4\xB4" => "J", "\xC4\xB5" => "j", "\xC4\xB6" => "K", "\xC4\xB7" => "k",
    "\xC4\xB8" => "k", "\xC4\xB9" => "L", "\xC4\xBA" => "l", "\xC4\xBB" => "L",
    "\xC4\xBC" => "l", "\xC4\xBD" => "L", "\xC4\xBE" => "l", "\xC4\xBF" => "L",
    "\xC5\x80" => "l", "\xC5\x81" => "L", "\xC5\x82" => "l", "\xC5\x83" => "N",
    "\xC5\x84" => "n", "\xC5\x85" => "N", "\xC5\x86" => "n", "\xC5\x87" => "N",
    "\xC5\x88" => "n", "\xC5\x89" => "n", "\xC5\x8A" => "N", "\xC5\x8B" => "n",
    "\xC5\x8C" => "O", "\xC5\x8D" => "o", "\xC5\x8E" => "O", "\xC5\x8F" => "o",
    "\xC5\x90" => "O", "\xC5\x91" => "o", "\xC5\x92" => "OE", "\xC5\x93" => "oe",
    "\xC5\x94" => "R", "\xC5\x95" => "r", "\xC5\x96" => "R", "\xC5\x97" => "r",
    "\xC5\x98" => "R", "\xC5\x99" => "r", "\xC5\x9A" => "S", "\xC5\x9B" => "s",
    "\xC5\x9C" => "S", "\xC5\x9D" => "s", "\xC5\x9E" => "S", "\xC5\x9F" => "s",
    "\xC5\xA0" => "S", "\xC5\xA1" => "s", "\xC5\xA2" => "T", "\xC5\xA3" => "t",
    "\xC5\xA4" => "T", "\xC5\xA5" => "t", "\xC5\xA6" => "T", "\xC5\xA7" => "t",
    "\xC5\xA8" => "U", "\xC5\xA9" => "u", "\xC5\xAA" => "U", "\xC5\xAB" => "u",
    "\xC5\xAC" => "U", "\xC5\xAD" => "u", "\xC5\xAE" => "U", "\xC5\xAF" => "u",
    "\xC5\xB0" => "U", "\xC5\xB1" => "u", "\xC5\xB2" => "U", "\xC5\xB3" => "u",
    "\xC5\xB4" => "W", "\xC5\xB5" => "w", "\xC5\xB6" => "Y", "\xC5\xB7" => "y",
    "\xC5\xB8" => "Y", "\xC5\xB9" => "Z", "\xC5\xBA" => "z", "\xC5\xBB" => "Z",
    "\xC5\xBC" => "z", "\xC5\xBD" => "Z", "\xC5\xBE" => "z", "\xC6\x8F" => "E",
    "\xC6\xA0" => "O", "\xC6\xA1" => "o", "\xC6\xAF" => "U", "\xC6\xB0" => "u",
    "\xC7\x8D" => "A", "\xC7\x8E" => "a", "\xC7\x8F" => "I",
    "\xC7\x90" => "i", "\xC7\x91" => "O", "\xC7\x92" => "o", "\xC7\x93" => "U",
    "\xC7\x94" => "u", "\xC7\x95" => "U", "\xC7\x96" => "u", "\xC7\x97" => "U",
    "\xC7\x98" => "u", "\xC7\x99" => "U", "\xC7\x9A" => "u", "\xC7\x9B" => "U",
    "\xC7\x9C" => "u",
    "\xC7\xBA" => "A", "\xC7\xBB" => "a", "\xC7\xBC" => "AE", "\xC7\xBD" => "ae",
    "\xC7\xBE" => "O", "\xC7\xBF" => "o",
    "\xC9\x99" => "e",

    "\xC2\x82" => ",",        # High code comma
    "\xC2\x84" => ",,",       # High code double comma
    "\xC2\x85" => "...",      # Triple dot
    "\xC2\x88" => "^",        # High caret
    "\xC2\x91" => "\x27",     # Forward single quote
    "\xC2\x92" => "\x27",     # Reverse single quote
    "\xC2\x93" => "\x22",     # Forward double quote
    "\xC2\x94" => "\x22",     # Reverse double quote
    "\xC2\x96" => "-",        # High hyphen
    "\xC2\x97" => "--",       # Double hyphen
    "\xC2\xA6" => "|",        # Split vertical bar
    "\xC2\xAB" => "<<",       # Double less than
    "\xC2\xBB" => ">>",       # Double greater than
    "\xC2\xBC" => "1/4",      # one quarter
    "\xC2\xBD" => "1/2",      # one half
    "\xC2\xBE" => "3/4",      # three quarters

    "\xCA\xBF" => "\x27",     # c-single quote
    "\xCC\xA8" => "",         # modifier - under curve
    "\xCC\xB1" => ""          # modifier - under line
  }

  # Apply every mapping in turn; each gsub returns a new string.
  tranmap.inject(self) do |str, (utf8, asc)|
    str.gsub(utf8, asc)
  end
end
end

"Iñtërnâtiônàlizætiøn".utf8_trans_unaccent #=> "Internationalizaetion"
unknown (Guest)
on 2006-01-24 17:40
(Received via mailing list)
Dave Burt <dave@burt.id.au> wrote:

> Try the code below, translated from
> http://stuffofinterest.com/misc/utf8-about.html
>
> There may be a potential problem matching over character boundaries, but I
> think UTF-8's unique starting bytes avoid the issue. So this should work.
> For long strings, it could be slow. If I wanted speed, I'd probably do the
> same thing in C and make it an extension.

Thanks a lot, this works great even with ligatures. I don't need speed
because I'll use it only for file names...
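
On the speed concern raised with utf8_trans_unaccent: the inject loop runs one gsub per table entry, so the string is scanned a few hundred times. A possible alternative, assuming Ruby 1.9 or later where String#gsub accepts a Hash as the replacement, is to build one regular expression from all the keys and substitute in a single pass. The three-entry TRANSLIT table and the unaccent method are stand-ins made up for this sketch, not the full table posted above:

```ruby
# Illustrative subset; in practice this would be the full tranmap.
TRANSLIT = {
  "ñ" => "n",
  "ë" => "e",
  "æ" => "ae",
}.freeze

# One alternation built from every key, compiled once.
TRANSLIT_RE = Regexp.union(TRANSLIT.keys)

def unaccent(text)
  # With a Hash argument, gsub looks up each match in the Hash.
  text.gsub(TRANSLIT_RE, TRANSLIT)
end

puts unaccent("Iñtërnâtiônàlizætiøn")   # Internâtiônàlizaetiøn
```

Only the three characters in the subset are converted here; the remaining accents stay until the table is filled in.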
dseverin (Guest)
on 2006-01-24 17:43
(Received via mailing list)
[accented sample word, garbled in the archive], you say? What about these
(incomplete list, and w/o ligatures) :))))

[Each entry originally listed a base letter, the number of variants, and
the variant characters themselves. The characters did not survive the
archive's encoding conversion; only the letters and counts are recoverable.]

a, 69
b, 15
c, 23
d, 29
e, 62
f, 10
g, 23
h, 30
i, 46
j, 12
k, 19
l, 28
m, 17
n, 27
o, 83
p, 13
q, 7
r, 30
s, 29
t, 23
u, 69
v, 14
w, 21
x, 14
y, 26
z, 21
Martin DeMello (Guest)
on 2006-01-24 17:46
(Received via mailing list)
Dave Burt <dave@burt.id.au> wrote:
>
>   tranmap.inject(self) do |str, (utf8, asc)|
>     p [utf8, asc]
>     str.gsub(utf8, asc)
>   end
> end
> end
>
> "Iñtërnâtiônàlizætiøn".utf8_trans_unaccent #=> "Internationalizaetion"

# one time preprocessing
INTL, ASC = "", ""
tranmap.each {|k,v|
  INTL << k
  ASC << v
}

# quicker than repeated gsubs:
str.tr(INTL, ASC)

martin
Dave Burt (Guest)
on 2006-01-24 17:49
(Received via mailing list)
Martin DeMello wrote:
>
> # one time preprocessing
> INTL, ASC = "", ""
> tranmap.each {|k,v|
>  INTL << k
>  ASC << v
> }
>
> # quicker than repeated gsubs:
> str.tr(INTL, ASC)

Except that won't work, because tr only matches bytes, not multi-byte
characters.

(It might work after applying one of the Unicode string extensions that
have been floating around recently. But not in standard Ruby.)

Cheers,
Dave
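
Note for readers on Ruby 1.9 or later: strings are encoding-aware there, so tr works on characters rather than bytes. It still maps one character to exactly one character, though, so entries such as "æ" => "ae" cannot go through tr. A small sketch with made-up constant names:

```ruby
# One-to-one mappings only: the nth character of FROM becomes the nth of TO.
FROM = "àâëñôø"
TO   = "aaenoo"

result = "Iñtërnâtiônàlizætiøn".tr(FROM, TO)
puts result   # Internationalizætion
```

The "æ" is left in place because it has no single-character replacement in TO.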
Martin DeMello (Guest)
on 2006-01-24 17:55
(Received via mailing list)
Dave Burt <dave@burt.id.au> wrote:
>
> Except that won't work, because tr only matches bytes, not multi-byte
> characters.
>
> (It might work after applying one of the Unicode string extensions that have
> been floating around recently. But not in standard Ruby.)

Oh - didn't know that! Pretty sad. Thanks for the correction.

martin
Levin Alexander (Guest)
on 2006-01-24 18:38
(Received via mailing list)
On 1/22/06, Ezra Zygmuntowicz <ezmobius@gmail.com> wrote:
>         I have a need for something like this as well. But I need to
> replace the chars with something plain ascii besides a placeholder.
> Any ideas how to do that?

What I have now looks like this:

  # basic usage
  "schön".asciify #=> "sch?n"
  # with mapping
  map = Asciify::Mapping.new(:default)
  "“foo”".asciify(map)   #=> '"foo"'
  Asciify.new(map).convert("schön")  #=> "schoen"
  Asciify.new(Asciify::HTMLEntities.new).convert("schön")
    #=> "sch&#246;n"

Mapping.new(:default) reads the mappings from a YAML file; you can use
Mapping.new("file.yaml") to load your own mappings or supply a Hash or
lambda to Asciify.new.
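
How a converter might accept either a Hash or a lambda can be sketched as follows. This is not the Asciify library's code: the class name TinyAsciifier and its methods are hypothetical, and the sketch assumes Ruby 1.9 or later. It relies on the fact that both Hash and Proc respond to [] with a single argument:

```ruby
# Hypothetical stand-in, not the Asciify library itself.
class TinyAsciifier
  # +mapping+ is anything that responds to #[] with a character:
  # a Hash, or a lambda/Proc (Proc#[] is an alias for Proc#call).
  def initialize(mapping, fallback = "?")
    @mapping = mapping
    @fallback = fallback
  end

  def convert(text)
    text.each_char.map { |ch|
      next ch if ch.ascii_only?   # ASCII passes through unchanged
      @mapping[ch] || @fallback   # unmapped characters get the fallback
    }.join
  end
end

german = { "ö" => "oe", "ä" => "ae", "ü" => "ue", "ß" => "ss" }
puts TinyAsciifier.new(german).convert("schön")     # schoen

# A lambda that produces numeric HTML entities from the code point.
entities = ->(ch) { format("&#%d;", ch.ord) }
puts TinyAsciifier.new(entities).convert("schön")   # sch&#246;n
```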

I have put it on RubyForge.  This is my very first piece of released
code, and I'd very much like to hear your comments and criticism.

I'll try to work in the posted mappings.

Regards,
Levin
Christian Neukirchen (Guest)
on 2006-01-24 19:39
(Received via mailing list)
pere.noel@laponie.com.invalid (Une bévue) writes:

> è => e
> ä => a
> ç => c
>
> [...]
>
> what is the best way to do that in Ruby?

How about this:

require 'iconv'
puts Iconv.open("ASCII//TRANSLIT",
"ISO-8859-1").iconv("Iñtërnâtiônàlizætiøn")
#=> I~nt"ern^ati^on`alizaetion
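
Note for readers on current Ruby (2.2 or later), where Iconv is no longer in the standard library: a possible sketch of a similar idea uses String#unicode_normalize to decompose each accented letter into a base letter plus a combining mark, then drops the marks. Letters with no decomposition, such as æ and ø, pass through unchanged and would still need a lookup table like the one earlier in the thread:

```ruby
text = "Iñtërnâtiônàlizætiøn"

# NFD splits "ñ" into "n" + U+0303 (combining tilde), and so on.
decomposed = text.unicode_normalize(:nfd)

# \p{Mn} matches non-spacing marks, i.e. the combining accents.
stripped = decomposed.gsub(/\p{Mn}/, "")
puts stripped   # Internationalizætiøn

# Whatever is still non-ASCII can be replaced with a placeholder.
ascii = stripped.encode("US-ASCII", undef: :replace, replace: "?")
puts ascii      # Internationaliz?ti?n
```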