Replacing diacritics with plain characters

Do you know of a way to replace diacritics with plain characters (e.g. é
-> e)?

The same for ligatures (e.g. Æ -> AE)?

Using tables, perhaps?

On 25 September at 18:25, Une Bévue wrote:

(Hello again… :) )

Do you know of a way to replace diacritics with plain characters (e.g. é
-> e)?

The same for ligatures (e.g. Æ -> AE)?

Using tables, perhaps?

Iconv can do that for you:

require "iconv"
=> true

# transliterate ISO-8859-15 input to the closest ASCII equivalents
i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
=> #<Iconv:0x84d4448>

i.iconv("aéouï Æ")
=> "a'eou\"i AE"

# drop the leftover accent marks
i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
=> "aeoui AE"

Fred
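
For readers on a current Ruby: Iconv was dropped from the standard library in Ruby 2.0 and lives on as the iconv gem, so the session above needs that gem installed. Here is a minimal sketch of a similar result without Iconv, assuming Ruby 2.2 or later for String#unicode_normalize. The helper name strip_diacritics and the LIGATURES table are made up for illustration and cover only a handful of characters.

```ruby
# Hypothetical helper, not part of Iconv or of any library named in this thread.
# Ligatures and letters such as ß have no canonical decomposition,
# so they need an explicit table (illustrative, far from complete).
LIGATURES = {
  "Æ" => "AE", "æ" => "ae",
  "Œ" => "OE", "œ" => "oe",
  "ß" => "ss"
}.freeze

def strip_diacritics(text)
  text
    .unicode_normalize(:nfd)                      # "é" becomes "e" + combining acute
    .gsub(/\p{Mn}/, "")                           # drop the combining marks
    .gsub(/[#{LIGATURES.keys.join}]/, LIGATURES)  # table lookup for the rest
end

puts strip_diacritics("aéouï Æ")   # prints: aeoui AE
```

Unlike the Iconv session, no stray accent marks are left behind, so the second gsub from the original answer is not needed here.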


I’ve found an axe can do a lot for a paper-mangling printer. Especially
if you shout for one at the top of your voice, and then a cow orker
----------------------------------------------------------^^^^^^^^^
???
brings you said instrument. Suddenly, no more paper jams.
(Kai Henningsen in the SDM)

:D

F. Senault wrote:

Iconv can do that for you:

require "iconv"
=> true
i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
=> #<Iconv:0x84d4448>
i.iconv("aéouï Æ")
=> "a'eou\"i AE"
i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
=> "aeoui AE"

Great, thanks a lot, Fred, see you later ;)

Have a good wine cellar ;)

It also works with UTF-8.

On 25 September at 20:12, Michal S. wrote:


I’ve found an axe can do a lot for a paper-mangling printer. Especially
if you shout for one at the top of your voice, and then a cow orker
----------------------------------------------------------^^^^^^^^^
???

It’s intentional. “Cow orker” was probably a typo in the olden times, but
it has entered the mainstream since then. Just ask Google: “Results 1 -
10 of about 37,200 for ‘cow orker’. (0.19 seconds)” :)

Fred

On Sep 25, 2007, at 18:55, F. Senault wrote:

Do you know of a way to replace diacritics with plain characters (e.g. é
-> e)?

The same for ligatures (e.g. Æ -> AE)?

Using tables, perhaps?

Iconv can do that for you:

An alternative approach is something like Sean M. Burke’s
Text::Unidecode:

http://interglacial.com/~sburke/tpj/as_html/tpj22.html

Here is an example of an implementation of Unidecode in Lua [1]:

local Unidecode = require( 'Unidecode' )

print( Unidecode( 'Москва́' ) )
print( Unidecode( '北京' ) )
print( Unidecode( 'Ἀθηνᾶ' ) )
print( Unidecode( '서울' ) )
print( Unidecode( '東京' ) )
print( Unidecode( '京都市' ) )
print( Unidecode( 'नेपाल' ) )
print( Unidecode( 'תֵּל־אָבִיב-יָפוֹ' ) )
print( Unidecode( 'تَلْ أَبِيبْ يَافَا' ) )
print( Unidecode( 'تهران' ) )
print( Unidecode( 'Géometrie Différentielle' ) )

Moskva
beijing
Athena
seoul
dongjing
jingdushi
nepaal
te’labiyb-yapvo
tal 'abiyb yaafaa
thran
Geometrie Differentielle

Cheers,

PA.

[1] http://dev.alt.textdrive.com/browser/HTTP/Unidecode.lua
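
To make the idea concrete for Ruby readers, here is a minimal sketch of a Unidecode-style lookup: one ASCII string per code point, with a placeholder for anything the table does not know. The TABLE contents and the unidecode_sketch name are hypothetical, and the table holds only the few entries needed for the demonstration, whereas a real Unidecode table spans most of Unicode.

```ruby
# Tiny, hypothetical excerpt of a Unidecode-style table:
# each code point maps to an ASCII replacement string.
TABLE = {
  0x041C => "M",   # М
  0x043E => "o",   # о
  0x0441 => "s",   # с
  0x043A => "k",   # к
  0x0432 => "v",   # в
  0x0430 => "a",   # а
  0x0301 => "",    # combining acute accent (stress mark) is dropped
  0x00C6 => "AE",  # Æ
  0x00E9 => "e"    # é
}.freeze

# ASCII characters pass through untouched; everything else is looked up,
# and code points missing from the table become the `unknown` placeholder.
def unidecode_sketch(text, unknown = "?")
  text.each_char.map do |char|
    char.ascii_only? ? char : TABLE.fetch(char.ord, unknown)
  end.join
end

puts unidecode_sketch("Москва\u0301")   # prints: Moskva
```

The appeal of this design is that the result depends only on the table shipped with the code, not on the iconv library installed on the machine.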

F. Senault wrote:

Iconv can do that for you:

require "iconv"
=> true

i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
=> #<Iconv:0x84d4448>

i.iconv("aéouï Æ")
=> "a'eou\"i AE"

i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
=> "aeoui AE"

That doesn’t work on all platforms. For me:

require "iconv"
=> true

i = Iconv.new("ASCII//TRANSLIT", "UTF-8")
=> #<Iconv:0xb7cf28e0>

i.iconv("aéouï Æ")
=> "a?ou? AE"

:(

How do I get off this mailing list? THANKS!!!

Daniel DeLorme wrote:

:(

Are you sure about the encoding of "aéouï Æ"?

Because I did it with UTF-8, and it works:

-- the script ----------------------------------------------------------
#! /usr/bin/env ruby

require "iconv"

i = Iconv.new("ASCII//TRANSLIT", "UTF-8")

p i.iconv("aéouï Æ")
# => "a'eou\"i AE"

p i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
# => "aeoui AE"

# keep letters, digits, apostrophes and spaces; turn apostrophes and spaces
# into underscores; trim underscores at both ends
p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du ?").gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/^_*(.*?)_*$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"

p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du?").gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/^_*(.*?)_*$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"
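
The clean-up chain in that script can be tried without Iconv by feeding it text that is already transliterated. The sketch below does that with a string modelled on the Mac OS X output quoted in this thread. The slugify_ascii name is hypothetical, and unlike the script it collapses runs of separators into a single underscore.

```ruby
# Hypothetical helper: turns already-transliterated ASCII text into an
# underscore-separated slug, following the three gsub steps of the script.
def slugify_ascii(ascii)
  ascii
    .gsub(/[^a-zA-Z0-9' ]/, "")   # keep letters, digits, apostrophes and spaces
    .gsub(/[' ]+/, "_")           # each run of apostrophes/spaces becomes one "_"
    .gsub(/\A_+|_+\z/, "")        # trim underscores at both ends
end

puts slugify_ascii("a'eou\"i AE, wie heiss du ?")   # prints: a_eoui_AE_wie_heiss_du
```

The "a_eoui" in the result comes from the apostrophe that //TRANSLIT leaves in place of the acute accent, which is why it also shows up in the script's output.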

Une Bévue wrote:

:(

Are you sure about the encoding of "aéouï Æ"?

Yep.

str = "aéouï Æ"
=> "a\303\251ou\303\257 \303\206"   # that's UTF-8 all right

i.iconv(str)
=> "a?ou? AE"

But like I said, translit doesn’t work the same on all platforms (I’m on
Ubuntu, by the way).

Daniel
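
Since the //TRANSLIT result depends on the iconv library underneath, one way to get the same answer on every platform is to supply the replacement table yourself. Here is a minimal sketch, assuming a current Ruby, that uses the fallback option of String#encode. The FALLBACK table and the to_ascii helper are hypothetical and list only the characters from this thread.

```ruby
# Hypothetical replacement table, keyed by precomposed characters.
FALLBACK = {
  "\u00E9" => "e",   # é
  "\u00EF" => "i",   # ï
  "\u00C6" => "AE"   # Æ
}.freeze

def to_ascii(text)
  # Characters with no US-ASCII equivalent are looked up in FALLBACK;
  # anything missing from the table becomes "?".
  text.encode("US-ASCII", fallback: ->(char) { FALLBACK.fetch(char, "?") })
end

puts to_ascii("aéouï Æ")   # prints: aeoui AE
```

Input is assumed to be UTF-8 with precomposed characters. A decomposed "e" followed by a combining accent would reach the fallback as a bare combining mark and come out as "?".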

Daniel DeLorme wrote:

But like I said, translit doesn’t work the same on all platforms (I’m on
Ubuntu, by the way).

I’m running Mac OS X 10.4.10…