Hi, I have a problem trying to replace accented characters, like:
irb
irb(main):001:0>
irb(main):002:0* name = "Fênix"
=> "F\303\252nix"
irb(main):003:0> name.gsub(/[éê]/,'e')
=> "Feenix"
irb(main):004:0> name.gsub(/é|ê/,'e')
=> "Fenix"
What’s the difference between /[éê]/ and /é|ê/ ?
ps: ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]
What’s the difference between /[éê]/ and /é|ê/ ?
In that context there shouldn’t be any difference
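If the source is in UTF-8, then Ruby 1.8 interprets [éê] as a choice
of 4 bytes: [195, 169, 195, 170]
Fênix is seen as:
[70, 195, 170, 110, 105, 120]
Bytes 195 and 170 each match the class and are each replaced with “e”, hence Feenix.
With /é|ê/, each alternative is a whole two-byte sequence (195, 169 or 195, 170), so the pair 195, 170 is replaced as a unit, hence Fenix.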
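To make the byte-level explanation above concrete, here is a minimal sketch. It assumes a modern Ruby (1.9 or later), where strings know their encoding, so the 1.8 behaviour is only simulated by walking the bytes by hand; the variable names are made up for illustration, and the accented letters are written as \u escapes so the result does not depend on the file's encoding.

```ruby
# "Fênix", with ê written as an escape (U+00EA)
name = "F\u00eanix"

# The bytes Ruby 1.8 saw inside the class [éê] when the source was UTF-8
class_bytes = "\u00e9\u00ea".bytes   # [195, 169, 195, 170]

# Simulated byte-wise matching: every byte found in the class becomes "e"
bytewise = name.bytes.map { |b| class_bytes.include?(b) ? "e" : b.chr }.join

# Character-aware matching: ê is one character, so it is replaced once
charwise = name.gsub(/[\u00e9\u00ea]/, "e")

puts name.bytes.inspect   # [70, 195, 170, 110, 105, 120]
puts bytewise             # Feenix
puts charwise             # Fenix
```

The two bytes of ê are each replaced separately in the simulated byte-wise pass, which is where the extra “e” comes from.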
On Fri, Mar 6, 2009 at 6:02 PM, Jonatas P. wrote:
Hi, I have a problem trying to replace accented characters, like:
irb(main):002:0* name = "Fênix"
=> "F\303\252nix"
irb(main):003:0> name.gsub(/[éê]/,'e')
=> "Feenix"
irb(main):004:0> name.gsub(/é|ê/,'e')
=> "Fenix"
Looks to me like an encoding problem. What source encoding are you
working in?
If you set $KCODE = 'UTF-8' or append /u to the regex literals, does it
resolve the inconsistency?
What’s the difference between /[éê]/ and /é|ê/ ?
In that context there shouldn’t be any difference. Alternation, |, can
also be used for patterns longer than a single character, but the specific
patterns above look equivalent to me. But if the encoding isn’t set
appropriately, all bets are off!
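A small sketch of the point above: once the regex is treated as UTF-8, the character class and the alternation should behave the same. This assumes the /u flag suggested earlier and a Ruby where it is accepted on regex literals. $KCODE is specific to 1.8 and, as far as I know, has no effect from 1.9 on, so it is left out here. The variable names are only illustrative.

```ruby
# "Fênix", with ê written as an escape (U+00EA)
name = "F\u00eanix"

# Both patterns carry /u, so é and ê are treated as whole characters
char_class  = /[\u00e9\u00ea]/u
alternation = /\u00e9|\u00ea/u

with_class       = name.gsub(char_class, "e")
with_alternation = name.gsub(alternation, "e")

puts with_class         # Fenix
puts with_alternation   # Fenix
```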
ps: ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]
ps: Unicode support has apparently been much improved in 1.9.
Cheers,
lasitha
If you set $KCODE = 'UTF-8' or append /u to the regex literals, does it
resolve the inconsistency?
WORKS! setting $KCODE or using /u
interesting!!!
Thanks VERY MUCH!