# coding: utf-8
str2 = "asdfМикимаує"
p str2.encoding               # => #<Encoding:UTF-8>
p str2.scan(/\p{Cyrillic}/)   # finds all Cyrillic characters
str2.gsub!(/\w/u, '')         # removes only the Latin characters
puts str2
The question is: why does /\w/ ignore Cyrillic characters?
I have installed the latest Ruby package from http://rubyinstaller.org/.
Here is my output of ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]
I think that \w (and similar shortcuts) is supposed to match ASCII
characters only, so it's equivalent to [a-zA-Z0-9_].
Isn't there some kind of Unicode character class you can use?
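As a minimal sketch of that difference, assuming a current Ruby (2.x/3.x) where \w is documented as ASCII-only ([a-zA-Z0-9_]), this compares \w with two Unicode-aware alternatives that Ruby's regex engine supports: the \p{Word} property and the [[:word:]] bracket expression. The helper `strip_with` is a hypothetical name used only for illustration.

```ruby
# encoding: utf-8
# Compare the ASCII-only \w with Unicode-aware alternatives.
# Assumes a modern Ruby (2.x/3.x), where \w means [a-zA-Z0-9_].

text = "asdfМикимаує"

# Hypothetical helper: return a copy of +str+ with every match of +pattern+ removed.
def strip_with(str, pattern)
  str.gsub(pattern, '')
end

ascii_only = strip_with(text, /\w/)          # removes Latin letters, keeps Cyrillic
unicode    = strip_with(text, /\p{Word}/)    # removes letters, marks, numbers, connector punctuation
posix      = strip_with(text, /[[:word:]]/)  # same character set as \p{Word}

puts ascii_only    # prints Микимаує
p unicode.empty?   # => true
p posix.empty?     # => true
```

If \w leaves the Cyrillic part untouched on your interpreter, switching to \p{Word} or [[:word:]] should give the Unicode-aware behavior the original code expected.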
Actually, "asdfМикимаує".gsub!(/\w/u, '') returns "" on Linux, so the
problem is with the Windows package.
You can use "asdfМикимаує".gsub!(/\p{L}/, '') to remove letters, though.
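Following that suggestion, here is a small sketch of the \p{L} approach, plus \p{Cyrillic} (already used in the question's scan call) for removing only Cyrillic letters. The extra sample string "abc_123Привет" is my own; it shows that \p{L} matches letters only, so digits and underscores survive, unlike with \w.

```ruby
# encoding: utf-8
# Remove letters using Unicode properties instead of \w.

mixed = "asdfМикимаує"

# \p{L}: any Unicode letter (Latin, Cyrillic, ...).
no_letters = mixed.gsub(/\p{L}/, '')

# \p{Cyrillic}: only characters from the Cyrillic script.
latin_only = mixed.gsub(/\p{Cyrillic}/, '')

# Unlike \w, \p{L} leaves digits and underscores alone.
digits_kept = "abc_123Привет".gsub(/\p{L}/, '')

p no_letters   # => ""
p latin_only   # => "asdf"
p digits_kept  # => "_123"
```

The non-bang gsub is used here because gsub! returns nil when nothing is replaced, which can surprise you if you chain or print its return value.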