Why does . match non-ascii chars?

7stud · February 23, 2009, 5:32pm

str = "abcdéf "

result = str.gsub(/./n) do |match|
  puts "%%%02X" % match[0]
end
puts

--output:--
%61
%62
%63
%64
%C3
%A9
%66

Doesn’t the ‘n’ option say to match ascii? For what it’s worth, I get
the same result without the ‘n’ option.

7stud · February 24, 2009, 2:16am

On Tue, Feb 24, 2009 at 1:34 AM, 7stud wrote:

> %62
> %63
> %64
> %C3
> %A9
> %66
>
> Doesn’t the ‘n’ option say to match ascii? For what it’s worth, I get
> the same result without the ‘n’ option.

The default switch of a regex is actually ‘n’ already; that only
changes if you set $KCODE beforehand.
The switch has little influence on what ‘.’ matches, but it does
determine how the matched bytes are grouped into characters.

sigma ~ % ruby -e 'p "abcdéf ".scan(/./)'
["a", "b", "c", "d", "\303", "\251", "f", " "]
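
For anyone reading this on Ruby 1.9 or later, where $KCODE is no longer effective, the same byte-versus-character grouping can be sketched roughly as below. This is only an illustration and not part of the original posts: it assumes the string is UTF-8, uses String#b (Ruby 2.0+) to get a binary copy for the byte-wise scan, and the variable names are made up for the example.

```ruby
# Sketch for Ruby 2.0+ (assumption: the thread itself ran on Ruby 1.8).
s = "abcd\u00E9f "   # "é" is stored as the two UTF-8 bytes C3 A9

# Byte-wise grouping: on a binary copy of the string, /./n matches one byte at a time.
byte_units = s.b.scan(/./n)
escaped = byte_units.map { |b| "%%%02X" % b.ord }

# Character-wise grouping: with /u the two bytes C3 A9 are matched as one character.
char_units = s.scan(/./u)

puts escaped.join(" ")
puts "#{byte_units.length} byte matches, #{char_units.length} character matches"
```

In both scans '.' matches the same data; only the grouping differs, so "é" shows up either as two one-byte matches or as a single character.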

Please see some excellent articles about this topic from James Edward
Gray II:

http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18
http://blog.grayproductions.net/categories/character_encodings

^ manveru