Forum: Ruby why does . match non-ascii chars?

7stud -- (7stud)
on 2009-02-23 17:32
str = "abcdéf "

# Ruby 1.8: match[0] is the integer value of the match's first byte.
result = str.gsub(/./n) do |match|
  puts "%%%02X" % match[0]
end
puts


--output:--
%61
%62
%63
%64
%C3
%A9
%66


Doesn't the 'n' option say to match ASCII? For what it's worth, I get
the same result without the 'n' option.
Michael Fellinger (Guest)
on 2009-02-24 02:16
(Received via mailing list)
On Tue, Feb 24, 2009 at 1:34 AM, 7stud -- <bbxx789_05ss@yahoo.com>
wrote:
> %62
> %63
> %64
> %C3
> %A9
> %66
>
>
> Doesn't the 'n' option say to match ascii?   For what it's worth, I get
> the same result without the 'n' option.

In Ruby 1.8 the default switch of a regex is actually 'n' already; that
only changes if you set $KCODE beforehand.
The switch has little influence on what '.' matches, but it determines
how the matched bytes are grouped into characters.

sigma ~ % ruby -e 'p "abcdéf ".scan(/./)'
["a", "b", "c", "d", "\303", "\251", "f", " "]

sigma ~ % ruby -e 'p "abcdéf ".scan(/./u)'
["a", "b", "c", "d", "\303\251", "f", " "]

sigma ~ % ruby -Kue 'p "abcdéf ".scan(/./u)'
["a", "b", "c", "d", "é", "f", " "]

sigma ~ % ruby19 -e 'p "abcdéf ".scan(/./)'
["a", "b", "c", "d", "é", "f", " "]
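
For anyone reading this on Ruby 2.0 or later, where $KCODE no longer applies, here is a rough sketch of the same bytes-versus-characters distinction. It assumes the string is UTF-8 (the \u escape guarantees that) and uses String#b to get a binary-tagged copy; the variable names are only illustrative.

```ruby
# "\u00E9" is é; the \u escape makes this a UTF-8 string.
str = "abcd\u00E9f "

# On a UTF-8 string, '.' consumes one whole character per match.
utf8_matches = str.scan(/./)

# String#b returns a copy tagged as binary (ASCII-8BIT),
# so '.' consumes exactly one byte per match.
byte_matches = str.b.scan(/./)

# Percent-encode every byte, as the original question tried to do.
escaped = str.bytes.map { |byte| "%%%02X" % byte }

p utf8_matches.length   # 7 characters
p byte_matches.length   # 8 bytes, because é takes two bytes in UTF-8
puts escaped.join       # %61%62%63%64%C3%A9%66%20
```

So '.' matches the é either way; what changes is whether its two bytes come back as one match or two.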

Please see some excellent articles about this topic from James Edward
Gray II:

http://blog.grayproductions.net/articles/bytes_and...
http://blog.grayproductions.net/categories/charact...

^ manveru