Forum: Ruby-core Regex matching errors when using \W character class and /i option

Posted by Matthew Kerwin (mattyk)
on 2012-12-19 00:26
(Received via mailing list)
Issue #4044 has been updated by phluid61 (Matthew Kerwin).


ben_h (Ben Hoskings) wrote:
> But, I'm not sure how [^\W] should treat these characters:
> 0x00DF (Latin small letter sharp s)
> 0x017F (Latin small letter long s)
> 0x212A (Kelvin sign)

Can you just fall back on the Unicode categories?  If we define "word 
characters" as Letters and Numbers, U+212A is {Lu} and thus a word 
character.  Similarly, U+017F is {Ll}.

Seems a bit weird in the case of Kelvin (also the Angstrom Sign U+212B = 
{Lu}) but at least Unicode is a fixed and universally accessible 
standard.
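
A minimal sketch of that idea, assuming a UTF-8 source encoding (the 
default since Ruby 2.0) and using Ruby's \p{...} Unicode property 
classes. The names unicode_word_char? and category_of are hypothetical 
helpers made up for illustration; this is not how the regex engine 
implements \w or \W.

```ruby
# Hypothetical helper: treat a character as a "word character" when its
# Unicode general category is Letter (L) or Number (N), as proposed above.
def unicode_word_char?(ch)
  !!(ch =~ /\A[\p{L}\p{N}]\z/)
end

# Only the two categories mentioned in the post are checked here.
CATEGORIES = {
  "Lu" => /\A\p{Lu}\z/, # Letter, uppercase
  "Ll" => /\A\p{Ll}\z/  # Letter, lowercase
}

# Hypothetical helper: returns "Lu", "Ll", or nil for anything else.
def category_of(ch)
  found = CATEGORIES.find { |_name, pattern| ch =~ pattern }
  found && found.first
end

SAMPLES = {
  "\u00DF" => "Latin small letter sharp s",
  "\u017F" => "Latin small letter long s",
  "\u212A" => "Kelvin sign",
  "\u212B" => "Angstrom sign"
}

SAMPLES.each do |ch, name|
  puts format("U+%04X %-28s %s word_char=%s",
              ch.ord, name, category_of(ch), unicode_word_char?(ch))
end
```

With this definition all four characters count as word characters, 
independent of any case folding.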
----------------------------------------
Bug #4044: Regex matching errors when using \W character class and /i 
option
https://bugs.ruby-lang.org/issues/4044#change-34836

Author: ben_h (Ben Hoskings)
Status: Feedback
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: core
Target version: 1.9.2
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]


=begin
 Hi all,

 Josh Bassett and I just discovered an issue with regex matches on 
ruby-1.9.2p0. (We reduced it while we were hacking on gemcutter.)

 Combining the case-insensitive (/i) option with the non-word character 
class (\W) matches inconsistently against the alphabet. Specifically, the 
regex doesn't match properly against the letters 'k' and 's'.

 The following expression demonstrates the problem in irb:

     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/i] ].inspect }

 As a reference, the following two expressions are working properly:

     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/] ].inspect }
     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[\w]/i] ].inspect }

 Cheers
 Ben Hoskings & Josh Bassett
=end
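
The three expressions in the report can be compared side by side. The 
following is a sketch only: unmatched_letters is a hypothetical helper 
name, and what the first pattern prints depends on the Ruby version in 
use, so no output is shown for it.

```ruby
# The three patterns from the report, keyed by a printable label.
PATTERNS = {
  '/[^\W]/i' => /[^\W]/i, # reported as misbehaving for 'k' and 's'
  '/[^\W]/'  => /[^\W]/,  # reference: same class without /i
  '/[\w]/i'  => /[\w]/i   # reference: positive class with /i
}

# Hypothetical helper: lowercase ASCII letters the pattern fails to match.
def unmatched_letters(pattern)
  ('a'..'z').reject { |c| c =~ pattern }
end

PATTERNS.each do |label, pattern|
  missing = unmatched_letters(pattern)
  if missing.empty?
    puts "#{label}: all letters match"
  else
    puts "#{label}: no match for #{missing.join(', ')}"
  end
end
```

Listing only the letters that fail makes a regression easier to spot than 
scanning 26 lines of inspect output.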