Issue #7501 has been reported by eltomito (Tomas Partl).
----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501

Author: eltomito (Tomas Partl)
Status: Open
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]

When using regexp matching, \w doesn't match characters that are not in the English alphabet. For example, the characters "žščřďťňaáéíóůúý" should all be matched by \w but aren't.

This program demonstrates the bug:
--------------------------------------------------------
# encoding: utf-8
match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s
match = /\w+/.match( "áéíóůúýžščřďťň" ) # some Czech characters
puts match.to_s
match = /\w+/.match( "üäö" ) # some German characters
puts match.to_s
----------------------------------------------------------
Expected output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
áéíóůúýžščřďťň
üäö
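For comparison, here is a minimal sketch of what the program above actually does on Ruby 1.9 and later, where Ruby's regexp documentation defines \w as [a-zA-Z0-9_]. For strings with no ASCII word characters, /\w+/.match returns nil, and nil.to_s is an empty string, so the last two puts calls print blank lines. The helper name first_word is a hypothetical addition for illustration.

```ruby
# encoding: utf-8
# Minimal sketch: reproduce the report and show what \w actually matches.
# Assumes Ruby 1.9 or later, where \w means [a-zA-Z0-9_] even in UTF-8 strings.

# Hypothetical helper: the first run of \w characters, or "" when there is none.
def first_word(str)
  match = /\w+/.match(str)
  match.to_s # nil.to_s is "", so a failed match yields an empty string
end

samples = {
  "ASCII"  => "abcdefghijklmnopqrstuvwxyz",
  "Czech"  => "áéíóůúýžščřďťň",
  "German" => "üäö",
}

samples.each do |label, text|
  result = first_word(text)
  puts "#{label}: #{result.empty? ? '(no match)' : result}"
end
```

Only the ASCII sample produces a match; the Czech and German samples report "(no match)".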
on 2012-12-03 10:22
on 2012-12-03 13:27
Issue #7501 has been updated by charliesome (Charlie Somerville).

/[[:alpha:]]+/ should behave as you expect.
----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501#change-34360
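A sketch of the suggested [[:alpha:]] class next to two other classes that Ruby's regexp documentation describes as Unicode-aware, \p{Word} and [[:word:]], assuming Ruby 1.9 or later. One difference worth noting: [[:alpha:]] matches letters only, so unlike \w it stops at digits and the underscore, while \p{Word} and [[:word:]] also cover digits and connector punctuation such as "_". The first two samples come from the report; "čaj_42" and the PATTERNS/SAMPLES names are illustrative additions.

```ruby
# encoding: utf-8
# Sketch comparing \w with Unicode-aware alternatives (Ruby 1.9+ assumed).
PATTERNS = {
  '\w'          => /\w+/,          # ASCII only: [a-zA-Z0-9_]
  '[[:alpha:]]' => /[[:alpha:]]+/, # Unicode letters, no digits or "_"
  '\p{Word}'    => /\p{Word}+/,    # letters, marks, numbers, connector punctuation
  '[[:word:]]'  => /[[:word:]]+/,  # documented with the same categories as \p{Word}
}

SAMPLES = ["áéíóůúýžščřďťň", "üäö", "čaj_42"]

SAMPLES.each do |text|
  PATTERNS.each do |name, re|
    # match(...).to_s gives "" when there is no match at all
    puts "#{name.ljust(12)} #{text.inspect} -> #{re.match(text).to_s.inspect}"
  end
  puts
end
```

On "čaj_42", \w finds only "aj_42" and [[:alpha:]] stops at "čaj", whereas \p{Word} and [[:word:]] match the whole string.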
[ruby-trunk - Bug #7501][Rejected] \w in a regular expression doesn't match international characters
on 2012-12-03 19:44
Issue #7501 has been updated by shyouhei (Shyouhei Urabe).

Status changed from Open to Rejected

If I remember correctly, this is an intentional design: as the Unicode version grows, the definition of what is a word character and what is not changes from time to time. It is hard for us to follow that.
----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501#change-34380
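To make the rejected behaviour concrete, a sketch assuming Ruby 1.9 or later: it checks \w against the explicit ASCII class [a-zA-Z0-9_] on a mixed Czech sample and contrasts it with \p{Word}, whose contents are defined by Unicode character properties. The sample phrase and the ascii_words/unicode_words helper names are hypothetical additions for illustration.

```ruby
# encoding: utf-8
# Sketch (Ruby 1.9+ assumed): \w stays ASCII-only, \p{Word} follows Unicode properties.

ASCII_WORD = /[a-zA-Z0-9_]+/ # the documented meaning of \w, spelled out

# Hypothetical helpers for illustration.
def ascii_words(text)
  text.scan(/\w+/)
end

def unicode_words(text)
  text.scan(/\p{Word}+/)
end

sample = "kůň čaj_42"

p ascii_words(sample)      # => ["k", "aj_42"]
p sample.scan(ASCII_WORD)  # => ["k", "aj_42"], same as \w
p unicode_words(sample)    # => ["kůň", "čaj_42"]
```

Keeping \w fixed to ASCII means a pattern's meaning does not shift when the bundled Unicode data changes; code that needs Unicode-aware matching opts in explicitly with a property class such as \p{Word} or a POSIX bracket such as [[:alpha:]].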