Issue #7154 has been reported by t0d0r (Todor Dragnev).

----------------------------------------
Bug #7154: For whatever reason \s doesn't match \u00a0.
https://bugs.ruby-lang.org/issues/7154

Author: t0d0r (Todor Dragnev)
Status: Open
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: 1.9.3p286

The problem is already explained here: http://stackoverflow.com/questions/2588942/convert...

I just hit it today.
on 2012-10-14 01:38
on 2012-10-15 01:52
My understanding is that in Ruby, all the pre-Unicode escapes, and in particular "\s", still refer only to characters in the ASCII range. This was done on purpose, for backwards compatibility.

The reasoning is as follows: maybe somebody wrote a script doing some processing where they wanted to match ASCII 'space' characters, and they used \s. If Ruby changed \s to suddenly match far more than before, the meaning of that program would change. Maybe it would change in just the right way. But maybe it would change in an unintended way. So the decision was to not second-guess the programmer.

As a result, this does not behave the same way as what's suggested in Unicode TR #18. But please note that UTR #18 doesn't *require* \s to be treated as Unicode whitespace, it just *recommends* doing so (see http://www.unicode.org/reports/tr18/#Compatibility...).

If you want to match against Unicode whitespace, what you should do is the following:

    "\u00a0" =~ /\p{Whitespace}/u

Regards, Martin.
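The difference described above can be checked with a small script. This is only a minimal sketch: the helper `matches?` and the constant `NBSP` are hypothetical names introduced for illustration, and `\p{Space}` and `[[:space:]]` are alternative Unicode-aware spellings that are assumed here rather than taken from the post.

```ruby
# Sketch: the ASCII-only \s versus Unicode-aware character classes.
NBSP = "\u00a0" # NO-BREAK SPACE

# Hypothetical helper: true if the pattern matches anywhere in str.
def matches?(str, pattern)
  !(str =~ pattern).nil?
end

puts matches?(" ",  /\s/)          # => true  (ASCII space)
puts matches?(NBSP, /\s/)          # => false (\s stays ASCII-only)
puts matches?(NBSP, /\p{Space}/)   # => true  (Unicode property)
puts matches?(NBSP, /[[:space:]]/) # => true  (POSIX bracket, Unicode-aware)
```

Either Unicode-aware class can stand in for \s when a script is meant to treat a no-break space as whitespace.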
on 2012-10-15 02:00
Issue #7154 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

My understanding is that this is a feature. See above for explanation. I hope somebody can provide the feedback to http://stackoverflow.com/questions/2588942/convert....

----------------------------------------
Bug #7154: For whatever reason \s doesn't match \u00a0.
https://bugs.ruby-lang.org/issues/7154#change-30686
on 2012-10-15 02:04
I forgot to mention that the pickaxe book, for "\s", says "For Unicode, add Line_Separator codepoints.".

This is wrong, because even LINE SEPARATOR itself, \u2028, doesn't match \s. It would also be wrong in another way: the result would be to match ASCII whitespace and Unicode line separators, whereas all other Unicode whitespace would be ignored.

Regards, Martin.
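The claim about LINE SEPARATOR is easy to verify. The following is a minimal sketch; `SAMPLES` and `whitespace_report` are hypothetical names used only for this illustration, and `[[:space:]]` is assumed as the Unicode-aware point of comparison.

```ruby
# Sketch: which sample characters match \s, and which match [[:space:]].
SAMPLES = {
  "SPACE"          => " ",
  "NO-BREAK SPACE" => "\u00a0",
  "LINE SEPARATOR" => "\u2028"
}

# Hypothetical helper: returns [name, matches \s?, matches [[:space:]]?] per sample.
def whitespace_report(samples)
  samples.map do |name, ch|
    [name, !(ch =~ /\s/).nil?, !(ch =~ /[[:space:]]/).nil?]
  end
end

whitespace_report(SAMPLES).each do |name, ascii, unicode|
  puts "#{name}: \\s=#{ascii} [[:space:]]=#{unicode}"
end
```

Only the ASCII space matches \s, while all three samples match the Unicode-aware class.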
on 2012-10-16 11:41
Issue #7154 has been updated by t0d0r (Todor Dragnev).

duerst (Martin Dürst) wrote:
> My understanding is that this is a feature. See previous post for explanation. I hope somebody can provide the feedback to http://stackoverflow.com/questions/2588942/convert....

My understanding is that:

* We are surrounded by Unicode text; most Internet pages and documents are UTF-8. If a language doesn't adapt to its surrounding environment, it will be replaced by a new one that provides better tools for the real situation. Not everyone in the world uses the English alphabet for their primary language...

* We are all humans. To me, "white space" means white space in the text. In this case, with \u00a0, I had to open a hex editor to see what was wrong. I like the simplicity of Ruby and being able to write less code. All good and popular programming languages are oriented towards helping humans, and complexity kills popularity. Do you know anyone near you who writes Assembler these days?

* "String".downcase produces "string", so "Стринг".downcase must produce "стринг", but it doesn't. OK, that's correct for 1.8.x, where we don't have multibyte support. But why do I need specific libraries in 1.9.x to get proper results? UnicodeUtils.downcase("Стринг") works fine... Thanks, Stefan Lang. Maybe Ruby wants to become the next PHP, with 10 methods doing one thing? http://www.tnx.nl/php.html. For me (and maybe others), downcase/upcase/\s and similar methods in 1.9.x are useless... Why do we have multibyte support without multi-language awareness? This is odd to me as a human...

* Firefox has lots of features and is now going to die, because they didn't listen to users' warnings about memory management... :)

----------------------------------------
Bug #7154: For whatever reason \s doesn't match \u00a0.
https://bugs.ruby-lang.org/issues/7154#change-30840
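For the downcase case mentioned above, a library-free workaround can be sketched with String#tr. This is only an illustration under a narrow assumption: it covers just the basic Cyrillic letters А-Я plus Ё, not Unicode case mapping in general, and the method name `downcase_basic_cyrillic` is hypothetical. It is not a replacement for UnicodeUtils.downcase.

```ruby
# encoding: utf-8

# Hypothetical helper: lowercases ASCII letters and the basic Cyrillic
# range (U+0410..U+042F, plus U+0401) by translating each character to
# its lowercase counterpart. Other scripts are left unchanged.
def downcase_basic_cyrillic(str)
  str.tr("A-ZА-ЯЁ", "a-zа-яё")
end

puts downcase_basic_cyrillic("String") # => string
puts downcase_basic_cyrillic("Стринг") # => стринг
```

Because the translation is a fixed character table, it does not depend on how a given Ruby version implements String#downcase.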