Is this a bug of String#count?

lirh · August 19, 2010, 5:45pm

$ ruby -v
ruby 1.9.1p429 (2010-07-02 revision 28523) [i386-darwin9]
$ irb --simple-prompt

>> s = "a\u4e00" # \u4e00 is the Chinese character for "one", in case you can't read the next line
=> "a一"

>> s.encoding
=> #<Encoding:UTF-8>

>> s.length
=> 2

>> s.count("^a")
=> 0

Why is the above result 0 and not 1? After all, there are 2 characters
in the string s. Is this a bug in String#count?
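For reference, the answer one would expect can be computed without relying on String#count's negation handling, by walking the characters directly. This is a minimal sketch. The helper name count_except is hypothetical (it is not part of Ruby's core API), and it takes the excluded characters literally, with no "^" negation and no "a-z" ranges.

```ruby
# Hypothetical helper (not a Ruby core method): counts the characters of
# str that do NOT appear in excluded. Unlike String#count, excluded is
# taken literally: no "^" negation and no "a-z" range expansion.
def count_except(str, excluded)
  banned = excluded.each_char.to_a
  # each_char yields whole characters rather than bytes, so the
  # 3-byte UTF-8 sequence for U+4E00 is seen as a single character.
  str.each_char.count { |c| !banned.include?(c) }
end

s = "a\u4e00"
puts s.length              # 2 (characters)
puts s.bytesize            # 4 ("a" is 1 byte, U+4E00 is 3 bytes in UTF-8)
puts count_except(s, "a")  # 1
```

Counting by characters rather than bytes is the point here: a byte-oriented count would see four units instead of two.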

Thanks in advance.
Ruohao

lirh · August 19, 2010, 5:51pm

> s = "a\u4e00"
>
> s.count("^a")
> => 0
>
> Why is the above result 0 and not 1? After all, there are 2 characters
> in the string s. Is this a bug in String#count?

count returns the sum of occurrences of characters. I don’t see any ^a
in the original string…
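To make the disagreement concrete: String#count treats each argument as a set of characters rather than a substring, so "^a" is never searched for literally. A short illustration using only ASCII strings:

```ruby
s = "hello"

# "lo" is the character set {l, o}: two l's plus one o
puts s.count("lo")        # 3

# For comparison, the literal substring "lo" occurs only once
puts s.scan("lo").length  # 1

# A leading caret negates the set: every character except l and o (h, e)
puts s.count("^lo")       # 2
```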

lirh · August 19, 2010, 5:56pm

Roger P. wrote:

> > s = "a\u4e00"
> >
> > s.count("^a")
> > => 0
> >
> > Why is the above result 0 and not 1? After all, there are 2 characters
> > in the string s. Is this a bug in String#count?
>
> count returns the sum of occurrences of characters. I don't see any ^a
> in the original string…

But according to the documentation, "Any other_str that starts with a
caret (^) is negated", thus the following behavior:

$ irb --simple-prompt
>> s = "abc"
=> "abc"

>> s.count("^a")
=> 2

There are two characters in s that are not "a".
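The documented set semantics go further than a single negated set: several arguments are intersected, and "x-y" denotes a range. The values below are for ASCII strings only; the question in this thread is precisely whether the negated case still holds once the string contains a multibyte character such as U+4E00.

```ruby
a = "hello world"

puts a.count("lo")          # 5  (l three times, o twice)
puts a.count("lo", "o")     # 2  (intersection of {l, o} and {o} is {o})
puts a.count("hello", "^l") # 4  (h, e, o, o)
puts a.count("ej-m")        # 4  (one e, plus three l's from the range j-m)
puts "abc".count("^a")      # 2  (b and c)
```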

lirh · August 19, 2010, 5:58pm

Ruohao Li wrote:

> Roger P. wrote:
>
> > > s = "a\u4e00"
> > >
> > > s.count("^a")
> > > => 0
> > >
> > > Why is the above result 0 and not 1? After all, there are 2 characters
> > > in the string s. Is this a bug in String#count?
> >
> > count returns the sum of occurrences of characters. I don't see any ^a
> > in the original string…
>
> But according to the documentation, "Any other_str that starts with a
> caret (^) is negated", thus the following behavior:

Probably because non-ASCII characters no longer count as matching /\w/.
You might want to ask on ruby-core whether this is expected or not.
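The /\w/ premise itself is easy to check, as a sketch. In current Ruby, \w in a regexp is documented as ASCII-only ([a-zA-Z0-9_]), so U+4E00 does not match it, whereas a negated character class does match whole multibyte characters. This only probes the regexp side for comparison. It says nothing about how String#count is implemented internally, which is the open question of the thread.

```ruby
s = "a\u4e00"

# \w only covers [a-zA-Z0-9_], so the first match is the "a" at index 0
p(s =~ /\w/)             # 0

# U+4E00 on its own does not match \w at all
p("\u4e00" =~ /\w/)      # nil

# A negated character class works on whole characters, multibyte included,
# which gives an independent cross-check of the expected count
p s.scan(/[^a]/).length  # 1
```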