Issue with regular expressions and locales

Hi there!
I’ve run into a curious issue with regular expressions: my locale is
es_ES.utf-8 and I use gsub with a regular expression to turn an article
title into a permalink, like this:

permalink.gsub!(/\W/, '-')

What’s the problem? On my local system I get this:

"¿" =~ /\W/
=> 0

but on another system with an English locale I get a different result:

"¿" =~ /\W/
=> nil

Is this a bug, or does regular expression matching depend on the system locale?

Thanks.

but on another system with an English locale I get a different result

Same version of Ruby?

See also Win32 ruby1.9 regexp and cyrillic string - Ruby - Ruby-Forum perhaps.
-r

Emili Parreño wrote:

On my local system I get this:

"¿" =~ /\W/
=> 0

but on another system with an English locale I get a different result:

"¿" =~ /\W/
=> nil

Is this a bug, or does regular expression matching depend on the system locale?

(1) What exact version(s) of Ruby are you running? (Show the
RUBY_DESCRIPTION constant.) Behaviour varies between versions.

(2) What does

"¿".encoding

show on the two machines?

AFAIK the actual match should depend only on the encoding of the string,
not the system locale in the environment - but if you find differently
that would be of interest.

(3) It looks like you are doing this in IRB. IRB is not a good predictor
of behaviour in ruby 1.9, since the encoding of string literals in IRB
depends on the system locale - which is not true for ruby source code in
source files.

So writing a small test .rb file and running that is probably better.
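
Something along these lines might do as a starting point. It is only a sketch: the file name and the regexp_report helper are made up for illustration. It prints the Ruby version, the string's encoding and the match result in one go. The string is built with a \u escape, which as far as I know always gives a UTF-8 string on 1.9, so neither the locale nor the source file encoding comes into it. The same bytes are then re-tagged as binary to show that the encoding of the string changes what gsub does.

```ruby
# regexp_check.rb (hypothetical file name); run with: ruby regexp_check.rb

# Collects the facts asked for above for a single string.
def regexp_report(str)
  {
    :ruby        => RUBY_DESCRIPTION,
    :encoding    => str.encoding.name,
    :chars       => str.length,
    :match_index => (str =~ /\W/),
    :permalink   => str.gsub(/\W/, '-')
  }
end

# "\u00BF" is the inverted question mark from the original post.
# The \u escape yields a UTF-8 string regardless of the locale.
utf8 = "\u00BF"

# Same two bytes, but tagged as raw binary instead of UTF-8.
binary = utf8.dup.force_encoding(Encoding::BINARY)

[utf8, binary].each { |s| p regexp_report(s) }
```

On 1.9.2 or later I would expect both strings to match /\W/ at index 0, but the UTF-8 string is one character and becomes a single dash, while the binary string is two bytes and becomes two dashes. If the two machines print different encodings for a literal "¿" placed in the same file, that points at the source encoding rather than the regexp engine.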

I’ve attempted to document what I’ve found so far about encoding
behaviour in 1.9 here: