Strange unicode regex behavior with Ruby 2.0


The following code behaves strangely in Ruby 2.0, and differently from 1.9.3:

puts "ä".match(/[\p{Word}]/).inspect
puts "ä".match(/[\p{Word}\s]/).inspect

Result on 1.9.3-p194:
#<MatchData "ä">
#<MatchData "ä">

Result on 2.0.0-p0:
#<MatchData "ä">

Any ideas what’s going on there?

I have attached the ruby code as a file, in case there are any problems
with email charset conversion.
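
In case the attachment gets lost too, here is a charset-independent sketch of the same test. It writes ä as the escape \u00E4 so the source is pure ASCII, and prints the interpreter version next to the results. The variable names are only illustrative and are not taken from the attached file.

```ruby
# encoding: utf-8
# "\u00E4" is ä (U+00E4), written as an escape so the file stays pure ASCII
# and survives any mail charset conversion.
s = "\u00E4"

# Same two patterns as in the original snippet.
word_only  = s.match(/[\p{Word}]/)
word_space = s.match(/[\p{Word}\s]/)

puts "ruby #{RUBY_VERSION}"
puts word_only.inspect
puts word_space.inspect
```

If both lines show a MatchData, the interpreter is not affected; a nil on the last line would point to the behavior described above.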


If a bug was introduced, it appears to have been fixed already:

$ ruby -v
ruby 2.1.0dev (2013-03-25 trunk 39928) [x86_64-linux]
$ ruby test_regex.rb
#<MatchData "ä">
#<MatchData "ä">


Hey Andreas,

Lately, there have been some discussions on ruby-core (the mailing list
dedicated to the core implementers of MRI). It’s possible that this bug
is being addressed at the moment. These are the most recent messages there:

I don’t really know what could be causing this difference. :(

Carlos A.
Skype: carlos.agarie

Control engineering
Polytechnic School, University of São Paulo, Brazil
Computer engineering
Embry-Riddle Aeronautical University, USA

2013/3/26 Andreas S.

Thanks! Guess I will use a workaround for now and wait for 2.1.
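
For anyone else hitting this, one possible workaround is sketched below. It assumes the problem is limited to combining \p{Word} and \s inside a single bracket expression, and uses an alternation instead. I have not verified this on 2.0.0-p0, so treat it as something to try rather than a confirmed fix. The constant name WORD_OR_SPACE is made up for the example.

```ruby
# encoding: utf-8
s = "\u00E4"  # ä, written as an escape to avoid charset problems

# Alternation instead of one bracket expression holding both
# \p{Word} and \s. Hypothetical name, for illustration only.
WORD_OR_SPACE = /(?:\p{Word}|\s)/

m = s.match(WORD_OR_SPACE)
puts m.inspect
```

On interpreters without the problem, this pattern should accept the same single characters as /[\p{Word}\s]/, so it can stay in place after upgrading.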