Because a regular expression can behave differently depending on its kcode
(e.g. the behavior of \w), I decided that all my code should specify the
kcode explicitly (e.g. /\w+/n instead of /\w+/). So I tried to set up some
hooks to monitor the creation of each Regexp and raise an exception if the
kcode is missing, like this:
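class Regexp
  # Keep a handle on the original constructor
  alias old_initialize initialize
  def initialize(*args)
    old_initialize(*args)
    # kcode is nil unless a kcode option (n, e, s or u) was given
    raise "NO KCODE!" if kcode.nil?
  end
end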
It works fine if I use Regexp.new, but in most cases the regexp is written
as a literal, and initialize is NOT EXECUTED:
Regexp.new("foobar")
RuntimeError: NO KCODE!
/foobar/
=> /foobar/
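A rough way to reproduce this observation on a modern Ruby. This is a sketch under assumptions: Ruby 3.x, where Regexp#kcode no longer exists, so it only records which regexps pass through initialize instead of checking the kcode. InitSpy is a hypothetical name, and Module#prepend stands in for the alias trick above.

```ruby
# Hypothetical spy module: records the source of every Regexp whose
# #initialize actually runs. Assumes Ruby 3.x (Module#prepend, keyword args).
module InitSpy
  CREATED = []

  def initialize(*args, **kwargs)
    super(*args, **kwargs)
    CREATED << source
  end
end

Regexp.prepend(InitSpy)

# Class#new allocates the object, then dispatches #initialize (our hook runs).
from_new = Regexp.new("foobar")
# A literal is built when the code is compiled; #initialize never runs.
from_literal = /literal/

p InitSpy::CREATED.include?("foobar")   # prints true
p InitSpy::CREATED.include?("literal")  # prints false
```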
So I tried an alternate approach and put the hook in the =~ operator
instead, but I hit the same problem: the method override is completely
ignored:
class String; def =~(o); raise "S"; end; end
class Regexp; def =~(o); raise "R"; end; end
"bar" =~ /bar/ #=> 0
/foo/ =~ "foo" #=> 0
So… does anyone have any idea how I can tackle this problem?
ruby -v
==> ruby 1.8.4 (2005-12-24) [i486-linux]
class String; def =~(o); raise "S"; end; end
class Regexp; def =~(o); raise "R"; end; end
r = /x/
r =~ 'a'
==> RuntimeError: R
        from (irb):2:in `=~'
        from (irb):4
'a' =~ r
==> RuntimeError: S
        from (irb):1:in `=~'
        from (irb):5
On 2/28/07, Daniel DeLorme wrote:
So… does anyone have any idea how I can tackle this problem?
Yes... well, no. I had one, but the prospects look bleak now. Look at this:
$ ruby -r profile -e 'puts /a/'
(?-mix:a)
  %   cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call  name
 0.00      0.00      0.00        2     0.00     0.00  IO#write
 0.00      0.00      0.00        1     0.00     0.00  Regexp#to_s
 0.00      0.00      0.00        1     0.00     0.00  Kernel.puts
 0.00      0.01      0.00        1     0.00    10.00  #toplevel

$ ruby -r profile -e 'puts Regexp.new("a")'
(?-mix:a)
  %   cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call  name
 0.00      0.00      0.00        2     0.00     0.00  IO#write
 0.00      0.00      0.00        1     0.00     0.00  Kernel.puts
 0.00      0.00      0.00        1     0.00     0.00  Regexp#initialize
 0.00      0.00      0.00        1     0.00     0.00  Class#new
 0.00      0.00      0.00        1     0.00     0.00  Regexp#to_s
 0.00      0.01      0.00        1     0.00    10.00  #toplevel
I just do not see any way to intercept this at the Ruby level: a literal
never goes through Class#new or Regexp#initialize, so you would need to
hack Ruby itself.
Maybe someone cleverer than me knows a way?
Cheers
Robert
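Not something proposed in this thread, but if runtime interception is off the table, a static check of the source can at least catch the literals. A minimal sketch, assuming Ruby 2.7+ (for Enumerable#filter_map) and the stdlib Ripper lexer. literals_without_encoding_flag is a hypothetical helper that reports the line of every regexp literal whose flags contain none of n, e, s or u (the letters 1.8 used to fix the kcode). It relies on Ripper emitting the closing delimiter and its flags as a single :on_regexp_end token, and it cannot see Regexp.new calls at all.

```ruby
require "ripper"

# Hypothetical helper: return the line numbers of regexp literals that carry
# no encoding flag (n, e, s or u) after the closing delimiter.
def literals_without_encoding_flag(src)
  Ripper.lex(src).filter_map do |(line, _col), event, tok, *|
    # The closing token holds the delimiter plus any flags, e.g. "/n" or "}i".
    next unless event == :on_regexp_end

    flags = tok[1..]
    # String#count("nesu") counts characters from that set; no regexp needed.
    line if flags.count("nesu").zero?
  end
end

source = <<~RUBY
  a = /foo/
  b = /bar/n
  c = %r{baz}u
  d = %r{qux}i
RUBY

p literals_without_encoding_flag(source)  # prints [1, 4]
```

Division such as `x = 4 / 2` produces no regexp tokens, so it is not reported.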
Jan F. wrote:
        from (irb):4
'a' =~ r
==> RuntimeError: S
        from (irb):1:in `=~'
        from (irb):5
Very interesting. If you assign the regexp to a variable, you get the
overridden methods. I guess there's some voodoo optimization at work when
you use =~ with a regexp literal?
Daniel
Daniel DeLorme wrote:
Because a regular expression can behave differently depending on its kcode
(e.g. the behavior of \w), I decided that all my code should specify the
kcode explicitly (e.g. /\w+/n instead of /\w+/).
As an addendum, I was wondering why \w matches extended characters in
UTF-8. If extended characters are considered "word" characters, does that
mean they are valid in identifiers? So I tried:
$KCODE='u'
=> "u"
def 日本語
  "nihongo"
end
=> nil
日本語
=> "nihongo"
wow. O_O
Daniel
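For comparison, a sketch of how this looks on a modern Ruby, assuming 2.0 or later, where source files default to UTF-8 and $KCODE no longer has any effect. Identifiers may still contain non-ASCII letters, but \w has become ASCII-only; the POSIX bracket [[:word:]] is the Unicode-aware form.

```ruby
# Assumes Ruby 2.0+: UTF-8 source by default, no $KCODE needed.

# Non-ASCII letters are accepted in method names.
def 日本語
  "nihongo"
end

p 日本語                    # prints "nihongo"
# \w is ASCII-only here ([a-zA-Z0-9_]), so it finds no match.
p "日本語" =~ /\w/          # prints nil
# [[:word:]] follows Unicode word categories (letters, marks, numbers, connectors).
p "日本語" =~ /[[:word:]]/  # prints 0
```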