Why doesn't Timeout work with String#=~?

robtotheb · November 29, 2006, 6:26pm

Sorry for the length, but I hope that with a running test case, someone can
explain this one. I’m running this on Mac OS X 10.4.8 with ruby 1.8.4.

Apple’s ruby is still present, but shouldn’t be used
rab:ruby $ whereis ruby
/usr/bin/ruby
rab:ruby $ /usr/bin/ruby -v
ruby 1.8.2 (2004-12-25) [universal-darwin8.0]

rab:ruby $ which ruby
/usr/local/bin/ruby
rab:ruby $ ruby -v
ruby 1.8.4 (2005-12-24) [i686-darwin8.5.2]
rab:ruby $ /usr/bin/env ruby -v
ruby 1.8.4 (2005-12-24) [i686-darwin8.5.2]

rab:ruby $ uname -mrsv
Darwin 8.8.1 Darwin Kernel Version 8.8.1: Mon Sep 25 19:42:00 PDT 2006; root:xnu-792.13.8.obj~1/RELEASE_I386 i386

The rest of the email should be runnable as a test case that shows the
problem. Disable test_the_bad_regexp, or uncomment the sleep within it, to
run without any intervention.

The comments in the test case tell the rest of the story…

-Rob

#!/usr/bin/env ruby -w

# I ran into trouble running a test on the result of a spell-checking action
# with a combination of a poorly formed regular expression and having a third
# misspelled word in the test phrase. It was taking a LONG time (not completed
# overnight!) so I thought I'd put a timeout around the call so even a truly
# ghastly expression would be halted after some reasonable amount of time.
#
# However, it appears that the regular expression match with =~ doesn't get
# interrupted by the timeout. Is this expected? I found a thread about timeout
# and syscalls
# (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/
# that says syscalls are interrupted, which seems to imply that things like
# String#=~ ought to time out also.
#
# Here's a "minimal" test that pulls the actual response out (so I don't have
# to try to recreate a string that will cause the regexp to churn for a long
# while).

require 'test/unit'
require 'timeout'

class TimeoutRegexpTest < Test::Unit::TestCase
  include Timeout

  def setup
    # Note that this actually has three: 1) prhase => phrase
    #                                    2) twoo => two
    #                                    3) incorrectlly => incorrectly
    # so the regexp (which expects only 1 and 2) doesn't match...
    @phrase = "This prhase has only twoo incorrectlly spelled words."
    @result = <<RESULT
This prhase

praise

prose
prates
Pres
praises
pres
paras
prays
prams
prats
preses
proses
Prissie
presser
pressie
presses
pries
peruse
press
prance
Pr's
Pris
pros
pariahs
prices
prides
primes
prizes
probes
proles
protease
proves
prudes
prunes
parades
pirates
prissy
precise
preface
preheats
premise
profuse
promise
propose
rehearse
pares
peres
pores
pyres
Price
Pru's
Pryce
preys
price
prize
prosy
prows
para's
preps
prigs
prods
profs
proms
props
piranhas
purees
Prensa
Ypres
praise's
pram's
prat's
Peria's
Prince
prince
pros's
prig's
Priam's
pariah's
prose's
Prue's
prey's
prow's
pyre's
Pren's
Prut's
prep's
prof's
prom's
piranha's
parade's
Prague's
Brahe's
pirate's
pride's
prude's
puree's
Price's
Pryce's
price's
prize's

has only twoo

two
too
woo
twos
two's

incorrectlly

incorrectly
incorrect
indirectly
uncorrectable
uncorrected
incorrigibly

spelled words.
RESULT
    @result.chomp!

    # This is "bad" because it backtracks so much trying not to be greedy (*?)
    @bad_re = %r{\AThis prhase

.*?

has only twoo

.*?

incorrectlly spelled words.\z}ms

    # This is "good" because it's fast enough to know there is no match
    @good_re = %r{\AThis prhase

[^<]*

has only twoo

[^<]*

incorrectlly spelled words.\z}ms
  end

  def test_a_good_regexp
    assert_nothing_raised Timeout::Error do
      timeout(2) do
        assert_nil(@result =~ @good_re)
      end
    end
  end

  def test_a_timeout
    assert_raise Timeout::Error do
      timeout(2) do
        sleep 10
        flunk 'What about the timeout?'
      end
    end
  end

  def test_the_bad_regexp
    delay = 5
    print "(timeout is #{delay}: interrupt sooner and test fails,\n"
    print " count to #{2*delay} and then interrupt it and the test PASSES!?)"
    $stdout.flush

    assert_raise Timeout::Error do
      timeout(delay) do
        # uncomment the sleep and see it timeout "properly"
        # sleep 2*delay
        assert_send [ @result, :=~, @bad_re ], "bad test or bad result?? "
      end
    end
    print "If there's a dot I pass => "; $stdout.flush
  end
end
__END__

Rob B. http://agileconsultingllc.com
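The script's comments contrast the lazy `.*?` gaps in the "bad" regexp with the `[^<]*` gaps in the "good" one. The archive stripped the markup out of the spell-checker output, so the following is only a simplified sketch on a made-up string (the `<b>` tags and the variable names are assumptions, not the original data). It shows how far each kind of gap is allowed to reach:

```ruby
# Made-up stand-in for the spell-checker output; the <b> tags are hypothetical.
html = "This <b>prhase</b> has only <b>twoo</b> words."

# Lazy gaps: each .*? may stretch across later tags (and, with /m, newlines),
# so on a failed match the engine retries many possible gap lengths.
lazy = %r{\AThis <b>prhase</b>.*?has only.*?words\.\z}m

# Negated-class gaps: [^<]* can never move past a "<", so each gap is
# confined to the text before the next tag.
strict = %r{\AThis <b>prhase</b>[^<]*has only[^<]*words\.\z}m

p(html =~ lazy)   # => 0   (the second gap runs across "<b>twoo</b>")
p(html =~ strict) # => nil (the second gap stops at the "<" of "<b>twoo")
```

In this sketch, narrowing what a gap may match both changes which strings match and shrinks the space a failing match has to search, which is the property the "good" pattern relies on to give up quickly.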

robtotheb · November 30, 2006, 6:40am

On Nov 29, 2006, at 09:24, Rob B. wrote:

> Here’s a “minimal” test that pulls the actual response out (so I don’t have
> to try and recreate a string that will cause the regexp to churn for a long
> while).

Timeout uses threads, and threads can only be switched when evaluating Ruby
code.
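The thread-based mechanism can be illustrated with a stripped-down model. This is a simplified sketch of the idea, not the library's actual code: `sketch_timeout` and `SketchTimeoutError` are hypothetical names, and it ignores edge cases (such as the watcher firing just as the block finishes) that a production implementation has to handle.

```ruby
# Hypothetical error class for this sketch (the real library uses Timeout::Error).
class SketchTimeoutError < StandardError; end

# Hypothetical helper: a watcher thread sleeps for `sec` seconds, then raises
# an exception inside the thread that is running the block.
def sketch_timeout(sec)
  caller_thread = Thread.current
  watcher = Thread.new do
    sleep sec
    # Delivered only when the calling thread next checks for interrupts;
    # a single long-running call into C code can delay it.
    caller_thread.raise(SketchTimeoutError, "execution expired")
  end
  yield
ensure
  watcher.kill if watcher  # stop the watcher if the block finished first
end

# Pure-Ruby work keeps giving the interpreter chances to deliver the exception.
begin
  sketch_timeout(0.2) { loop { } }
rescue SketchTimeoutError => e
  puts "interrupted: #{e.message}"
end
```

Because the exception is raised asynchronously in the calling thread, it can only land at a point where the interpreter checks for pending interrupts; a long call into C code gives it no such point.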

String#=~ is written in C and can’t be interrupted by ruby’s thread
scheduler.
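One workaround that follows from this (not something proposed in the thread) is to run the match in a child process, which the operating system can kill no matter what code it is executing. The sketch below assumes a Unix-like platform where `fork` is available; `match_with_deadline` is a hypothetical helper name, and the result is passed back to the parent over a pipe with `Marshal`.

```ruby
require 'timeout'  # only for Timeout::Error

# Hypothetical helper: evaluate `str =~ re` in a forked child and kill the
# child if it has not finished within `sec` seconds. Unix-only (uses fork).
def match_with_deadline(str, re, sec)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Marshal.dump(str =~ re))  # match index (Integer) or nil
    writer.close
    exit!(0)  # skip at_exit hooks, e.g. test/unit's autorunner
  end
  writer.close

  deadline = Time.now + sec
  # Poll with WNOHANG so the parent never blocks inside a single C call.
  until Process.waitpid(pid, Process::WNOHANG)
    if Time.now >= deadline
      Process.kill("KILL", pid)
      Process.wait(pid)
      raise Timeout::Error, "match did not finish within #{sec}s"
    end
    sleep 0.01
  end
  Marshal.load(reader.read)
ensure
  # Close whatever is still open (writer is only open here if fork failed).
  reader.close if reader && !reader.closed?
  writer.close if writer && !writer.closed?
end

p match_with_deadline("hello world", /world/, 2)
```

A fork per match is expensive, so in this sketch it only makes sense as a guard around patterns that might run away, not for every match.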

--
Eric H. - http://blog.segment7.net

I LIT YOUR GEM ON FIRE!