Why doesn't Timeout work with String#=~?

Sorry for the length, but I hope that with a running test case,
someone can explain this one. I’m running this on Mac OS X 10.4.8
with ruby 1.8.4.

Apple’s ruby is still present, but shouldn’t be used
rab:ruby $ whereis ruby
/usr/bin/ruby
rab:ruby $ /usr/bin/ruby -v
ruby 1.8.2 (2004-12-25) [universal-darwin8.0]

rab:ruby $ which ruby
/usr/local/bin/ruby
rab:ruby $ ruby -v
ruby 1.8.4 (2005-12-24) [i686-darwin8.5.2]
rab:ruby $ /usr/bin/env ruby -v
ruby 1.8.4 (2005-12-24) [i686-darwin8.5.2]

rab:ruby $ uname -mrsv
Darwin 8.8.1 Darwin Kernel Version 8.8.1: Mon Sep 25 19:42:00 PDT
2006; root:xnu-792.13.8.obj~1/RELEASE_I386 i386

The rest of the email should be runnable as a test case to show the
problem. Disable the test_the_bad_regexp or uncomment the sleep
within it to run without any intervention.

The comments in the test case tell the rest of the story…

-Rob

#!/usr/bin/env ruby -w

# I ran into trouble running a test on the result of a spell-checking action
# with a combination of a poorly formed regular expression and having a third
# misspelled word in the test phrase.  It was taking a LONG time (not
# completed overnight!) so I thought I'd put a timeout around the call so even
# a truly ghastly expression would be halted after some reasonable amount of
# time.
#
# However, it appears that the regular expression match with =~ doesn't get
# interrupted by the timeout.  Is this expected?  I found a thread about
# timeout and syscalls
# (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/) that
# says that syscalls are interrupted, which seems to imply that things like
# String#=~ ought to time out also.
#
# Here's a "minimal" test that pulls the actual response out (so I don't have
# to try and recreate a string that will cause the regexp to churn for a
# long while).

require 'test/unit'
require 'timeout'

class TimeoutRegexpTest < Test::Unit::TestCase
  include Timeout

  def setup
    # Note that this actually has three: 1) prhase => phrase
    #                                    2) twoo => two
    #                                    3) incorrectlly => incorrectly
    # so the regexp (which expects only 1 and 2) doesn't match…
    @phrase = "This prhase has only twoo incorrectlly spelled words."
    @result = <<RESULT
This prhase

  • praise
  • prose
  • \n
  • prates
  • \n
  • Pres
  • \n
  • praises
  • \n
  • pres
  • paras
  • \n
  • prays
  • \n
  • prams
  • \n
  • prats
  • \n
  • preses
  • proses
  • \n
  • Prissie
  • \n
  • presser
  • \n
  • pressie
  • presses
  • \n
  • pries
  • \n
  • peruse
  • \n
  • press
  • prance
  • \n
  • Pr's
  • \n
  • Pris
  • \n
  • pros
  • \n
  • pariahs
  • prices
  • \n
  • prides
  • \n
  • primes
  • \n
  • prizes
  • probes
  • \n
  • proles
  • \n
  • protease
  • \n
  • proves
  • prudes
  • \n
  • prunes
  • \n
  • parades
  • \n
  • pirates
  • prissy
  • \n
  • precise
  • \n
  • preface
  • \n
  • preheats
  • premise
  • \n
  • profuse
  • \n
  • promise
  • \n
  • propose
  • rehearse
  • \n
  • pares
  • \n
  • peres
  • \n
  • pores
  • pyres
  • \n
  • Price
  • \n
  • Pru's
  • \n
  • Pryce
  • \n
  • preys
  • price
  • \n
  • prize
  • \n
  • prosy
  • \n
  • prows
  • \n
  • para's
  • preps
  • \n
  • prigs
  • \n
  • prods
  • \n
  • profs
  • \n
  • proms
  • props
  • \n
  • piranhas
  • \n
  • purees
  • \n
  • Prensa
  • Ypres
  • \n
  • praise's
  • \n
  • pram's
  • \n
  • prat's
  • Peria's
  • \n
  • Prince
  • \n
  • prince
  • \n
  • pros's
  • prig's
  • \n
  • Priam's
  • \n
  • pariah's
  • \n
  • prose's
  • Prue's
  • \n
  • prey's
  • \n
  • prow's
  • \n
  • pyre's
  • Pren's
  • \n
  • Prut's
  • \n
  • prep's
  • \n
  • prof's
  • prom's
  • \n
  • piranha's
  • \n
  • parade's
  • \n
  • Prague's
  • Brahe's
  • \n
  • pirate's
  • \n
  • pride's
  • \n
  • prude's
  • puree's
  • \n
  • Price's
  • \n
  • Pryce's
  • \n
  • price's
  • prize's
has only twoo
  • two
  • too
  • \n
  • woo
  • \n
  • twos
  • \n
  • two's
incorrectlly
  • incorrectly
  • incorrect
  • \n
  • indirectly
  • \n
  • uncorrectable
  • \n
  • uncorrected
  • incorrigibly
spelled words.
RESULT
    @result.chomp!
    # This is "bad" because it backtracks so much trying not to be greedy (*?)
    @bad_re = %r{\AThis prhase
    (?:
  • .*?
  • \n?)+
has only twoo
    (?:
  • .*?
  • \n?) +
incorrectlly spelled words.\z}ms
    # This is "good" because it's fast enough to know there is no match
    @good_re = %r{\AThis prhase
    (?:
  • [^<]*
  • \n?)+
has only twoo
    (?:
  • [^<]*
  • \n?) +
incorrectlly spelled words.\z}ms
  end

  def test_a_good_regexp
    assert_nothing_raised Timeout::Error do
      timeout(2) do
        assert_nil(@result =~ @good_re)
      end
    end
  end

  def test_a_timeout
    assert_raise Timeout::Error do
      timeout(2) do
        sleep 10
        flunk 'What about the timeout?'
      end
    end
  end

  def test_the_bad_regexp
    delay = 5
    print "(timeout is #{delay}: interrupt sooner and test fails,\n"
    print " count to #{2*delay} and then interrupt it and the test PASSES!?)"
    $stdout.flush

    assert_raise Timeout::Error do
      timeout(delay) do
        # uncomment the sleep and see it timeout "properly"
        # sleep 2*delay
        assert_send [ @result, :=~, @bad_re ], "bad test or bad result?? "
      end
    end
    print "If there's a dot I pass => "; $stdout.flush
  end
end
__END__

Rob B. http://agileconsultingllc.com
[email protected]

On Nov 29, 2006, at 0924, Rob B. wrote:

> Here’s a “minimal” test that pulls the actual response out (so I don’t have
> to try and recreate a string that will cause the regexp to churn for a
> long while).

Timeout uses threads, and threads can only be switched while the
interpreter is evaluating Ruby code.

String#=~ is written in C and can’t be interrupted by Ruby’s thread
scheduler.
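
A minimal sketch of the interruptible case, assuming only the standard timeout library: a block that keeps executing Ruby code hands control back to the interpreter on every iteration, so the timeout can fire. The names busy_loop_in_ruby and interrupted_by_timeout? are made up for illustration and are not from the test case above. Whether a long-running call into C can be interrupted the same way depends on the Ruby version and on the C code involved.

```ruby
require 'timeout'

# Hypothetical helper: spin forever in pure Ruby. Each pass through the
# block returns to the interpreter, which is where threads get switched.
def busy_loop_in_ruby
  count = 0
  loop { count += 1 }
end

# Hypothetical helper: returns true if the timeout interrupted the block,
# false if the block somehow finished on its own.
def interrupted_by_timeout?(seconds)
  Timeout.timeout(seconds) { busy_loop_in_ruby }
  false
rescue Timeout::Error
  true
end

puts interrupted_by_timeout?(0.2)
```

Replacing the body of busy_loop_in_ruby with a single long-running call is a quick way to check whether that call gives the scheduler a chance to run.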


Eric H. - [email protected] - http://blog.segment7.net

I LIT YOUR GEM ON FIRE!