Nested threading? implications to timeout()

boing · September 16, 2006, 5:14am

All,

I’m certainly not a ruby expert by any means. I’m trying to come up to
speed as fast as possible on ruby and rails. My last few months have
been reading, coding, reading, more reading, more coding, etc. So I
was doing some coding to try to come up to speed on ruby. I really
think it’s one of the most phenominal languages I’ve ever found. I
never got into java, too verbose. That said I’d like to move on to the
question.

At this point, I’m trying to track down some sort of memory leak. In my
vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn’t actually kill any threads that were started inside that thread.
To clarify, consider the following:

irb(main):001:0> x=nil ; y = Thread.start { x = Thread.start { sleep 0.5
until false } ; sleep 0.5 until false }
=> #<Thread:0xb7abe20c run>
irb(main):002:0> x.status
=> “sleep”
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>
irb(main):007:0> x.status
=> “sleep”
irb(main):008:0>

Clearly by killing, what I would call the “parent thread”, did not kill
the child thread. And this would be okay, I think, except that the way
the timeout class works is by starting a thread and killing it if it
doesn’t complete in the right amount of time. So if I were to wrap
anything in “timeout() {}” it’s possible that something else calls
timeout inside that and then the thread is still alive, until some later
time. Is this normal? Can this cause memory leak? Am I confused?

Thanks much for your help and feedback,

Geff

boing · September 16, 2006, 9:40am

On 9/15/06, Geff G. [email protected] wrote:

At this point, I’m trying to track down some sort of memory leak. In my
=> “sleep”
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>
[There are lines missing here…]
time. Is this normal?
Reading the timeout code I see that it creates a watcher thread to
provide the timeout. The block is executed on the current thread and
interruption is provided by raising an exception (from the watcher
tread) when the timeout expires. At the end of the method there is an
ensure to kill the watcher thread in any case so it would not
encounter the case you describe.

Can this cause memory leak?

Due do what I described in the previous paragraph nested timeouts
won’t cause any kind of leak.

Am I confused?

Yes, I believe you are confused.

By the way, reading the code of the methods you have doubts about can
be a good way to clarify something.

boing · September 16, 2006, 4:42pm

On 9/15/06, Geff G. [email protected] wrote:

vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn’t actually kill any threads that were started inside that thread.
To clarify, consider the following:

Threads don’t have any kind of parent-child relationship, as processes
do.
None of the threading models that have seen wide use have a feature like
this. What was your basis for thinking that they do?

boing · September 17, 2006, 1:39am

Arnaud B. wrote:

On 9/15/06, Geff G. [email protected] wrote:

At this point, I’m trying to track down some sort of memory leak. In my
=> “sleep”
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>
[There are lines missing here…]

sorry that was just me repeating the status method multiple times.

boing · September 17, 2006, 1:47am

Francis C. wrote:

On 9/15/06, Geff G. [email protected] wrote:

vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn’t actually kill any threads that were started inside that thread.
To clarify, consider the following:

Threads don’t have any kind of parent-child relationship, as processes
do.
None of the threading models that have seen wide use have a feature like
this. What was your basis for thinking that they do?

What was I thinking was the following: lets consider the case where you
try to timeout a HTTP connect with the net/http stuff. There are
already “timeouts” in the HTTP class. If my timeout times out before
the library’s timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it’s done.
Maybe not such a big deal in thise case but the situation certainly
could have been worse. It could be kind of messy if you’re using a
library in general because you don’t know if it’s going ot call timeout
of it’s own and how that might impact resource allocation/deallocation.
As far as the programmer is concerned, he / she timed out the http class
and it should be (all) gone. But it’s not. It’s still hanging around.
Just like my nested sleep was.

I don’t have any basis for thinking there is a parent / child
relationship. Just sorta counter intuitive in the case of a nested
timeout. Because what programmer probably wants is a timeout on the
thing that they are trying to timeout. The WHOLE thing, not part of
it, without verifying the libraries don’t call timeout themselves.

Thoughts?

Seprately I’m having some issue with some of my activerecord objects
sticking around in a long running deamon. Not all, but some. It’s odd.
I haven’t found the problem yet, and I really don’t know what’s going
on. It’s probably not timeout related because those would free up
eventually.

Thanks so much for the comments thus far.

Geff

boing · September 17, 2006, 2:34am

Francis C. wrote:

On 9/16/06, Geff G. [email protected] wrote:

What was I thinking was the following: lets consider the case where
you
try to timeout a HTTP connect with the net/http stuff. There are
already “timeouts” in the HTTP class. If my timeout times out before
the library’s timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it’s done.

I haven’t read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you’ll kill the thread that is doing the TCP
connect, because it’s the same thread that called net/http. The
“internal” timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there’s no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.

I think if I timeout a timeout that my timeout is killing the “waiter”
thread. And since the waiter thread started the TCPconnect attempt
thread that tcpconnect stays running until it ends, if it ends. Kinda
like my sleep thread example. Is that not accurate?

boing · September 17, 2006, 1:59pm

On 9/16/06, Geff G. [email protected] wrote:

I think if I timeout a timeout that my timeout is killing the
“waiter”
thread. And since the waiter thread started the TCPconnect attempt
thread that tcpconnect stays running until it ends, if it ends. Kinda
like my sleep thread example. Is that not accurate?

I don’t think so. Try the following:

require ‘timeout’
begin
timeout(2) {
timeout(4) {
sleep 6
p “I finished”
}
}
rescue Timeout::Error
p “timeout fired”
end

sleep 10

boing · September 17, 2006, 2:29pm

On 9/17/06, Francis C. [email protected] wrote:

require ‘timeout’

sleep 10

You’ll see the timeout message after two seconds, and the program will
end ten seconds later without saying “I finished.” You should be able
to ignore the fact that timeout is implemented with threads in the
first place. Or were you asking a different question?

FYI: Eric H. has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

boing · September 17, 2006, 2:19am

On 9/16/06, Geff G. [email protected] wrote:

What was I thinking was the following: lets consider the case where
you
try to timeout a HTTP connect with the net/http stuff. There are
already “timeouts” in the HTTP class. If my timeout times out before
the library’s timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it’s done.

I haven’t read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you’ll kill the thread that is doing the TCP
connect, because it’s the same thread that called net/http. The
“internal” timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there’s no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.

boing · September 17, 2006, 6:05pm

Jan S. wrote:

FYI: Eric H. has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

Wow! Thanks very much.

boing · September 17, 2006, 6:04pm

Francis C. wrote:

require ‘timeout’
begin
timeout(2) {
timeout(4) {
sleep 6
p “I finished”
}
}
rescue Timeout::Error
p “timeout fired”
end

sleep 10

Ur very right. Thanks very much. Sorry I had a bad case of HUA. I
read the timeout library code backwards, so to speak. As if the child
thread (so to speak) was the actual code block, instead of just the
sleep and the raise.

Sorry for posting something I shouldn’t have wasted your time with.
Thanks for the patience and the help.

Geff

boing · September 17, 2006, 6:56pm

On 9/16/06, Francis C. [email protected] wrote:

call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you’ll kill the thread that is doing the TCP
connect, because it’s the same thread that called net/http. The
“internal” timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there’s no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.

The timer thread won’t even remain because of the ensure at the end of
the timeout function that kills it …

boing · September 17, 2006, 10:08pm

On Mon, 18 Sep 2006, Joel VanderWerf wrote:

I wish I had an easy answer to this problem, but I don’t. Are timeout and
ensure inherently incompatible?

I don’t have an answer, but what if you make your ensure section into a
critical section?

ensure
Thread.critical = true

do stuff

Thread.critical = false
end

Kirk H.

boing · September 17, 2006, 9:36pm

Jan S. wrote:
…

FYI: Eric H. has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout

The second part of the article points out a potential danger of the
interaction between timeouts and rescue/ensure. But the proposed
solution may not be good advice.

The danger is that a timeout exception may fire during an ensure clause,
preventing the ensure clause (and any necessary cleanup code within it)
from finishing, which could cause resource leaks or other problems.

The solution proposed in the article is to wrap code within the ensure
clause in a begin…end block to handle timeout exceptions, like this:

require ‘timeout’

Timeout.timeout 2 do
begin
puts “Allocating the thingy…”
sleep 1
raise RuntimeError, ‘Oh no! Something went wrong!’
ensure
# Since we might time out, hold onto the timeout we caught
# so we can re-raise it when we’re done cleaning up.
timeout = nil
begin # we really need to clean up
puts “Cleaning up after the thingy…”
sleep 2
puts “Cleaned up after the thingy!”
rescue Timeout::Error => e
puts “Timed out! Trying again!”
timeout = e # save that timeout then retry
retry
end
# Raise the timeout so we time out all the way to the top.
raise timeout unless timeout.nil?
end
end

However, that solution only reduces the chance of the timeout
interfering with the ensure clause. Suppose the timeout fires while the
main thread is executing the line “timeout = nil”. (It’s impossible in
this example, but you can get it to happen by putting a “sleep 5” just
after this line.) Then the inner begin…end clause doesn’t catch the
timeout, and the cleanup doesn’t happen. In general, unless you are very
sure about the timings of your code, you cannot guarantee that the
timeout won’t fire at the wrong time. So it’s a race condition.

Another problem is that the “retry” may cause the cleanup to happen
twice, if the timeout fires just when cleanup is finishing. That may
lead to other problems (such as closing a file twice and generating
another exception).

A solution in this case (but not in general) is to move the ensure block
outside of the timeout block:

require ‘timeout’

begin
Timeout.timeout 2 do
puts “Allocating the thingy…”
sleep 1
raise RuntimeError, ‘Oh no! Something went wrong!’
end

rescue Timeout::Error => e
puts “timed out”

ensure
puts “Cleaning up after the thingy…”
sleep 2
puts “Cleaned up after the thingy!”
end

The effect of this is to kill the timeout thread before the ensure block
starts (since the timeout block has been exited). Also, the ensure code
will be called exactly once (assuming there are no other interrupts).

But this refactoring technique cannot be used if the ensure clause
occurs in some nested method call. This has been noted before:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/113417

For example:

require ‘timeout’

def do_some_work
puts “do_some_work is starting”
sleep 1
raise ‘exception in do_some_work’
ensure
puts “cleaning up resources in do_some_work”
sleep 2
puts “cleaned up resources in do_some_work”
end

Timeout.timeout 2 do
do_some_work
end

This outputs:

do_some_work is starting
cleaning up resources in do_some_work
/usr/local/lib/ruby/1.8/timeout.rb:54:in `do_some_work’: execution
expired (Timeout::Error)

The cleanup never finishes.

We could modify do_some_work to handle timeouts, as Eric did in his
article, but then there would still be the race condition that would
(depending on timing) allow the timeout to kill the ensure clause
anyway. And there’s still the retry problem (causing multiple executions
of cleanup code). And it’s not easy to modify every library method in
this way.

I wish I had an easy answer to this problem, but I don’t. Are timeout
and ensure inherently incompatible?

boing · September 18, 2006, 12:28am

Arnaud B. wrote:

ensure
Thread.critical = true

do stuff

Thread.critical = false
end

Kirk H.

There is still the race problem if the exception gets raised just
before the Thread.critical.

Confirmed by experiment with ruby 1.8.4:

$ cat ensure-timeout.rb
require ‘timeout’

Thread.abort_on_exception = true

work_thread = Thread.new do
loop do
begin
Thread.pass
ensure
Thread.critical = true
begin
rescue Timeout::Error
puts “caught a Timeout::Error”
end
Thread.critical = false
#retry
end
end
end

loop do
work_thread.raise Timeout::Error
end

$ ruby ensure-timeout.rb
ensure-timeout.rb:22: Timeout::Error (Timeout::Error)
from ensure-timeout.rb:6
from ensure-timeout.rb:5

boing · September 17, 2006, 11:40pm

On 9/17/06, [email protected] [email protected] wrote:

do stuff

Thread.critical = false
end

Kirk H.

There is still the race problem if the exception gets raised just
before the Thread.critical.

I’m not sure if it is really possible in this case. It depends if the
implentation allows thread switching just after entering an ensure but
before executing any code. You will have to ask gurus on this one.

boing · September 18, 2006, 12:30am

[email protected] wrote:

do stuff

Thread.critical = false
end

Even if this worked reliably, using a critical section would be
udesirable if the ensure clause invokes IO (e.g. waiting for some
network operation to finish). That would freeze the whole process.

boing · September 18, 2006, 9:34pm

On Sep 16, 2006, at 12:40 AM, Arnaud B. wrote:

time. Is this normal?
Due do what I described in the previous paragraph nested timeouts
won’t cause any kind of leak.

Nested timeouts have a different problem:

http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-
timeout-timeout

–
Eric H. - [email protected] - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

boing · September 19, 2006, 3:01am

On 9/18/06, Eric H. [email protected] wrote:

ensure

do stuff

Thread.critical = orig_crit
end

If you may already be inside a critical section.

(or require ‘thread’ use Thread.exclusive)

This still leaves the race problem. If you can raise an exception
between an ensure and its first instruction, NOTHING can be safe in
the ensure clause.

To solve this problem would require some changes to the language
semantics. At the very least external exceptions should not be
accepted when inside an ensure clause. This would require some
interpreter changes and certainly more thougth than what I have given
it.

boing · September 18, 2006, 9:40pm

On Sep 17, 2006, at 1:08 PM, [email protected] wrote:

do stuff

Thread.critical = false
end

Careful, you may want:

ensure
orig_crit = Thread.critical
Thread.critical = true

do stuff

Thread.critical = orig_crit
end

If you may already be inside a critical section.

(or require ‘thread’ use Thread.exclusive)

–
Eric H. - [email protected] - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com