Replacing the use of gettimeofday in the scheduler

Tomas_P · March 1, 2007, 3:46pm

Using gettimeofday in the scheduler is problematic, since the system
time can jump ahead or back when the user or ntp resets it [1]. This can
have side effects such as sleep() and timeout() never returning, so that
threads are never scheduled again, and it seems to have other side
effects as well [3].

Eric H. is arguing [2] that replacing the existing mechanism, which uses
libc-select to sleep and gettimeofday to calculate the time actually
elapsed, with libc-sleep is also problematic because:

“[for libc-sleep]… system activity may lengthen the sleep by an
indeterminate amount.”

However, this applies in exactly the same way to libc-select as well,
and thus replacing the select/gettimeofday mechanism by libc-sleep
should at least work no worse. Objections?

Has there been any effort to implement a solution based on sleep/usleep?
Is there interest in implementing a more robust schedule timing
mechanism? Is there a chance for a patch based on sleep/usleep to make
it into CVS?
*t

[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/103140
[2] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/103245
[3] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/229829

Tomas_P · March 1, 2007, 4:12pm

On 3/1/07, Tomas P. [email protected] wrote:

However, this applies in exactly the same way to libc-select as well and thus
replacing the select/gettimeofday mechanism by libc-sleep should at least work
no worse. Objections?

My first reaction was: good god, the scheduler uses wallclock time?!
Speaking as someone who works on realtime systems (and thus has to
think about scheduler implementation often), this is never a good
idea. I don’t know the background for Ruby’s scheduler design, but
normally I’d regard a scheduler which uses wallclock time as just
plain broken. I’m heartily in favor of changing it to something
which isn’t dependent on the clock.

That “indeterminate amount” referenced above is simply the price you
pay for running in userspace on a modern multitasking OS. Yes,
system activity could delay the return. That’s what multitasking
means: you don’t get to choose when you get the CPU. In practice, if
applications are experiencing unacceptable latency in OS scheduling
then 1) your gettimeofday()-based implementation is going to be
delayed right along with everything else; and 2) you have bigger
problems, because your system is overloaded.

Cheers,

Tomas_P · March 1, 2007, 5:10pm

Quoting Tomas P. [email protected]:

Has there been any effort to implement a [scheduling] solution based on
sleep/usleep? Is there interest in implementing a more robust schedule timing
mechanism? Is there a chance for a patch based on sleep/usleep to make it
into CVS?

gnu-libc’s sleep(3) manpage suggests that sleep and SIGALRM on non-glibc
systems don’t get along. From the POSIX spec [1]:

"If a SIGALRM signal is generated for the calling process during
execution of sleep(), except as a result of a prior call to alarm(),
and if the SIGALRM signal is not being ignored or blocked from
delivery, it is unspecified whether that signal has any effect other
than causing sleep() to return."

(Thus it is possible that Ruby’s signal handler for SIGALRM will not be
executed.)

Since Ruby does allow the user to handle SIGALRM, an implementation
based on libc-sleep would fail to work correctly on the systems
described above, which don’t handle SIGALRM together with sleep
gracefully, whenever the user is doing stuff with SIGALRM.

Does anybody know how relevant that is? I.e. does Ruby run at all on
such systems? The above would seem to rule out implementing scheduler
waiting with libc-sleep, since that would prevent correct functioning of
Ruby on such systems in “corner cases”.
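
One possible way around the SIGALRM question, sketched here only as an assumption and not as something proposed in this thread, would be nanosleep() instead of sleep(3). As far as I can tell from the POSIX text, nanosleep() has no effect on the action or blockage of any signal, and a relative nanosleep() is not affected by setting the system time. The helper name relative_sleep is made up for illustration, and the sketch covers only the timing part, not waiting on file descriptors:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for a relative interval with nanosleep(),
 * resuming after signal interruptions.
 * Returns 0 on success, -1 on any error other than EINTR.
 */
static int relative_sleep(double seconds)
{
    assert(seconds >= 0.0);

    struct timespec req, rem;
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);
    if (req.tv_nsec > 999999999L)
        req.tv_nsec = 999999999L;   /* guard against rounding up to 1e9 */

    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* interrupted by a signal: sleep the unslept remainder */
    }
    return 0;
}
```

Calling relative_sleep(0.5) should block for at least half a second even if a signal handler runs in between, because the unslept remainder is taken from nanosleep() itself rather than computed from the wall clock.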

?
*t

[1] sleep

Tomas_P · March 1, 2007, 9:11pm

On Mar 1, 2007, at 07:12, Avdi G. wrote:

idea. I don’t know the background for Ruby’s scheduler design, but
normally I’d regard a scheduler which uses wallclock time as just
plain broken. I’m heartily in favor of changing it to something
which isn’t dependent on the clock.

The Ruby thread scheduler uses setitimer(2) and select(2). It
depends on the wall-clock for implementing features defined in terms
of the wall-clock (Kernel#sleep and Thread#join).

That “indeterminate amount” referenced above is simply the price you
pay for running in userspace on a modern multitasking OS. Yes,
system activity could delay the return. That’s what multitasking
means: you don’t get to choose when you get the CPU. In practice, if
applications are experiencing unacceptable latency in OS scheduling
then 1) your gettimeofday()-based implementation is going to be
delayed right along with everything else; and 2) you have bigger
problems, because your system is overloaded.

Kernel#sleep behaves differently in Ruby programs using threads. If
you sleep in a thread you end up context switching to other threads
instead of calling sleep(3).

Since you aren’t using sleep(3) in threaded mode, Ruby instead uses
gettimeofday(2) to implement Kernel#sleep for the calling thread (has
this thread slept its N seconds?), so you may sleep longer than you
expect.

The other place gettimeofday(2) is used is Thread#join’s timeout, for
a similar reason.

Tomas_P · March 1, 2007, 10:16pm

On Fri, 2 Mar 2007, Avdi G. wrote:

On 3/1/07, Eric H. [email protected] wrote:

The Ruby thread scheduler uses setitimer(2) and select(2). It
depends on the wall-clock for implementing features defined in terms
of the wall-clock (Kernel#sleep and Thread#join).

Thanks for the explanation. I’m probably missing something, I’m
confused by why the functionality you describe in Kernel#sleep and
Thread#join can’t be implemented using only select(). Can you clarify?

It is implemented using select, but select is, per spec, allowed to
return before the time’s up. So rb_thread_wait_for(time) is (indirectly)
using gettimeofday to find out how much time has gone by. And if

(remaining = time - (gettimeofday_now - gettimeofday_before_we_called_select)) > 0

then rb_thread_wait_for reiterates and sleeps (with select) again.
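
A minimal C sketch of the kind of loop described above, assuming nothing beyond POSIX select() and gettimeofday(). This is not Ruby’s actual rb_thread_wait_for; the names wallclock_now and wait_for_wallclock are made up for illustration:

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Current wall-clock time in seconds, as reported by gettimeofday(). */
static double wallclock_now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical helper: sleep with select() until a wall-clock deadline
 * has passed. Returns 0 on success, -1 if select() fails for a reason
 * other than EINTR.
 */
static int wait_for_wallclock(double seconds)
{
    assert(seconds >= 0.0);
    double deadline = wallclock_now() + seconds;

    for (;;) {
        double remaining = deadline - wallclock_now();
        if (remaining <= 0.0)
            return 0;   /* the requested time has gone by */

        struct timeval tv;
        tv.tv_sec = (time_t)remaining;
        tv.tv_usec = (suseconds_t)((remaining - (double)tv.tv_sec) * 1e6);

        /* No file descriptors: select() acts as a sub-second sleep. */
        if (select(0, NULL, NULL, NULL, &tv) < 0 && errno != EINTR)
            return -1;
        /* select() may have returned early, so re-check the clock. */
    }
}
```

Because the deadline is fixed in wall-clock terms, stepping the system clock back while this loop runs lengthens the wait by the size of the step.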

I can see the following solutions:

* find a reliable time source that works cross-platform. uptime and
  ticks would be candidates, but I haven't found a way to have them
  cross-platform.
* use thread_timer as a reliable time source

*t

Tomas_P · March 1, 2007, 9:16pm

On 3/1/07, Eric H. [email protected] wrote:

The Ruby thread scheduler uses setitimer(2) and select(2). It
depends on the wall-clock for implementing features defined in terms
of the wall-clock (Kernel#sleep and Thread#join).

Thanks for the explanation. I’m probably missing something, I’m
confused by why the functionality you describe in Kernel#sleep and
Thread#join can’t be implemented using only select(). Can you clarify?

Thanks,

Tomas_P · March 1, 2007, 10:19pm

On Fri, 2 Mar 2007, Avdi G. wrote:

plain broken. I’m heartily in favor of changing it to something
which isn’t dependent on the clock.

Do you know of any cross-platform aka POSIX call where the returned time
is strictly increasing? How do other green thread implementations handle
this problem?
*t

Tomas_P · March 1, 2007, 10:30pm

On Fri, 2 Mar 2007, Eric H. wrote:

normally I’d regard a scheduler which uses wallclock time as just
plain broken. I’m heartily in favor of changing it to something
which isn’t dependent on the clock.

The Ruby thread scheduler uses setitimer(2) and select(2). It depends on the
wall-clock for implementing features defined in terms of the wall-clock
(Kernel#sleep and Thread#join).

You need to add Timeout#timeout to this. But:

$ ri Kernel#sleep

  Suspends the current thread for _duration_ seconds (which may be
  any number, including a +Float+ with fractional seconds). Returns
  the actual number of seconds slept (rounded), which may be less
  than that asked for if another thread calls +Thread#run+. Zero
  arguments causes +sleep+ to sleep forever.

No reference to wall-clock in there. What do you mean by “defined in
terms of the wall-clock”?

sleep in a thread you end up context switching to other threads instead of
calling sleep(3).

Since you aren’t using sleep(3) in threaded mode, Ruby instead uses
gettimeofday(2) to implement Kernel#sleep for the calling thread (has this
thread slept its N seconds?), so you may sleep longer than you expect.

The other place gettimeofday(2) is used is Thread#join’s timeout, for similar
reason.

The problem is that when you set system time into the past by a month,
then your thread will also sleep for a month and not, as you probably
expected, only a few seconds. Which is actually the hint for the
solution… to be followed.
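
To make the arithmetic concrete, here is a small sketch that only simulates the clock readings. It assumes the wake-up deadline is computed once as "wall-clock now + duration" and compared against later gettimeofday() readings; the helper names remaining_wait and wait_after_clock_step are hypothetical:

```c
#include <assert.h>

/*
 * Hypothetical helper: seconds still to wait, given an absolute
 * wall-clock deadline and the current wall-clock reading.
 */
static double remaining_wait(double deadline, double now)
{
    double remaining = deadline - now;
    return remaining > 0.0 ? remaining : 0.0;
}

/*
 * Hypothetical helper: simulate "sleep `seconds`" begun at wall-clock
 * time `start`, where the clock is stepped by `clock_step` seconds right
 * after the deadline was computed. Returns how long the thread waits.
 */
static double wait_after_clock_step(double start, double seconds,
                                    double clock_step)
{
    assert(seconds >= 0.0);
    double deadline = start + seconds;   /* fixed once, in wall-clock terms */
    double now = start + clock_step;     /* what the next clock reading says */
    return remaining_wait(deadline, now);
}
```

With these definitions, wait_after_clock_step(1000.0, 5.0, 0.0) gives 5.0, a 60-second step back (clock_step = -60.0) gives 65.0, and a 60-second step forward gives 0.0, i.e. the sleep ends immediately.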

*t

Tomas_P · March 1, 2007, 11:20pm

On Mar 1, 2007, at 13:30, Tomas P.'s Mailing L. wrote:

On Fri, 2 Mar 2007, Eric H. wrote:

The Ruby thread scheduler uses setitimer(2) and select(2). It
depends on the wall-clock for implementing features defined in
terms of the wall-clock (Kernel#sleep and Thread#join).

You need to add Timeout#timeout to this.

Nope. Timeout calls Kernel#sleep in a thread.

No reference to wall-clock in there. What do you mean by “defined
in terms of the wall-clock”?

When I write “sleep 5” I expect at least five seconds on the clock on
my wall to go by before the next statement is executed.

Tomas_P · March 1, 2007, 10:49pm

On Fri, 2 Mar 2007 06:30:12 +0900, “Tomas P.'s Mailing L.”
[email protected] wrote:

What do you mean by “defined in terms of the wall-clock”?

“wall-clock” refers to real elapsed time, rather than CPU elapsed time.
It’s better to base your scheduler on CPU elapsed time, since on a
heavily loaded system, a “wall-clock”-based scheduler will just thrash
without getting much useful work done.

Since there aren’t widespread standard APIs for CPU-time-based
interrupts, most runtimes with “green thread” schedulers that are based
on CPU time approximate it by counting reductions, VM instructions, or
AST nodes traversed.
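
A rough sketch of what counting reductions can look like, under the assumption that the interpreter charges one unit per VM instruction and switches threads when a fixed budget is used up. The struct and function names are hypothetical and not taken from any particular runtime:

```c
#include <assert.h>

/* Hypothetical green-thread bookkeeping: a budget of reductions per slice. */
struct green_thread {
    int budget;      /* reductions granted per time slice */
    int remaining;   /* reductions left in the current slice */
};

/*
 * Charge one reduction (for example, one VM instruction) to the thread.
 * Returns 1 when the slice is used up and the scheduler should switch
 * threads, 0 otherwise. The budget is refilled for the thread's next slice.
 */
static int charge_reduction(struct green_thread *t)
{
    assert(t->budget > 0);
    if (--t->remaining > 0)
        return 0;
    t->remaining = t->budget;
    return 1;
}
```

The point of the design is that preemption depends only on work done by the thread itself, so neither system load nor a change of the system time affects when a switch happens.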

-mental

Tomas_P · March 1, 2007, 11:31pm

On 3/1/07, Eric H. [email protected] wrote:

On Mar 1, 2007, at 13:30, Tomas P.'s Mailing L. wrote:

The problem is that when you set system time into the past by a
month, then your thread will also sleep for a month and not, as you
probably expected, only a few seconds. Which is actually the hint
for the solution… to be followed.

I don’t see how this could be confused for a bug in Ruby.

Sounds like a bug to me. Time updates happen, sometimes without user
intervention or knowledge. Software which can glitch or hang up as a
result of this fact isn’t robust.

Tomas_P · March 1, 2007, 11:49pm

On Fri, 2 Mar 2007, Eric H. wrote:

On Mar 1, 2007, at 13:30, Tomas P.'s Mailing L. wrote:

On Fri, 2 Mar 2007, Eric H. wrote:

The Ruby thread scheduler uses setitimer(2) and select(2). It depends on
the wall-clock for implementing features defined in terms of the
wall-clock (Kernel#sleep and Thread#join).

You need to add Timeout#timeout to this.

Nope. Timeout calls Kernel#sleep in a thread.

Um. When I do a “timeout(5) { do_something };” and set the system clock
back by a minute, then the timeout will not ever time out.

No reference to wall-clock in there. What do you mean by “defined in terms
of the wall-clock”?

When I write “sleep 5” I expect at least five seconds on the clock on my wall
to go by before the next statement is executed.

That’s right. However, this will not happen when you change the system
time before the 5 seconds have gone by.
*t

Tomas_P · March 1, 2007, 11:26pm

On Mar 1, 2007, at 13:30, Tomas P.'s Mailing L. wrote:

The problem is that when you set system time into the past by a
month, then your thread will also sleep for a month and not, as you
probably expected, only a few seconds. Which is actually the hint
for the solution… to be followed.

I don’t see how this could be confused for a bug in Ruby.

Tomas_P · March 1, 2007, 11:53pm

Hi,

In message “Re: replacing the use of gettimeofday in the scheduler”
on Fri, 2 Mar 2007 07:31:16 +0900, “Avdi G.” [email protected]
writes:

|Sounds like a bug to me. Time updates happen, sometimes without user
|intervention or knowledge. Software which can glitch or hang up as a
|result of this fact isn’t robust.

If it is a bug, I suspect it’s a bug in POSIX that doesn’t provide any
API for proper “clock” for the purpose. Correct me if I’m wrong.

          matz.

Tomas_P · March 2, 2007, 12:08am

On 3/1/07, Yukihiro M. [email protected] wrote:

If it is a bug, I suspect it’s a bug in POSIX that doesn’t provide any
API for proper “clock” for the purpose. Correct me if I’m wrong.

This might well be. Not being a contributor to the Ruby kernel, I
don’t know what the policy is: does Ruby only implement features which
can be built with pure POSIX, or can they have OS-specific
implementations?

Tomas_P · March 1, 2007, 11:58pm

On Thu, 1 Mar 2007, Tomas P.'s Mailing L. wrote:

Um. When I do a “timeout(5) { do_something };” and set the system clock back
by a minute, then the timeout will not ever time out.

Correction, sorry - it will sleep 1 minute + 5 seconds, so…

No reference to wall-clock in there. What do you mean by “defined in terms
of the wall-clock”?

When I write “sleep 5” I expect at least five seconds on the clock on my
wall to go by before the next statement is executed.

That’s right. However, this will not happen when you change the system time
before the 5 seconds have gone by.

… it will not sleep as long as the wall-clock. I.e. will not do what I
expect.
*t

Tomas_P · March 2, 2007, 12:09am

On Fri, 2 Mar 2007, Yukihiro M. wrote:

API for proper “clock” for the purpose. Correct me if I’m wrong.

I’d say it’s a feature that’s missing in POSIX. However, you can hack
around the fact by using system-specific calls (on Linux, for example,
/proc/uptime).

What about using Ruby’s own “thread_timer” as a more or less
accurate time source?
*t

Tomas_P · March 2, 2007, 12:14am

Hi,

In message “Re: replacing the use of gettimeofday in the scheduler”
on Fri, 2 Mar 2007 08:07:20 +0900, “Avdi G.” [email protected]
writes:

|This might well be. Not being a contributor to the Ruby kernel, I
|don’t know what the policy is: does Ruby only implement features which
|can be built with pure POSIX, or can they have OS-specific
|implementations?

It can have OS-specific implementation, but I want the core behavior
being common on most (if not all) platforms. Besides that, I have no
idea to fix this “bug” on any platform right now. Any idea?

          matz.

Tomas_P · March 2, 2007, 12:38am

On 3/1/07, Yukihiro M. [email protected] wrote:

It can have OS-specific implementation, but I want the core behavior
being common on most (if not all) platforms. Besides that, I have no
idea to fix this “bug” on any platform right now. Any idea?

It looks like sufficiently recent POSIX standards DO have a solution
for this. I’d like to do some more research on this, but right now I
don’t have time. For anyone who wants to take a look at it, here’s a
starting point:
http://www.opengroup.org/onlinepubs/009695399/basedefs/time.h.html

Pay particular attention to CLOCK_MONOTONIC and the timer_*() functions.
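
As a starting point, a minimal sketch of reading that clock, assuming the platform provides CLOCK_MONOTONIC (it is an optional POSIX feature, so availability would need checking per system). The helper names monotonic_now and monotonic_elapsed are hypothetical:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <time.h>

/*
 * Seconds since an arbitrary fixed point in the past. Unlike
 * gettimeofday(), this reading cannot be set by the user or by ntp.
 */
static double monotonic_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);   /* fails only where CLOCK_MONOTONIC is unsupported */
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: has at least `seconds` elapsed since `start`,
 * where `start` is an earlier monotonic_now() reading?
 */
static int monotonic_elapsed(double start, double seconds)
{
    return monotonic_now() - start >= seconds;
}
```

A scheduler could record monotonic_now() when a thread goes to sleep and call monotonic_elapsed() on each pass, so that resetting the system time would not change when the thread wakes up.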

Hopefully I’ll be able to look at this in greater detail over the
weekend.

Tomas_P · March 5, 2007, 5:13pm

Quoting Tomas P. [email protected]:

It can have OS-specific implementation, but I want the core behavior

whether and how hh:mm:ss.mm is calculated from it. This solution has
     meanwhile: elapsed_time = timeofday() - start
to have it. If people want to check about whether their systems provide
 sleep( sleep_time )
clock" work right or do we want the relative “stop watch” to

Opinions? Shall I try to submit a patch?
*t

[1]
clock_getres

It’s funny to note that the identical problem was recently reported
against Python [1], that the discussion also contains the two proposed
approaches [2], and that there’s no decision yet on how to proceed, so
the bug is still open. So Ruby is not alone.

*t

[1] https://sourceforge.net/tracker/index.php?func=detail&aid=1607041&group_id=5470&atid=105470
[2] https://sourceforge.net/tracker/index.php?func=detail&aid=1607149&group_id=5470&atid=305470