Sysread changes behavior in the presence of threads?

What am I missing here?

require 'socket'
require 'fcntl'

Thread.new { sleep 100 }

sd = TCPsocket.new("www.cisco.com", 80)
m = sd.fcntl(Fcntl::F_GETFL, 0)
sd.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK | m)
sd.sysread(4096)

This code blocks in the sysread, in effect ignoring the nonblocking mode
set on the file descriptor. But if you comment out the line that spins
the thread, the sysread raises Errno::EAGAIN as you’d expect.

Is this a defined behavior or a bug?

From: “Francis C.” [email protected]

This code blocks in the sysread, in effect ignoring the nonblocking mode set
on the file descriptor. But if you comment out the line that spins the
thread, the sysread raises Errno::EAGAIN as you’d expect.

Is this a defined behavior or a bug?

Hi, I learned about this just the other day. See thread
starting here:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/192008

Regards,

Bill

Ahh, thanks for pointing it out, although it ain’t the answer I was
hoping for. Interesting that your concern (not having to think about
EAGAIN) is opposite from mine- I’m wanting to get EAGAIN so I can do
something else while waiting for the I/O.

Solved my problem by holding my nose and writing an IO#read_nonblocking
method in C.

From: “Francis C.” [email protected]

Ahh, thanks for pointing it out, although it ain’t the answer I was hoping
for. Interesting that your concern (not having to think about EAGAIN) is
opposite from mine- I’m wanting to get EAGAIN so I can do something else
while waiting for the I/O.

Yeah. I didn’t have a specific need to check for EAGAIN at the
time, but my main point was trying to express my confusion that
there would be this one semi-obscure case (single thread only)
that acted differently from the rest. I say obscure because I
don’t think I’d ever write code in ruby to expect that no other
threads were present in the system. How would I know that
some other library I’ve required hasn’t spawned some worker
thread for its own internal use? So having ruby act inconsistently in
the particular case of there being only one thread alive seems peculiar
to me. My point was that I was confused by that behavior, since it seems
like something I could never reliably depend on. :)

Regards,

Bill

In article [email protected],
“Francis C.” [email protected] writes:

I’d be inclined to categorize it as a bug. It violates the expectation that
I/O is orthogonal to threads. I looked at the code in io.c and I can see the
reason for it, but I’m writing a library and I don’t have the luxury of
making assumptions about whether there are other threads.

A workaround is Thread.exclusive { sd.sysread(4096) }.
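Applied to the original example, that workaround might look like this (a
sketch only; Thread.exclusive sets Thread.critical around the block, which
keeps the scheduler from parking the thread in its select loop before the
read, so the nonblocking read(2) runs and Errno::EAGAIN comes through even
with other threads alive):

require 'socket'
require 'fcntl'
require 'thread'           # Thread.exclusive lives here in 1.8's stdlib

Thread.new { sleep 100 }   # some other thread is alive

sd = TCPsocket.new("www.cisco.com", 80)
m = sd.fcntl(Fcntl::F_GETFL, 0)
sd.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK | m)

begin
  # With the critical section held, the scheduler does not intercept the
  # read, so the nonblocking descriptor raises just as in the
  # single-thread case.
  data = Thread.exclusive { sd.sysread(4096) }
rescue Errno::EAGAIN
  # nothing readable yet; free to do other work and retry later
end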

However, I think a nonblocking read method is the right way to
fix this issue. The problem is the method name, though.

There are several problems with making sysread raise EAGAIN while
I/O multiplexing is in effect. Since a nonblocking read doesn't
block, Ruby could safely skip the I/O multiplexing (whose purpose
is to keep the entire process from blocking) for nonblocking
descriptors; if it did, sysread would raise EAGAIN. But,
unfortunately, Ruby cannot always know the nonblocking state of a fd:

  1. Windows has no F_GETFL equivalent
    There is no way to know the state on Windows.

  2. race condition
    Even on an environment which has F_GETFL, the state may be
    changed between F_GETFL and read(2) by another process.

Tanaka-sensei: from your description it appears that the problem is
caused by an interaction between the Ruby thread-scheduler and the I/O
functions, which can’t be resolved without fundamentally changing how
the scheduler works. That’s fair enough.

By this point, I’ve accumulated a small library of functions in C that
perform various operations without blocking, and are unaffected by the
presence of threads. Perhaps I’ll release them as an extension library
so people can use them now, while recognizing that they will eventually
be obsoleted when a decision is made about the names to be used in the
standard distro. (Another thing I’d like to do is write a unified
Mutex/Condition Variable implementation that can actually be used to
synchronize Ruby threads with native threads.)

I’d be inclined to categorize it as a bug. It violates the expectation
that I/O is orthogonal to threads. I looked at the code in io.c and I
can see the reason for it, but I’m writing a library and I don’t have
the luxury of making assumptions about whether there are other threads.

In your example, the thread ends almost immediately, especially since
puts will probably complete long before a connect to a remote web
server. So by the time the sysread executes, there is only one thread.
Tanaka’s explanation still holds.

I think the answer is to define a new set of functions that can be
depended on not to block. (And on Windows, they will probably also need
to set the descriptor nonblocking.) From prior communications with
Matz, he’s not against this, but hasn’t settled yet on what these new
methods should be named.
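For reference, later Ruby versions did add a method along these lines,
IO#read_nonblock. A minimal usage sketch, assuming that method:

require 'socket'

sd = TCPSocket.new("www.cisco.com", 80)

begin
  # read_nonblock puts the descriptor into nonblocking mode itself and
  # reads at most 4096 bytes without going through the green-thread
  # scheduler, raising Errno::EAGAIN when nothing is available.
  data = sd.read_nonblock(4096)
rescue Errno::EAGAIN
  # nothing to read yet; the caller is free to go do something else
  data = nil
end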

From: “Francis C.” [email protected]

(Another thing I’d like to do is write a unified Mutex/Condition
Variable implementation that can actually be used to synchronize Ruby
threads with native threads.)

That would be awesome! :)

Can I donate via paypal or something?

Regards,

Bill

On 5/21/06, Francis C. [email protected] wrote:

Tanaka-sensei: from your description it appears that the problem is caused
by an interaction between the Ruby thread-scheduler and the I/O functions,
which can’t be resolved without fundamentally changing how the scheduler
works. That’s fair enough.

I am not sure Tanaka’s explanation is quite satisfactory. If you change
your original example thusly:

require 'socket'
require 'fcntl'

Thread.new { puts "hi" }

sd = TCPsocket.new("www.cisco.com", 80)
m = sd.fcntl(Fcntl::F_GETFL, 0)
sd.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK | m)
sd.sysread(4096)

it will produce Errno::EAGAIN as expected.

Thus, one is led to suspect that in the former case the thread somehow
does not get properly cleaned up upon completion.

In any case, I agree with you that not being able to count on
non-blocking behavior, just because there could be some stray Threads
around, is pretty nasty.

-A

I started working on it the other day, will complete when I have time.
Hope you’re not running Windows ;-). Condition variables don’t work
perfectly on Windows.

From: “Francis C.” [email protected]

I started working on it the other day, will complete when I have time. Hope
you’re not running Windows ;-). Condition variables don’t work perfectly on
Windows.

The relevant applications are multi-platform: OS X; Linux; and yes, Windows.

Your mention of condition variables not working reliably(?) on
Windows has certainly piqued my interest. Are you referring to
Ruby’s condition variables? Or some fundamental Windows flaw?
Our C++ app uses boost::condition variables on all platforms.
Definitely interested to know if there’s some glitch in Ruby
and/or Windows I should be aware of.

Thanks,

Bill

2006/5/21, Francis C. [email protected]:

It’s a Windows issue, nothing to do with Ruby. I haven’t looked at how Boost
implements them, but if I have a chance I will. The problem is with timed
condwaits. As you know Windows uses different synch primitives than Unix.
It’s one of those tiny breaches of atomicity that you’ll occasionally see if
you run enough trials on a machine with enough multiprocessors.

But this does not bite you if you use Ruby’s condition variables as
they are completely in Ruby land and there are no native threads
(well, one native thread is there :-))
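Those pure-Ruby condition variables are the ones in the thread library
(Mutex and ConditionVariable, implemented in Ruby on top of
Thread.critical). A minimal producer/consumer sketch of their use:

require 'thread'

queue = []
lock  = Mutex.new
cond  = ConditionVariable.new

producer = Thread.new do
  5.times do |i|
    lock.synchronize do
      queue << i
      cond.signal        # wake one waiting (green) thread, if any
    end
  end
end

consumer = Thread.new do
  5.times do
    lock.synchronize do
      cond.wait(lock) while queue.empty?   # releases lock while waiting
      puts queue.shift
    end
  end
end

producer.join
consumer.join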

This whole thread makes me wonder why you need to use sysread and a
nonblocking variant of it at all. Do you have any extensions that use
native threads or what are you trying to accomplish? I’m asking
because for me Ruby’s threads and blocking IO (on Ruby level) have
served me well so far.

Kind regards

robert

It’s a Windows issue, nothing to do with Ruby. I haven’t looked at how
Boost implements them, but if I have a chance I will. The problem is
with timed condwaits. As you know Windows uses different synch
primitives than Unix. It’s one of those tiny breaches of atomicity that
you’ll occasionally see if you run enough trials on a machine with
enough multiprocessors. Once I spent a Saturday morning trying to write
a proper condvar for Windows in assembler but I gave up when it occurred
to me that I really should get a life instead. ;-)

I’m working on the eventmachine library (see rubyforge). The goal is
to enable complicated applications (including multiplayer games and
network servers) that are far faster and more scalable than is
possible with “ordinary” Ruby coding. (That is, without requiring deep
understanding of the Ruby runtime environment in order to get the
required performance.) Practically speaking, this requires strict
nonblocking i/o and a certain amount of C extensions. We’d like for
any Ruby programmer to be able to write a large, fast application
without acquiring expertise in concurrency and networking issues.

As an example, I’m responsible for an LDAP system with five replicated
servers running simultaneously and sharing load. This system now can
sustain rates of 2000 queries per second per server (directory size is
about one million entries), but I had to resort to a single-threaded
server handwritten in C++ (openldap’s performance on the specified
hardware was about one-twentieth of the requirement). The replication
code is almost all in Ruby. I’d like to have the main server code be
largely in Ruby so it will be easier to maintain. That’s an example of
what I want to do.

As far as threads are concerned: I’m one of those people who believe
that threads are seriously overused and should be avoided, especially
in high-performance applications. But occasionally if you’re mixing
Ruby and native code, you may have threads in each. As long as Ruby’s
threads are green, this split will exist, and it would be nice to be able
to synchronize a Ruby thread with a native one.

Quoting [email protected], on Sun, May 21, 2006 at 09:45:38PM +0900:

As far as threads are concerned: I’m one of those people who believe
that threads are seriously overused and should be avoided, especially
in high-performance applications. But occasionally if you’re mixing

I understand the argument in general, but since ruby’s “threads” aren’t
actually threads, does it apply here? Ruby with multiple “threads” is
really just a single process with a nice application-level way of
invoking particular code when a particular socket descriptor is ready,
and having that code have some state.

This is pretty much what any (other) single-threaded unix app hanging
off of select would do, except state would be held explicitly in some
kind of data structure. In ruby the state is held in the lexical
state/closure/stack (not sure what to call it) of the ruby thread.

Is the overhead of a ruby thread too high, for some reason? Uses too
much memory, doesn’t scale well across thousands of descriptors because
it uses select, something else…? I’m sure you are trying to avoid
ruby’s select-based, non-blocking, io-multiplexing scheme (aka
“threads”) for a good reason, I just don’t see what the reason is yet.

Ruby and native code, you may have threads in each. As long as Ruby’s
threads are green, this split will exist, and it would be nice to be able
to synchronize a Ruby thread with a native one.

Interaction between ruby and any other OS threads is a well-known
problem, but xx_nonblock APIs in Socket don’t seem like they’re going to
help that.

Sam

No, I’m going for something rather different, and it has nothing to do
with Ruby. (And therefore, the point is a threadjack so I’ll be brief.)
Threads are too difficult to use. Even if you have a lot of experience
(I’ve been programming posix-like threads since Solaris 2.4 and Win32
threads even longer), concurrency within a process is darned hard to get
right. One really good reason to use threads is to capture system
latencies like disk and network i/o, but this can generally be done with
events. Another good reason is to reflect the structure of the problem
you’re trying to solve: if your problem really does involve multiple,
independent control flows, then threading the app will probably make it
easier to write (but may also make it slower and harder to scale).

With teams I manage, when it’s necessary to use threads, I impose strict
rules on when and how to apply mutexes, and how to design
synchronization sets. As long as my rules are followed, you generally
won’t see a deadlock, and you will rarely see severe mutex contention.
But most programmers hate following them. (Among them: NEVER call a
function under lock, not even one you wrote, not even an inline or a
macro. Only variable reads and writes are allowed.)
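To illustrate that rule in Ruby terms (compute_value, input, and the
shared variable here are made-up placeholders, not anything from a real
library):

require 'thread'

def compute_value(x)
  # stand-in for an arbitrarily slow computation or a blocking call
  x * 2
end

lock   = Mutex.new
shared = nil
input  = 21

# What the rule forbids: the call runs while the lock is held, so every
# other thread that needs the lock waits out the whole computation.
lock.synchronize { shared = compute_value(input) }

# What the rule allows: do the work first, then hold the lock only for
# the plain variable write.
result = compute_value(input)
lock.synchronize { shared = result }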

Your second point: interactions between Ruby and native threads have
nothing to do with nonblocking I/O. Separate problem; it was my mistake
if I left the implication that they are linked. I was thinking it might
be possible to teach Ruby to work with native mutexes and condvars.

We’re threadjacking, so I’ll keep it short. The point of taking a very
restrictive view of synchronization is to prevent incorrect concurrency.
Your example depends on the implementation of aMap and of calculateValue
not to do evil things. (One of the evil things they can do is simply to
run for several milliseconds, or make a blocking I/O call. This can give
you a bad case of mutex contention, which is exceptionally costly in
many modern implementations.) This means that the program may change
behavior with respect to concurrency across platforms, hardware, and
also across time (as the code inside those called functions changes).
Ruby adds the further dimension that the code you call under lock may
have been metaprogrammed on the fly.

The nightmare scenario is this: the client calls to say that your
mission-critical application stops running occasionally. It will be fine
for a month, and then it will stop twice in one week. You ask what did
they do differently, and the answer is always “nothing.” You ask what
your programmers changed, and the answer is always “nothing.” The
problem is of course completely non-reproducible. This is not a nice
place to be, since you can’t just blame the client’s environment.

I suppose your answer to all this is: just code more carefully, and only
use well-debugged libraries. That of course is a partially-correct
answer, but achievable in practice only at some specific cost. My larger
point is that in the case of threads, this balance-point is often very
hard to achieve at reasonable cost.

I’ll let you have the last word, both because we’re offtopic, and
because threading is a religious issue to many people and so the
question tends to generate more heat than light :-). In my defense, I’ll
only say that my dislike of threads is rooted in many years of
experience, and not a mere prejudice.

2006/5/22, Francis C. [email protected]:

With teams I manage, when it’s necessary to
use threads, I impose strict rules on when and how to apply mutexes, and how
to design synchronization sets. As long as my rules are followed, you
generally won’t see a deadlock, and you will rarely see severe mutex
contention. But most programmers hate following them. (Among them: NEVER
call a function under lock, not even one you wrote, not even an inline or a
macro. Only variable reads and writes are allowed.)

I can see why they hate sticking to that rule. Basically you disallow
decent synchronization of functional parts of the application. The
consequence of this is that you either do not have concurrent programs
that are correct or you force people to implement their own mutex on
top of your rule. To give an example of what I mean, your rule prohibits
this typical cache idiom:

// pseudo code
synchronized ( lock ) {
  if ( ! aMap.contains( myKey ) ) {
    // cache miss
    aMap.put( myKey, calculateValue( myKey ) );
  }
}
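A rough Ruby rendering of the same idiom (a_map, my_key, and
calculate_value are placeholders matching the pseudocode names):

require 'thread'

def calculate_value(key)
  # stand-in for the expensive computation whose result is being cached
  key.to_s * 2
end

lock   = Mutex.new
a_map  = {}
my_key = :example

lock.synchronize do
  unless a_map.key?(my_key)
    # cache miss: the expensive call happens while the lock is held,
    # which is exactly what the rule above forbids
    a_map[my_key] = calculate_value(my_key)
  end
end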

IMHO your rule makes multi-threaded applications pretty much pointless.

Kind regards

robert

Francis C. wrote:

We’re threadjacking, so I’ll keep it short. The point of taking a very
restrictive view of synchronization is to prevent incorrect concurrency.

I like this thread, but I’ll nudge it in the ruby direction…

Ruby adds the further dimension that the code you call under lock may
have been metaprogrammed on the fly.

If you’re talking about ruby threads, sync mechanisms are expensive with
or without contention, since they are built on top of Thread.critical.

require 'thread'
require 'benchmark'

N = 1_000_000

Benchmark.bmbm(12) do |bm|

  bm.report("no Thread.critical") do
    x = 0
    N.times do
      x += 1
    end
  end

  bm.report("Thread.critical") do
    x = 0
    N.times do
      Thread.critical = true
      x += 1
      Thread.critical = false
    end
  end

  bm.report("Thread.exclusive") do
    x = 0
    N.times do
      Thread.exclusive do
        x += 1
      end
    end
  end

end

__END__

Rehearsal ------------------------------------------------------
no Thread.critical   1.040000   0.010000   1.050000 (  1.063126)
Thread.critical      1.130000   0.000000   1.130000 (  1.153935)
Thread.exclusive     2.660000   0.000000   2.660000 (  2.704054)
--------------------------------------------- total: 4.840000sec

                         user     system      total        real
no Thread.critical   0.360000   0.000000   0.360000 (  0.366529)
Thread.critical      0.910000   0.010000   0.920000 (  0.922671)
Thread.exclusive     2.670000   0.000000   2.670000 (  2.692268)