Does IO.read block?


#1

Hi All,

I have a setup where I am writing to a pipe in one process and reading
in another. I am closing the end I'm not using in each process, but I
have a new problem. I recently read that I was just getting lucky when
it came to my IO.write calls completing each time (probably due to
Ruby's green thread implementation in v1.8 and 'nice' scheduling). Now
that the OS has taken over scheduling, it interrupts things when it damn
well feels like it, so I believe some of my write calls aren't
completing. I've therefore set up a loop like this:

rescue Exception => error
  ex_string = Marshal.dump(error)
  ex_size = ex_string.bytesize
  bytes_written = 0
  while bytes_written < ex_size
    # retry with the unwritten remainder; IO#write returns the byte count
    bytes_written += write_end.write(ex_string[bytes_written..-1])
  end
ensure
  write_end.close
end
end  # closes the block passed to fork()

and in the other process, I currently make just one call to
read_end.read(). So my question is: is this guaranteed to get all of the
bytes written through multiple calls to write, or do I need to set up a
similar loop on the read end? Like:

string = ""
while !read_end.eof?
  string += read_end.read
end

Or does read_end.read block until it finds EOF?

Can anyone tell me what happens when read_end.read is re-scheduled
partway through? Or is it guaranteed to finish?
I'm running this on Linux, so POSIX rules probably apply.

Thanks in advance,
Michael



#2

On 15.03.2009 23:02, Michael M. wrote:

ex_string = Marshal.dump(error)
ex_size = ex_string.bytesize
bytes_written = 0
while bytes_written < ex_size
  bytes_written += write_end.write(ex_string[bytes_written..-1])
end
ensure
  write_end.close

You do not seem to open the stream in this context; why do you close it
here? Or do you have a pattern like

write_end = … # open
begin
  # write to write_end
rescue
  # handle errors
ensure
  write_end.close
end

IMHO you can simplify that to

rescue Exception => error
  Marshal.dump(error, write_end)
end

Which should also be more efficient since the looping is done in C code.
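For illustration, a minimal sketch of the whole round trip under that
approach, assuming a pipe shared across fork() (the names are invented
for the example):

read_end, write_end = IO.pipe

pid = fork do
  read_end.close
  begin
    raise "boom"                    # stand-in for the real work
  rescue Exception => error
    Marshal.dump(error, write_end)  # Marshal handles the write loop in C
  ensure
    write_end.close                 # parent sees EOF once this is closed
  end
end

write_end.close
error = Marshal.load(read_end)      # blocks until a whole object arrives
read_end.close
Process.wait(pid)
puts error.message                  #=> "boom"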

and in the other process, I currently make just one call to
read_end.read(). So my question is: is this guaranteed to get all of the
bytes written through multiple calls to write, or do I need to set up a
similar loop on the read end? Like:

string = ""
while !read_end.eof?
  string += read_end.read
end

Or does read_end.read block until it finds EOF?

Yes, that’s what it does:
http://www.ruby-doc.org/core/classes/IO.html#M002295

But again, I would just do

obj = Marshal.load(read_end)

Which, again, is more efficient since you do not need the additional
buffer and the looping is done in C code.

Can anyone tell me what happens when read_end.read is re-scheduled
partway through? Or is that guaranteed to finish?

I am not sure what you mean by "rescheduled". Even if the OS preempts
execution of this process or thread, it will continue at the same
location in the stream, and the semantics of the methods do not change.

Kind regards

robert


#3

Robert K. wrote:

You do not seem to open the stream in this context; why do you close
it here? Or do you have a pattern like

That code is inside the block passed to fork(), so write_end needs
to be closed so that the parent process reads an EOF. The pipe is opened
prior to the fork() call, so the fd is copied to the child
process; otherwise the parent process would block indefinitely waiting
for the EOF.
IMHO you can simplify that to

rescue Exception => error
  Marshal.dump(error, write_end)
end

Which should also be more efficient since the looping is done in C code.

Thanks, I've applied that.
But again, I would just do

obj = Marshal.load(read_end)

Which, again, is more efficient since you do not need the additional
buffer and the looping is done in C code.

I need the additional buffer because an exception is not always thrown,
so attempting to unmarshal an empty string causes problems.
I am not sure what you mean by "rescheduled". Even if the OS preempts
execution of this process or thread, it will continue at the same
location in the stream, and the semantics of the methods do not change.

Yes, I meant preempted.

My problem still exists. :( I didn't get into the specifics because I
haven't proved where the problem lies. I just got excited
when I saw that as a potential problem area and figured it needed fixing
because it would bite me sooner or later.

Back to the drawing board until I have a better question…

Thanks,
Michael



#4

obj = Marshal.load(read_end)

Which, again, is more efficient since you do not need the additional
buffer and the looping is done in C code.

I need the additional buffer because an exception is not always thrown,
so attempting to unmarshal an empty string causes problems.

Wait, I just re-read your email. The looping is done in C. So that
implies I do need to loop on the read_end.read() call? Efficiency is
not a massive concern for me - correctness is.



#5

On 16.03.2009 00:17, Michael M. wrote:

You do not seem to open the stream in this context; why do you close
it here? Or do you have a pattern like

That code is inside the block passed to fork(), so write_end needs
to be closed so that the parent process reads an EOF. The pipe is opened
prior to the fork() call, so the fd is copied to the child
process; otherwise the parent process would block indefinitely waiting
for the EOF.

Ah, I see.

obj = Marshal.load(read_end)

Which, again, is more efficient since you do not need the additional
buffer and the looping is done in C code.

I need the additional buffer because an exception is not always thrown,
so attempting to unmarshal an empty string causes problems.

Well, you could just serialize nil or the result of the computation in
the OK case. That simplifies handling in the calling process, plus you
get the result back as a Ruby object. You then only need to test whether
what you got is an instance of Exception or not.
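On the receiving side that boils down to something like this (a sketch,
with invented variable names):

outcome = Marshal.load(read_end)  # job result, nil for success, or an Exception
if outcome.kind_of?(Exception)
  raise outcome                   # propagate the child's failure in the parent
end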

My problem still exists. :( I didn't get into the specifics because I
haven't proved where the problem lies. I just got excited
when I saw that as a potential problem area and figured it needed fixing
because it would bite me sooner or later.

One problem that could hit you is that you might not get the concurrency
right: you need to make sure that stderr and stdout are read
concurrently (if there can be output on both), because if the pipe on one
of them fills up, your child process is blocked. Maybe your issue lies
in that area.

Back to the drawing board until I have a better question…

:-)

Wait, I just re-read your email. The looping is done in C. So that
implies I do need to loop on the read_end.read() call? Efficiency is
not a massive concern for me - correctness is.

No, you do not need to loop for IO#read, since the docs say it will read
all the way to the end of the stream when the length argument is left
out.
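A quick demonstration of that documented behavior:

r, w = IO.pipe
w.write("hello ")
w.write("world")
w.close         # without this close, r.read below would block forever
puts r.read     #=> "hello world" -- a single read drains the stream to EOF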

Kind regards

robert


#6

On 16.03.2009 21:00, Michael M. wrote:

One problem that could hit you is that you might not get the
concurrency right: you need to make sure that stderr and stdout are
read concurrently (if there can be output on both), because if the pipe
on one of them fills up, your child process is blocked. Maybe your
issue lies in that area.

I have set STDOUT.sync = true. Is that enough?

No, definitely not! That covers the sending side, but the blocking
issue I described is caused by the receiving side not reading all the
data. Assume an application as a child process that frequently writes
to stdout and stderr. The parent process at the other end of those two
(!) pipes only reads the stdout pipe. Now what happens is this: once
the pipe's buffer (OS and configuration dependent, often 4k) fills up,
the write call blocks and the child hangs - indefinitely.
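A sketch of the usual remedy, assuming the parent holds the read ends of
the child's stdout and stderr pipes (out_r and err_r are invented names):
drain both concurrently, one thread per pipe.

# out_r and err_r are the parent's read ends of the child's stdout/stderr
# pipes; the parent must have closed its copies of the write ends already,
# otherwise neither read would ever see EOF.
out_thread = Thread.new { out_r.read }  # collects output up to EOF
err_thread = Thread.new { err_r.read }

child_out = out_thread.value  # Thread#value joins and returns the result
child_err = err_thread.value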

It should be, because I
don’t have any output originating from the child processes or any child
threads, only the parent/calling thread. But I’ve been wrong before…

Though I think I have narrowed the problem very slightly. The
thread/process pair that blocks indefinitely blocks on the
read_end.read() call. Which is weird and annoying.

And you are sure you close stdout in the client or the client
terminates?

My next task will
be to set up some logging to figure out if there should have been an
exception to chuck down the pipe. (In my test case I have a random
number of jobs, of which every second one raises an exception.)

What I usually do in situations like this, which can be a bit tricky: I
start out with a simple test scenario and modify it step by step until
it resembles the problem I want to solve but still does not exhibit the
abnormal behavior.

Thanks a lot for your help! It’s certainly possible that this is going
to be my “Stupidly Difficult Bug When I Was Near The Beginning Of My
First Programming Job” that everyone seems to have.

LOL

robert


#7

Robert K. wrote:

On 15.03.2009 23:02, Michael M. wrote:

Or does read_end.read block until it finds EOF?

Yes, that’s what it does:
http://www.ruby-doc.org/core/classes/IO.html#M002295

The docs shouldn't be read as a Bible. The docs may not be very precise.
If IO#read is similar to C's read(), then it also returns when it has
read all the available bytes. See "man 2 read".
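For comparison, Ruby exposes the read(2)-like behavior through a separate
method; a sketch of the difference, assuming a pipe whose writer has not
closed yet:

r, w = IO.pipe
w.write("partial data")

# IO#readpartial behaves like read(2): it returns as soon as *some*
# data is available, without waiting for EOF.
chunk = r.readpartial(4096)   #=> "partial data"

# IO#read with no length argument would keep blocking here, because
# the write end is still open and EOF has not been reached.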


#8

One problem that could hit you is that you might not get the
concurrency right: you need to make sure that stderr and stdout are
read concurrently (if there can be output on both), because if the pipe
on one of them fills up, your child process is blocked. Maybe your
issue lies in that area.

I have set STDOUT.sync = true. Is that enough? It should be, because I
don't have any output originating from the child processes or any child
threads, only the parent/calling thread. But I've been wrong before…

Though I think I have narrowed the problem very slightly. The
thread/process pair that blocks indefinitely blocks on the
read_end.read() call. Which is weird and annoying. My next task will
be to set up some logging to figure out if there should have been an
exception to chuck down the pipe. (In my test case I have a random
number of jobs, of which every second one raises an exception.)

Thanks a lot for your help! It’s certainly possible that this is going
to be my “Stupidly Difficult Bug When I Was Near The Beginning Of My
First Programming Job” that everyone seems to have.

Michael



#9

Robert K. wrote:

Though I think I have narrowed the problem very slightly. The
thread/process pair that blocks indefinitely blocks on the
read_end.read() call. Which is weird and annoying.

And you are sure you close stdout in the client or the client terminates?
pw = IO.pipe
pr = IO.pipe
pe = IO.pipe
read_end, write_end = IO.pipe

pid = fork do
  begin
    pw[1].close
    STDIN.reopen(pw[0])
    pw[0].close

    pr[0].close
    STDOUT.reopen(pr[1])
    pr[1].close

    pe[0].close
    STDERR.reopen(pe[1])
    pe[1].close

That's the code I used to make sure STDOUT, STDIN and STDERR weren't
causing full-buffer hangs. I borrowed it from a process detach method
elsewhere, so it might not be exactly what I want. However, it had no
effect. The problem still exists.

Righto, here's the deal: the fork call doesn't seem to work, but it
doesn't fail either (a valid pid is returned). If I do a puts on either
side of the fork() (with and without the STDOUT redirection), then in
the process that blocks, the puts just inside the fork() doesn't run.
That suggests the fork() call fails, yet it doesn't return nil like the
docs say.

An interesting effect I noticed is that if I remove my call to
Process.wait(pid), then it displays almost identical behaviour! It runs
perfectly sometimes, and occasionally just hangs. However, if I'm not
collecting child processes, then I would expect to hit my ulimit at the
same place every time and the fork() call to fail noisily. The docs
don't mention anything about this. Is it possible that my child processes
aren't being collected correctly, so the rlimit is being hit, but Ruby
believes the process is collected, so its internal rlimit is not
reached, and it calls fork() regardless? Is that at all sane? Though
presumably, the external fork() call would fail, causing the internal
call to fail. And yes, I'm speaking as if the two are different, but
they might not be. I suppose it doesn't really make sense to duplicate
it. In fact, the call directly after the block passed to fork() doesn't
execute. Is it possible that my thread is hanging on the fork() call?
What would that indicate? How could I fix it?

No straw will be left un-grasped!

Michael



#10

On 16.03.2009 22:12, Albert S. wrote:

Robert K. wrote:

On 15.03.2009 23:02, Michael M. wrote:

Or does read_end.read block until it finds EOF?
Yes, that’s what it does:
http://www.ruby-doc.org/core/classes/IO.html#M002295

The docs shouldn't be read as a Bible. The docs may not be very precise.
If IO#read is similar to C's read(), then it also returns when it has
read all the available bytes. See "man 2 read".

In this particular case the docs are correct, to my knowledge. In other
words, IO#read will block until the full stream is read (or an error
condition occurs, of course).

Cheers

robert


#11

Open3.popen3("cmd args") do |stdin, stdout, stderr|
  stdin.puts "foo"
end

I'm not running a command as such, I'm running a block of code. Also, I
have come to the conclusion that I actually don't care about the output
in the child process. Is there an easy way to redirect it to /dev/null?

Could just mean that you create new processes faster than old ones are
closed.

This is almost certainly true. I didn't think of that. And I finally
worked out how to use ulimit, realising the process limit is per user
and in the realm of 26000. You're right, it probably isn't that.

I’d rather ask: what is the problem you are trying to solve? I still
lack the big picture - we’re talking about details here all the time
but it’s difficult to stay on track when the context is missing.

Sorry, I was trying to ask specific questions rather than asking the
good folks on ruby-talk to debug my script for me entirely. I am
writing a thread pool, which accepts jobs as block parameters, and by
calling run_process(&block) the job is run in an entirely new process,
with the thread from the pool calling fork(), creating a pipe to
transfer any exception objects back to the parent (hence the 4th pipe),
and then waiting to collect the child process's status. The main use of
this is for compiling concurrently with the thread pool, then linking
concurrently with the process option.
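A hypothetical sketch of that run_process scheme, pieced together from
the descriptions in this thread (the helper and variable names are
invented for illustration):

# Run a job in a fresh child process; ship any exception back
# through a dedicated pipe and re-raise it in the parent.
def run_process(&block)
  read_end, write_end = IO.pipe

  pid = fork do
    read_end.close
    begin
      block.call
      Marshal.dump(nil, write_end)    # nil signals success
    rescue Exception => error
      Marshal.dump(error, write_end)  # ship the exception back
    ensure
      write_end.close                 # parent's read sees EOF
    end
  end

  write_end.close
  outcome = Marshal.load(read_end)    # blocks until the child is done
  read_end.close
  Process.wait(pid)                   # collect the child's status

  raise outcome if outcome.kind_of?(Exception)
end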

Michael



#12

On 17.03.2009 21:09, Michael M. wrote:

I'm not running a command as such, I'm running a block of code. Also, I
have come to the conclusion that I actually don't care about the output
in the child process. Is there an easy way to redirect it to /dev/null?

Please take a look at http://pastie.org/419160 for an answer to that as
well as an experimental script.
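For the /dev/null part, a minimal sketch of one way to do it inside the
forked child (a guess at the idea, not the actual contents of the
pastie):

pid = fork do
  STDOUT.reopen("/dev/null", "w")  # discard anything written to stdout
  STDERR.reopen("/dev/null", "w")  # ...and stderr
  # ... run the job ...
end
Process.wait(pid)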

and then waiting to collect the child process's status. The main use of
this is for compiling concurrently with the thread pool, then linking
concurrently with the process option.

But in that case you only need a single pipe, because all you want to
transport back is the result of the calculation, correct? And you could
leave stderr untouched so that any error messages you do not expect
still reach the console.

Another alternative would be to use DRb for the communication.

Cheers

robert


#13

On 17.03.2009 00:03, Michael M. wrote:

Though I think I have narrowed the problem very slightly. The
thread/process pair that blocks indefinitely blocks on the
read_end.read() call. Which is weird and annoying.

And you are sure you close stdout in the client or the client terminates?

pw = IO.pipe
pr = IO.pipe
pe = IO.pipe
read_end, write_end = IO.pipe

Why do you open four pipes when three are sufficient?

pe[0].close
STDERR.reopen(pe[1])
pe[1].close

That's the code I used to make sure STDOUT, STDIN and STDERR weren't
causing full-buffer hangs.

This code by itself does nothing about buffer hangs. You just make
sure that stdin, stdout and stderr are redirected to pipes.

Btw, do you also close the other ends in the parent?

I borrowed it from a process detach method
elsewhere, so it might not be exactly what I want. However, it had no
effect. The problem still exists.

I do not know whether you plan to exec in your forked process, but if so
you can make your life much easier by using Open3. Then you can do
(from memory):

require 'open3'
Open3.popen3("cmd args") do |stdin, stdout, stderr|
  stdin.puts "foo"
end

Righto, here's the deal: the fork call doesn't seem to work, but it
doesn't fail either (a valid pid is returned). If I do a puts on either
side of the fork() (with and without the STDOUT redirection), then in
the process that blocks, the puts just inside the fork() doesn't run.
That suggests the fork() call fails, yet it doesn't return nil like the
docs say.

I am not sure I can follow your logic. Failure of fork can be
recognized by an exception (in the unlikely case that your OS cannot
manage more processes or won’t allow you to have another one).

I have to agree that your reasoning is not fully clear to me. Using
output operations to determine failure or success of a fork call seems
to be using the wrong tool for the job.

An interesting effect I noticed is that if I remove my call to
Process.wait(pid), then it displays almost identical behaviour! It runs
perfectly sometimes, and occasionally just hangs. However, if I'm not
collecting child processes, then I would expect to hit my ulimit at the
same place every time and the fork() call to fail noisily. The docs
don't mention anything about this. Is it possible that my child processes
aren't being collected correctly, so the rlimit is being hit, but Ruby
believes the process is collected, so its internal rlimit is not
reached, and it calls fork() regardless?

Could just mean that you create new processes faster than old ones are
closed.

Is that at all sane? Though
presumably, the external fork() call would fail, causing the internal
call to fail. And yes, I'm speaking as if the two are different, but
they might not be. I suppose it doesn't really make sense to duplicate
it. In fact, the call directly after the block passed to fork() doesn't
execute. Is it possible that my thread is hanging on the fork() call?
What would that indicate? How could I fix it?

I’d rather ask: what is the problem you are trying to solve? I still
lack the big picture - we’re talking about details here all the time but
it’s difficult to stay on track when the context is missing.

Cheers

robert


#14

Here’s another pastie that demonstrates a common error causing a
deadlock:

http://pastie.org/419533

Cheers

robert


#15

Robert K. wrote:

Here’s another pastie that demonstrates a common error causing a deadlock:

http://pastie.org/419533

Cheers

robert

Thanks, but I have a new theory. I broke everything down to be as
atomic as possible, which included pulling in the synchronized Queue
from thread.rb and modifying it slightly to lock the mutex and unlock it
only when necessary, so that blocks were being passed as parameters as
little as possible (as they are quite removed from being an atomic
operation). Very nearly immediately, Ruby's deadlock detection kicked
in. What happens is this:

In the main thread, I create the jobs and put them on the queue.
Depending on how things run (which is why it only happens sometimes), if
the worker threads are scheduled favourably enough, they will complete
but not die (as they are blocking on the queue, waiting for new jobs),
meaning that every thread except the main thread is asleep. I then
call join() in the main thread, so the main thread sleeps before the
others have a chance to wake up and finish, at which point Ruby detects
a deadlock, even though its next operation should be to wake up a
thread. At this point a "Bug Report: Unlocking mutex must be NULL" is
generated by the interpreter, which then hastily exits.
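For what it's worth, a minimal sketch of the all-threads-asleep
situation described above (here the deadlock is real, since the workers
never exit):

require 'thread'

queue   = Queue.new
workers = (1..3).map do
  Thread.new do
    loop { queue.pop.call }  # sleeps inside pop once the queue drains
  end
end

10.times { queue << lambda {} }  # a few trivial jobs

# After the jobs are consumed, every worker sleeps on queue.pop; joining
# them puts the main thread to sleep too. With no runnable thread left,
# the interpreter aborts and reports a deadlock.
workers.each { |w| w.join }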

As soon as I have a smaller, more specific case I will file a bug
report.
Thanks for all your help, but I think I have it now.

Michael



#16

On 18.03.2009 19:59, Michael M. wrote:

As soon as I have a smaller, more specific case I will file a bug report.

But make sure that you file the bug report against the right piece of
software. It could well be an error in your code. :-)

Thanks for all your help, but I think I have it now.

You’re welcome!

Cheers

robert