TCPSocket recv's nil with gets() if the server crashes?

luislavena · September 7, 2011, 8:19am

I don’t claim to be much of a programmer, and I am also new to RUBY, so
hopefully this is a good question.

I am making a client that can connect to a port and read an unknown
number of lines, like a server banner.
Since the server doesn’t close the connection, the program will not move
on. To overcome this, I used select() with a timeout, so it will read
whatever is in the pipe and move on if it gets nothing more within n
seconds.

Everything worked fine, until I tried crashing the server. I started to
get a string of nil coming into the client.

Here is the code for the server:

#!/usr/bin/ruby

require 'socket' # Get sockets from stdlib

server = TCPServer.open(2000) # Socket to listen on port 2000
loop { # Servers run forever
  client = server.accept # Wait for a client to connect
  client.puts(Time.now.ctime) # Send the time to the client
  # Don't close the connection, to simulate an unknown size.
}

And my client:

#!/usr/bin/ruby

require 'socket'

host = 'localhost'
port = 2000
sockets = Array.new # select() requires an array
# fill the first index with a socket
sockets[0] = TCPSocket.open(host, port)
while 1 # loop till it breaks
  # listen for a read, timeout 10 seconds
  res = select(sockets, nil, nil, 10)
  if res != nil # a nil is a timeout and will break
    # THIS PRINTS NIL FOREVER on a server crash
    puts sockets[0].gets()
  else
    sockets[0].close
    break
  end
end

If the server stays up, the client gets the time, waits 10 seconds for
more, and just quits and closes.
If you kill the server before select() times out, you will get an
infinite loop of ‘nil’ in the client.
I used sockets[0] instead of a loop to parse read events in res since
there is only one socket, and it must be the one triggering select. (I
checked to be sure: res[0][0] and sockets[0] are the same socket.)

I could use this ‘nil’ to show that the server has crashed, but I can’t
help but think it is a symptom of something I’ve done terribly wrong
that works for now but will bite me in the future.

Any thoughts on where this ‘nil’ is from?

wpwood · September 7, 2011, 9:02am

“Bill W.” wrote:

I don’t claim to be much of a programmer, and I am also new to RUBY, so
hopefully this is a good question.

I’ll try to explain things with that in mind (but feel free to ask for
clarification)

I am making a client that can connect to a port and read an unknown
number of lines, like a server banner.
Since the server doesn’t close the connection, the program will not move
on. To overcome this, I used select() with a timeout, so it will read
whatever is in the pipe and move on if it gets nothing more within n
seconds.

Everything worked fine, until I tried crashing the server. I started to
get a string of nil coming into the client.

Any method of crashing/terminating the server process (but not the
operating system) causes the operating system/networking stack to close
the TCP connection[1]. This shutdown should be the same as if the
server code explicitly closed it.

If you kill the server before select() times out, you will get an
infinite loop of ‘nil’ in the client.

I could use this ‘nil’ to show that the server has crashed, but I can’t
help but think it is a symptom of something I’ve done terribly wrong
that works for now but will bite me in the future.

It’ll work in the future. Break out of the loop if you get nil and
close the socket in the client, too.
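
A minimal sketch of that advice, assuming a throwaway local server on an OS-chosen port so the example runs on its own. The helper name `read_banner` and the 2-second idle timeout are illustrative choices, not anything from the original client.

```ruby
require 'socket'

# Hypothetical helper: read lines until the peer closes the connection
# (gets returns nil) or nothing arrives within idle_timeout seconds.
def read_banner(sock, idle_timeout)
  lines = []
  loop do
    ready = IO.select([sock], nil, nil, idle_timeout)
    break if ready.nil?   # timeout: the server went quiet
    line = sock.gets
    break if line.nil?    # end of file: the server closed or was killed
    lines << line.chomp
  end
  lines
ensure
  sock.close unless sock.closed?
end

# Stand-in server: sends one line and closes, which is what the client
# sees when the real server process is killed.
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  client.puts('hello')
  client.close
end

banner = read_banner(TCPSocket.new('127.0.0.1', port), 2)
server_thread.join
server.close
puts banner.inspect
```

The only change from the original loop is the second `break`: a nil from gets is treated as "the connection is finished" rather than printed.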

Any thoughts on where this ‘nil’ is from?

The TCPSocket#gets method is actually IO#gets (an inherited method from
IO, as TCPSocket is a subclass of IO).

If you read the documentation (ri IO#gets), you’ll see IO#gets
returns nil on the end-of-file condition, indicating there’s nothing
left to be done and the IO object should be closed.

Keep in mind you won’t be reliably notified in many failure cases (e.g.
the entire server crashing (OS/hardware/power failure), or a link-level
failure (cables cut, switch fails, …)), so you’ll still have to rely
on select() to time out in those cases.

[1] This holds if the crashed process was the only process holding that
connection, which appears to be the case for you.

wpwood · September 7, 2011, 9:24am

Bill W. wrote in post #1020562:

while 1 #loop till it breaks
#listen for a read, timeout 10 seconds
res = select(sockets, nil, nil, 10)
if res != nil #a nil is a timeout and will break

Situation: the server closes the socket.
select() returns nil if it times out–otherwise it returns a non-nil.
Presumably, there is no timeout, so execution continues inside your
if-block.

    #THIS PRINTS NIL FOREVER on a server crash
puts sockets[0].gets()

If you check the docs for IO#gets, it returns nil at end of file.
When the server's socket is closed, the client sees end of file, so
gets() returns nil. Then the loop begins again. Nothing has changed:
select() reports a socket at end of file as readable, so it returns
immediately, gets() returns nil again, and so on.
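
A small sketch of that behavior, assuming a local socket pair built from a throwaway TCPServer (the variable names are illustrative). Closing the accepted side stands in for the killed server.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
sock = TCPSocket.new('127.0.0.1', server.addr[1])
peer = server.accept
peer.close # stands in for the server process being killed

# Each pass mirrors one iteration of the client loop: select reports
# the socket as readable, and gets finds only end of file.
results = 3.times.map do
  ready = IO.select([sock], nil, nil, 5)
  [!ready.nil?, sock.gets]
end
p results

sock.close
server.close
```

Every pass reports the socket as readable and gets returns nil, which is why the loop never reaches the timeout branch.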

else
sockets[0].close
break
end
end


wpwood · September 7, 2011, 6:46pm

Thanks for the answers! I guess I needed to RTFM a little more:
returning ‘nil’ at end of file is normal behavior for IO#gets.

using gets() from a TCPSocket seems to be a common question, since
everyone seems to have the problem where it hangs if the socket isn’t
closed by the server. (I do find this odd, is it actually expected you
know every time how many lines the server will send back?)

Is there a more efficient or accepted way to read an unknown amount of
data from a socket than the way I am doing it?

I’ve seen it done with ‘timeout’ and catches the error, but select()
includes a timeout.

I’ve seen things about sync and flush on the client side, but since I
have not once seen any comment from the asker that it actually worked,
I’ve never checked into it.

I am pretty much learning RUBY on the fly, working towards a final goal,
so I am hoping that I haven’t missed anything else obvious here.

wpwood · September 7, 2011, 9:28am

Extending a bit on Eric’s excellent reply:

On Wed, Sep 7, 2011 at 8:19 AM, Bill W. wrote:

Everything worked fine, until I tried crashing the server. I started to
client = server.accept # Wait for a client to connect
client.puts(Time.now.ctime) # Send the time to the client
#Don’t close the connection, to simulate an unknown size.

If you do not switch the socket to sync=true or invoke #flush here,
chances are that the content is still sitting in the internal buffer
(Ruby does buffered IO) when you kill the process. I suggest adding
client.flush because that has the advantage of still buffering while
explicitly delimiting “messages” (which might consist of multiple
lines). In other words, you make the sending explicit. This is
especially necessary when there are pauses between messages and you
want the client to process them sooner rather than later. You could
try with

require 'socket' # Get sockets from stdlib

server = TCPServer.open(2000) # Socket to listen on port 2000
loop { # Servers run forever
  client = server.accept # Wait for a client to connect
  client.puts(Time.now.ctime) # Send the time to the client
  # uncomment to see the difference:
  # client.flush
  sleep 10
  client.puts(Time.now.ctime) # Send the time to the client
  # uncomment to see the difference:
  # client.flush
  # Don't close the connection, to simulate an unknown size.
}


I could use this ‘nil’ to show that the server has crashed, but I can’t
help but think it is a symptom of something I’ve done terribly wrong
that works for now but will bite me in the future.

Any thoughts on where this ‘nil’ is from?

See Eric’s reply.

Kind regards

robert

wpwood · September 7, 2011, 7:52pm

“Bill W.” wrote:

using gets() from a TCPSocket seems to be a common question, since
everyone seems to have the problem where it hangs if the socket isn’t
closed by the server. (I do find this odd, is it actually expected you
know every time how many lines the server will send back?)

It depends on the protocol.

Is there a more efficient or accepted way to read an unknown amount of
data from a socket than the way I am doing it?

If it’s something important, then use IO#readpartial and provide
your own buffering/parsing. If a robust library already exists for
whatever you’re reading, use that.

The problem with IO#gets is that it can still hang indefinitely if the
server sent an incomplete line. select() will wake up from one byte
of data, but the server could’ve had its connection truncated before
the rest of the line reached you.

The Net::* libraries in the Ruby standard library implement their
own line-buffering for this reason.
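
A rough sketch of the readpartial approach, assuming an overall deadline and a hand-rolled line buffer. The helper `read_lines`, its return shape, and the 0.5-second deadline are hypothetical choices for illustration, not how the Net::* libraries are written.

```ruby
require 'socket'

# Hypothetical helper: collect complete lines from io until end of file
# or until deadline_s seconds have passed in total.
# Returns [complete_lines, leftover_partial_line, status].
def read_lines(io, deadline_s)
  buffer = String.new
  lines = []
  status = :timeout
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline_s
  loop do
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break if remaining <= 0
    break unless IO.select([io], nil, nil, remaining)
    begin
      # readpartial returns whatever is available; it never waits for "\n"
      buffer << io.readpartial(4096)
    rescue EOFError
      status = :eof
      break
    end
    # Move every complete line out of the buffer.
    while (line = buffer.slice!(/\A.*?\n/))
      lines << line.chomp
    end
  end
  [lines, buffer, status]
end

# Stand-in server that sends one full line and one unfinished line,
# then stays connected.
server = TCPServer.new('127.0.0.1', 0)
sock = TCPSocket.new('127.0.0.1', server.addr[1])
peer = server.accept
peer.write("one\ntw")

lines, leftover, status = read_lines(sock, 0.5)
p [lines, leftover, status]

sock.close
peer.close
server.close
```

Because the partial line stays in the caller's buffer, the client can decide what to do with it when the deadline passes instead of blocking inside gets.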

I’ve seen it done with ‘timeout’ and catches the error, but select()
includes a timeout.

timeout will protect you from the incomplete line case when using (much
of) the Ruby standard library. It’s unreliable with some 3rd-party
extensions and a bit hackish, though.
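
For comparison, a sketch of the timeout approach using the standard library's Timeout module. The local stand-in server and the 0.5-second limit are assumptions for the example.

```ruby
require 'socket'
require 'timeout'

server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
sock = TCPSocket.new('127.0.0.1', server.addr[1])
peer = server.accept
peer.write('partial line, no newline') # gets would wait for "\n" forever

result =
  begin
    Timeout.timeout(0.5) { sock.gets }
  rescue Timeout::Error
    :timed_out
  end
p result

sock.close
peer.close
server.close
```

The block is interrupted from outside, so any bytes gets had already buffered may be left in an awkward state; closing the socket after a timeout, as done here, avoids relying on it.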

I’ve seen things about sync and flush on the client side, but since I
have not once seen any comment from the asker that it actually worked,
I’ve never checked into it.

Ruby (at least MRI) sockets default to IO#sync=true so in many cases
you don’t need to flush.
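
A quick way to check this on your own interpreter, assuming MRI and a local stand-in server (other Ruby implementations may differ):

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
writer = TCPSocket.new('127.0.0.1', server.addr[1])
reader = server.accept

default_sync = writer.sync # expected to be true on MRI

# No flush here: with sync enabled the bytes go straight to the OS.
writer.write("no flush needed\n")
arrived = !IO.select([reader], nil, nil, 2).nil?
p [default_sync, arrived]

writer.close
reader.close
server.close
```

If sync has been turned off on a socket (writer.sync = false), Ruby may hold small writes in its own buffer, and that is the situation where an explicit flush matters.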

wpwood · September 7, 2011, 9:35pm

On Sep 7, 2011, at 1:50 PM, Eric W. wrote:

The problem with IO#gets is that it can still hang indefinitely if the
server sent an incomplete line. select() will wake up from one byte
of data, but the server could’ve had its connection truncated before
the rest of the line reached you.

In addition to worrying about a server failing, you must always worry
about the network silently failing. It may be that the server is up,
has transmitted a full line of text but that the line straddled two
network segments with the first segment making it successfully to the
client but the second segment being lost in the network. Subsequent
retransmissions might fail for a long time before succeeding and so the
client is left with a partial line of text for an indefinite amount of
time (thus the need to incorporate some sort of application level
timeout).

Gary W.