TCP Socket: Connection reset, but Select says it's valid

I’m running a server that sometimes encounters a
connection-reset-by-peer error when reading data from a client. I’m
calling IO.select to verify that there are characters to be read, but
occasionally the “rescue” below is activated when I do my gets. I
presume this means the socket was closed after select noticed there were
characters to be read but before I actually try to read them.

The client and server are in the same machine, so there’s no cable to
pull out.

The messages of interest are sent every 30 seconds, and there might be 1
or 2 failures in a day. Running on Mac OS X 10.5.6.

The problem seems to occur when the client (below) connects to send a
message and another client connects before the first client’s
transaction is complete. This transaction takes about 1 second. The
specified timeout is 120 seconds.

Is there anything obviously wrong with my code? I omitted quite a few
lines; I hope not too many.

Thanks-
Scott

######################### Server (client follows) #########################

require 'gserver'

class ChatServer < GServer # Server class derived from GServer superclass
  def initialize(*args)
    super(*args)
    # Keep a record of the client IDs allocated
    # and the lines of chat
    @@client_id = 0
    @@chat = []
  end

  def serve(io) # Serve method handles connections
    # Increment the client ID so each client gets a unique ID
    @@client_id += 1
    my_client_id = @@client_id
    my_position = @@chat.size

    loop do
      # Every n seconds check for data
      n = 0.5

      selection = IO.select([io], nil, [io], n)
      # If some event occurred, retrieve the data and process it...
      if selection && selection[0] then
        # There was a read event
        begin
          line = io.gets
        rescue Exception => e
          stat = ''
          puts "\nerror reading 'line' from #{io}"
          puts "#{e} (#{e.class})!"
          # Close socket
          io.close
          print("\nSelection is #{selection[0]}\n")
        end
        if selection[2].size > 0 then
          puts "%%%%%%%%  Select error array is #{selection[2]}" +
               Time.now.to_s
        end
.
.
.
.
      end

# Use port 50000 if none supplied

portnum = ARGV[0] || 50000
max_connections = 100000
server = ChatServer.new(portnum, $my_ip, max_connections, $stdout, true)

server.start # Start the server

server.join

######################### client #########################

#!/usr/bin/env ruby

require 'socket'
require 'timeout'

# TCP client. This script sends a message to the server and optionally
# gets a message back.
# If the message contains "check", server relays msg to other client,
# which then will send back "ok" if it is able to.
# Added 'quit' command when finished, so that the server disconnects
# and kills the thread we were on. This avoids max connections problem.

if ARGV.size < 3 then
  wait_time = 10
else
  wait_time = ARGV[2].to_i
end

DEFAULT_PORT = 50000

my_ip = Socket::getaddrinfo(Socket.gethostname, "echo", Socket::AF_INET)[0][3]

if ARGV.size > 2
  PORT = ARGV[0]
  message = ARGV[1]
else
  PORT = DEFAULT_PORT
  message = "Empty message"
  puts 'usage: ruby IX_chat_client.rb PORTNUM "message" timeout_sec'
  exit
end

chat_client = TCPSocket.new(my_ip, PORT)

sleep 1

# Send message to server

chat_client.puts message

# Don't take longer than wait_time seconds to get a response

begin
  Timeout::timeout(wait_time) do
    reply_line = chat_client.gets
    puts reply_line
  end
rescue Timeout::Error
  puts "Timed out."
  chat_client.puts "quit"
  exit
else
  chat_client.puts "quit"
  exit
ensure
  chat_client.close
end

Scott Cole wrote:

I’m running a server that sometimes encounters a
connection-reset-by-peer error when reading data from a client. I’m
calling IO.select to verify that there are characters to be read, but
occasionally the “rescue” below is activated when I do my gets.

Your code prints the exception class and message. What do you see?

I presume this means the socket was closed after select noticed there were
characters to be read but before I actually try to read them.

select reports a descriptor as ready on an EOF condition too. From man 2 select:

   Three independent sets of file descriptors are watched. Those listed
   in readfds will be watched to see if characters become available for
   reading (more precisely, to see if a read will not block; in
   particular, a file descriptor is also ready on end-of-file)
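To see the man page point in action, here is a minimal Ruby sketch (not code from the thread; it assumes a Unix-like system with a loopback interface and lets the OS pick the port). The client closes cleanly and never sends any data, yet IO.select still reports the server-side socket as readable, and gets then returns nil instead of raising.

```ruby
require 'socket'

# Listen on loopback; port 0 asks the OS for any free port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

client = TCPSocket.new('127.0.0.1', port)
conn = server.accept

# An ordinary close sends a FIN, i.e. an orderly end-of-file.
client.close

# No data was ever sent, but a read "will not block" because of EOF,
# so select puts the socket in the read set.
ready = IO.select([conn], nil, nil, 2)
readable = !ready.nil? && ready[0].include?(conn)

# At EOF, IO#gets returns nil rather than raising.
line = conn.gets

puts "readable: #{readable}"          # readable: true
puts "gets returned: #{line.inspect}" # gets returned: nil

conn.close
server.close
```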

Thanks for responding. I would expect to get some kind of EOF error on
an EOF condition. Instead I see something like:

error reading 'line' from #<TCPSocket:0x40e3f4>
Connection reset by peer (Errno::ECONNRESET)!

Selection is #<TCPSocket:0x40e3f4>
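On the EOF expectation: in Ruby an orderly EOF is not reported as an exception by IO#gets, which simply returns nil; IO#readline raises EOFError for the same condition. Errno::ECONNRESET is a different case, an error handed back by the operating system. A small sketch, using a local UNIXSocket pair only to keep the example short (it is not the TCP setup from the thread):

```ruby
require 'socket'

# A connected pair of stream sockets; closing one end gives the other EOF.
a, b = UNIXSocket.pair
b.close

# IO#gets signals EOF by returning nil.
from_gets = a.gets

# IO#readline signals the same condition by raising EOFError.
from_readline =
  begin
    a.readline
  rescue EOFError => e
    e.class
  end

p from_gets     # nil
p from_readline # EOFError
a.close
```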

Brian C. wrote:

Scott Cole wrote:

Thanks for responding. I would expect to get some kind of EOF error on
an EOF condition. Instead I see something like:

error reading 'line' from #<TCPSocket:0x40e3f4>
Connection reset by peer (Errno::ECONNRESET)!

Yep, it’s not a simple EOF - the far end has sent a TCP RST instead of a
FIN - but AFAIK the socket is marked ‘readable’ for select.

More info at:
http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2007-08/msg00176.html
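The RST case can be reproduced locally with a sketch like the one below. It assumes that enabling SO_LINGER with a zero timeout before close makes the kernel abort the connection with an RST instead of a FIN (standard BSD-sockets behavior, but not how the client in this thread closes its socket). On Linux and Mac OS X the server side should then see the socket as readable, and the following gets should raise Errno::ECONNRESET rather than return nil.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

client = TCPSocket.new('127.0.0.1', port)
conn = server.accept

# Linger on, timeout 0: close() aborts the connection and sends an RST.
client.setsockopt(Socket::Option.linger(true, 0))
client.close

# The pending error makes the socket "readable" for select.
ready = IO.select([conn], nil, nil, 2)
readable = !ready.nil? && ready[0].include?(conn)

# The read itself reports the reset as an exception, not as EOF.
error =
  begin
    conn.gets
    nil
  rescue Errno::ECONNRESET => e
    e.class
  end

puts "readable: #{readable}"
puts "read error: #{error.inspect}"

conn.close
server.close
```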

Note also that this does not mean that the socket is marked
selectable for ‘error’ - selection[2]. In fact, you’re almost certainly
never going to find the socket marked that way. See
http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2007-11/msg00187.html
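Since a reset shows up in the read set and not in selection[2], one way to structure the read side is to treat a nil from gets (clean EOF) and Errno::ECONNRESET (reset) alike: the peer is gone, so close the socket and leave the loop. In the server code as posted, nothing visible stops the loop after io.close, and a later IO.select on the closed socket would raise IOError. A minimal sketch; read_line_or_nil is a hypothetical helper, not part of GServer, and the UNIXSocket pair merely stands in for a client connection:

```ruby
require 'socket'

# Hypothetical helper: wait up to `timeout` seconds for input, then read a line.
# Returns :timeout if nothing happened, the line if one arrived, or nil if the
# peer is gone, whether it closed cleanly (EOF) or reset the connection (RST).
def read_line_or_nil(io, timeout)
  ready = IO.select([io], nil, nil, timeout)
  return :timeout if ready.nil?
  io.gets # nil on a clean EOF
rescue Errno::ECONNRESET
  nil     # connection reset by peer
end

a, b = UNIXSocket.pair
b.puts "hello"
b.close

results = []
loop do
  line = read_line_or_nil(a, 0.5)
  next if line == :timeout
  if line.nil?
    a.close # peer is gone: close once and stop serving this client
    break
  end
  results << line.chomp
end

p results # prints ["hello"]
```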