Does Ruby's FTP lib have a ghostly bug?

robertjames · January 23, 2007, 8:56pm

Sometimes when using Ruby’s FTP lib, I get this weird error:
#<EOFError: end of file reached>

I get this both on Linux and Windows. Googling around shows that other
people have reported it, although no one seems to know what causes it
or how to solve it (other than trying to reboot).

http://jonaquino.blogspot.com/2005/03/ruby-ftp-error.html
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/101791
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/86343

Looking at the source, my only guess is that it’s caused if the client
tries to read before the remote host has sent the data. (But I don’t
know enough about Ruby socket programming for this to be more than a
guess.)

net/ftp.rb:210

def getline
line = @sock.readline # if get EOF, raise EOFError
line.sub!(/(\r\n|\n|\r)\z/n, “”)

Anyone capable of solving this bug, or at least have a workaround?

(Here is a stack trace:
#<EOFError: end of file reached>
“/ruby/lib/ruby/1.8/net/ftp.rb:211:in readline'" "/ruby/lib/ruby/1.8/net/ftp.rb:211:in getline’”
“/ruby/lib/ruby/1.8/net/ftp.rb:221:in getmultiline'" "/ruby/lib/ruby/1.8/net/ftp.rb:235:in getresp’”
“/ruby/lib/ruby/1.8/net/ftp.rb:251:in voidresp'" "/ruby/lib/ruby/1.8/net/ftp.rb:176:in connect’”
“/ruby/lib/ruby/1.8/monitor.rb:229:in synchronize'" "/ruby/lib/ruby/1.8/net/ftp.rb:174:in connect’”
“/ruby/lib/ruby/1.8/net/ftp.rb:136:in initialize'" "/ruby/lib/ruby/1.8/net/ftp.rb:120:in open’”
“my_code/connection.rb:27:in `ftp’” # Net::FTP.open(ftp_host,
username, password)

robertjames · January 23, 2007, 9:10pm

On Wed, 24 Jan 2007, S. Robert J. wrote:

Anyone capable of solving this bug, or at least have a workaround?
“/ruby/lib/ruby/1.8/net/ftp.rb:251:in voidresp'" "/ruby/lib/ruby/1.8/net/ftp.rb:176:inconnect’”
“/ruby/lib/ruby/1.8/monitor.rb:229:in synchronize'" "/ruby/lib/ruby/1.8/net/ftp.rb:174:inconnect’”
“/ruby/lib/ruby/1.8/net/ftp.rb:136:in initialize'" "/ruby/lib/ruby/1.8/net/ftp.rb:120:inopen’”
“my_code/connection.rb:27:in `ftp’” # Net::FTP.open(ftp_host,
username, password)

can you ftp to the host using the command line ‘ftp’ program of either
windows
or linux?

-a

robertjames · January 23, 2007, 10:06pm

On Wed, 24 Jan 2007, S. Robert J. wrote:

bummer. that would have been the easy answer ;-(

I should add that the app in question makes a large number of ftp
connections to the host, one after the other. (I don’t think it caches
them.) Perhaps that is somehow related…

can you make the connection shared and mutex access to it?

-a

robertjames · January 23, 2007, 9:21pm

[email protected] wrote:

can you ftp to the host using the command line ‘ftp’ program of either windows
or linux?

Yes! Sorry for not stating that. This is a pure ruby issue - ftp in
general works fine in all situations. Also, even with Ruby, this bug
does not happen 100% of the time. (I haven’t tested enough to know
what determines it, but I do know that even for the same (host, user,
pw) it will sometimes work and sometimes not.)

I should add that the app in question makes a large number of ftp
connections to the host, one after the other. (I don’t think it caches
them.) Perhaps that is somehow related…

robertjames · January 23, 2007, 10:08pm

On Wed, 24 Jan 2007, S. Robert J. wrote:

I should add that the app in question makes a large number of ftp
connections to the host, one after the other. (I don’t think it caches
them.) Perhaps that is somehow related…

forgot to mention, try running your script with

Net::FTP.debug_mode = true

and see if anything useful comes out… sounds like you might just be
getting
too many connections though…

-a

robertjames · January 23, 2007, 10:20pm

I just added FTP conneciton caching, which seems to have removed the
problem. Therefore, I’ll assume that this was not a ftp.rb problem,
but rather connections being refused when too many came in. (This
didn’t show up on simple command line testing, because the rate was
much slower.) I’ll add that being that the EOFError message is quite
cryptic, perhaps whoever maintains ftp.rb could make a more
comprehnsible error message (along with the actual OS level socket
error).

I also need to thank Ara. By asking me what I thought was an obvious
question (does command line ftp work), he forced me to communicate my
assumption (“command line works, cryptic error message --> ergo ftp.rb
problem”) and consequently rethink it. Is the command line doing the
same thing as the app? No, it’s making a lot fewer connections.

This has served to me as a good lesson in debugging:

Can you prove the problem is where you think it is? Expand the
scope.
With any bug, there’s something that you’re wrong about. Perhaps
those things which give you the information you are sure of are
themselves wrong. That is, don’t just look for problems where you see
them – look for problems in all of your proofs to get there.
Last, always tell the problem to someone else. Having to
communicate to someone forces you to rethink it. And if that someone
else wants proof (like most good Rubyists), it forces you to prove it.
In this case, when I glanced at the src code, I thought it did cache
the connections. But I was going to paste that line in to the list and
saw that, in this scenario, it wasn’t caching them.

Thanks Ara! And thanks comp.lang.ruby!

robertjames · January 23, 2007, 11:11pm

On Wed, 24 Jan 2007, S. Robert J. wrote:

question (does command line ftp work), he forced me to communicate my
them – look for problems in all of your proofs to get there.
3. Last, always tell the problem to someone else. Having to
communicate to someone forces you to rethink it. And if that someone
else wants proof (like most good Rubyists), it forces you to prove it.
In this case, when I glanced at the src code, I thought it did cache
the connections. But I was going to paste that line in to the list and
saw that, in this scenario, it wasn’t caching them.

Thanks Ara! And thanks comp.lang.ruby!

right on. you can thank my friend tom lauren. we used to debug
together and
he was such a bastard: i’d say something like, “here x is 42” and he
just
wouldn’t believe anything i said unless he could prove it
himself in
the debugger. the thing i learned though, was that he always found the
bug!

cheers.

-a

robertjames · January 23, 2007, 10:21pm

S. Robert J. wrote:

perhaps whoever maintains ftp.rb could make a more
comprehnsible error message (along with the actual OS level socket
error).

PS - Who do I speak to about this?

robertjames · January 24, 2007, 1:41am

It’s being called from an attempt to connect to an FTP site. So, it’s
not really an out of data per se, but a failure to connect.

Of course, the implementation may result in that failure causing no
data to be available - but it would be a lot clearer if it would a) let
me know that it didn’t connect and b) forward on the OS socket level
error.

Or am I missing something? If the connection did work, but the host
just didn’t send any data (ie, broken FTP implementation), it’s weird
that several people are reporting it - and that it doesn’t seem to
occur outside of ftp.rb. (ie with other languages).

robertjames · January 24, 2007, 1:14am

On Jan 23, 2007, at 13:20, S. Robert J. wrote:

S. Robert J. wrote:

perhaps whoever maintains ftp.rb could make a more
comprehnsible error message (along with the actual OS level socket
error).

PS - Who do I speak to about this?

I’m pretty sure its exactly an EOFError. (You have run out of data
to read.)

$ ruby -rstringio
s = StringIO.new “foo\nbar”
s.readline
s.readline
s.readline
-:4:in `readline’: end of file reached (EOFError)
from -:4

Why you’ve run out of data is more difficult to determine. "out of
data waiting for ‘X’ may be more helpful, but I’m not sure how easy
that is to determine though.

–
Eric H. - [email protected] - http://blog.segment7.net

I LIT YOUR GEM ON FIRE!

robertjames · January 24, 2007, 3:11am

On Jan 23, 8:41 pm, Jamey C. [email protected] wrote:

S. Robert J. wrote:

Sometimes when using Ruby’s FTP lib, I get this weird error:
#<EOFError: end of file reached>I have 50+ ftp scripts that run daily. I only get this error on one of
those scripts, which, like you observed in a later email, moves a bunch
of files every time it runs, one after another. I put a counter in the
script and I found that it consistently bombed out after moving the 60th
file, i.e. I always got this error when it was attempting to move file
number 61.

It sounds like this is a bug in ftp.rb. Can you confirm that the ftp
host doesn’t display this problem when accessed via other means (eg
another language).

robertjames · January 24, 2007, 2:42am

S. Robert J. wrote:

Sometimes when using Ruby’s FTP lib, I get this weird error:
#<EOFError: end of file reached>

I have 50+ ftp scripts that run daily. I only get this error on one of
those scripts, which, like you observed in a later email, moves a bunch
of files every time it runs, one after another. I put a counter in the
script and I found that it consistently bombed out after moving the 60th
file, i.e. I always got this error when it was attempting to move file
number 61.

HTH,

Jamey C.

robertjames · January 24, 2007, 5:23am

“S. Robert J.” [email protected] writes:

“/ruby/lib/ruby/1.8/net/ftp.rb:136:in initialize'" "/ruby/lib/ruby/1.8/net/ftp.rb:120:in open’”
“my_code/connection.rb:27:in `ftp’” # Net::FTP.open(ftp_host,
username, password)

Looking at that, and at other messages in this thread, I tried this
experiment. First, I set up a service locally that listened to port
8989 and simply echoed ten lines with a five second delay between each
echo. (with a simple shell script + inetd)

Then, in irb:

require “socket”
(1…1300).map{|s|puts s;a=TCPSocket.open(“localhost”,8989);a.readline;a}

This fails with the same EOFError reported in ftp.rb, after opening
the 256th socket. (I use “map” instead of “each” to keep from closing
sockets in garbage collection)

It would seem that somehow when the underlying operating system has
too many descriptors open, the effect is that TCPSocket.open succeeds,
but it manages to succeed in such a manner that .readline on the
resulting socket will immediately trigger EOFError.

Looking at ext/socket.c, I don’t see how that’s possible. The
socket() call should be returning -1 (with errno set to ENFILE) which
should cause init_inetsock_internal to invoke
rb_sys_fail(“socket(2)”), but clearly that isn’t happening.

So:

ftp.rb gives this error message when there are too many connections
open in this process. The fault seems to be that TCPSocket.open
claims to succeed when it didn’t really. It is a mystery why that
happens.

robertjames · January 24, 2007, 4:32pm

Eric H. [email protected] writes:

    from test.rb:4:in `map'
    from test.rb:4

Fair enough. I wasn’t able to replicate this on my system with this
code exactly, (ruby 1.8.5, but on Debian, not OS X) but it was able to
point me in the right direction. (I was able to replicate it by using
a “127.0.0.1” instead of “localhost” and using ulimit to first reduce
my open descriptors limit to something small)

Okay, so now the theory is that when ftp.rb throws the EOFError upon
connection what’s happened is that the remote end has fallen over from
having too many open connections.

Specifically, if you hit an inetd-based service too many times without
closing the connection then you put the server into a state where it
will accept the connection and then immediately close it. On the
server, what’s happening is that inetd is accepting the connection
but then the ftp daemon is exiting immediately when it tries to open
some files that it reads on startup, like libc. What I get in the
logs is:

Jan 24 08:43:56 esau inetd[1641]: ftp/tcp server failing (looping),
service terminated for 10 min

“looping” here means that the ftpd program is exiting immediately when
inetd tries to start it.

It may be that non-inetd ftp servers exhibit similar behavior
(i.e. accept a connection yet close it immediately) when hit
with too many simultaneous open connections, but I haven’t
investigated those. I’ll note that my non-inetd ssh server gets into
this state (where it can accept a connection, but then closes it
before spitting out the identification string) after a mere 10
simultaneous unclosed connections.

So if ftp.rb is encountering this situation on initial connect, it
means that the server closed the control socket immediately after
connecting, and the ftp server is probably overloaded.

If ftp.rb is encountering this EOFError elsewhere, (as was alleged
elsewhere in the thread where someone mentioned ftp.rb blowing up with
an EOFError after retrieving 60 files) then the culprit is probably
that the other end closed the control connection unexpectedly.
Although this could mean that ftp.rb was hammering the other end with
connections that aren’t being closed, the ftp.rb code doesn’t look
like it would do that. More likely, the ftp server on the other end
is enforcing clumsily some limit about the number of transactions per
connection. (Or some NATting firewall in the middle is getting
confused, and shutting the connection down. I’ve seen that happen
when dealing with an FTP server behind the firewall)

Probably, the function getline in ftp.rb should be modified to catch
EOFError and turn it into an FTPProtoError, (the same type of error
you’d get if the socket closed unexpectedly partway through a
response) though the comments seem to indicate that the current
situation is deliberate:

def getline
  line = @sock.readline # if get EOF, raise EOFError
  line.sub!(/(\r\n|\n|\r)\z/n, "")
  if @debug_mode
  print "get: ", sanitize(line), "\n"
  end
  return line
end
private :getline

I contend that the current situation is a bug, and getline inside
ftp.rb should read:

def getline
  begin
    line = @sock.readline # if get EOF, raise EOFError
  rescue EOFError
    raise FTPProtoError, "Connection closed unexpectedly"
  end
  line.sub!(/(\r\n|\n|\r)\z/n, "")
  if @debug_mode
    print "get: ", sanitize(line), "\n"
  end
  return line
end
private :getline

At the very least, it should raise some sort of FTPError.

robertjames · January 24, 2007, 8:24am

On Jan 23, 2007, at 20:22, Daniel M. wrote:

This fails with the same EOFError reported in ftp.rb, after opening
the 256th socket. (I use “map” instead of “each” to keep from closing
sockets in garbage collection)

$ cat test.rb
soft, hard = Process.getrlimit Process::RLIMIT_NOFILE
require ‘socket’

sockets = (1…soft).map do |n|
s = TCPSocket.open “localhost”, 80
s.write “GET / HTTP/1.0\r\n\r\n”
print “%4d %s” % [n, s.readline]
s
end

p sockets

$ ruby test.rb
1 HTTP/1.1 200 OK
[…]
252 HTTP/1.1 200 OK
test.rb:5:in initialize': Too many open files - socket(2) (Errno::EMFILE) from test.rb:5:in open’
from test.rb:5
from test.rb:2:in map' from test.rb:4:in each’
from test.rb:4:in `map’
from test.rb:4

It would seem that somehow when the underlying operating system has
too many descriptors open, the effect is that TCPSocket.open succeeds,
but it manages to succeed in such a manner that .readline on the
resulting socket will immediately trigger EOFError.

$ ruby -v
ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-darwin8.8.2]

happens.
–
Eric H. - [email protected] - http://blog.segment7.net

I LIT YOUR GEM ON FIRE!

robertjames · January 24, 2007, 6:05pm

Wow! Top notch debugging.

+1 for the error message change - it would have saved me (and
presumably the other posters I referenced) a lot of guesswork. (On the
other hand, I would have missed a chance to sharpen my debugging
skills… ;-).

BTW, I did some log analysis, and found that I consistently got the
EOFErrors almost always after 128 ftp opens.

robertjames · January 24, 2007, 8:05pm

On Jan 24, 2007, at 07:31, Daniel M. wrote:

  print "get: ", sanitize(line), "\n"
    line = @sock.readline # if get EOF, raise EOFError
At the very least, it should raise some sort of FTPError.

Can you propose a patch on the Ruby tracker? You may need to
subscribe to ruby-core to help get it committed.

–
Eric H. - [email protected] - http://blog.segment7.net

I LIT YOUR GEM ON FIRE!