Complete Java/JRuby lockup


#1

I apologize for being long and vague with this posting as I’m at a loss
as to what is happening. I’m not sure if I’m hitting a Java bug, a JRuby
bug or an application bug.

The platform is JRuby 1.2.0 running with Java 1.6.03.13 on Windows
Server 2003. The JRuby application is a Rails application run with
Mongrel that is used as a “web service” for an OpenLaszlo front end.

I don’t have a lot of information at this point but these are the
symptoms:

  1. The standard output in the window stops displaying, i.e. it doesn’t
    update the logging activity that is normally displayed with each
    transaction request. This happens very soon after launching the jruby
    application. On occasion it starts to scroll again with the logging
    activity but in general it is sporadic.

  2. Occasional logging of “dl: this is only a partial implementation,
    and it’s likely broken” in the window. This log item is not from my
    code.

  3. The application runs fine for a number of days, but “at some point”
    (I know I’m vague here), if I try to terminate the JRuby application with
    a Ctrl-C, it locks up. I can’t get the command prompt back, and I can’t
    end-process the Java application from the Task Manager.

  4. Typically, the failure process begins when the user informs me that
    a download that they are attempting to perform with the application is
    reporting a failure. At this point all I’m trying to do is terminate
    and restart the jruby application to allow the user to execute the file
    download. The code that performs this download uses the net/ftp library
    to retrieve some files. Here is the code.

    begin
      puts "Connecting to #{ftp_server[0].value}"
      ftp = Net::FTP.new(ftp_server[0].value) #, 'anonymous', 'anything')
      puts "Logging on with #{ftp_username[0].value}"
      ftp.login(ftp_username[0].value, ftp_password[0].value)
      # files = ftp.chdir("…/")
      # test = ftp.pwd
      puts "Getting file #{ftp_filename[0].value}"
      ftp.gettextfile(ftp_filename[0].value)
      ftp.close
      puts "Done"
      message += "Successful Download - #{ftp_filename[0].value},\n"
    rescue Exception => e
      puts "#{e} (#{e.class})!"
      if ftp_filename[0].nil?
        message += "Failed Download - #{item},\n"
      else
        message += "Failed Download - #{ftp_filename[0].value},\n"
      end
    end

  5. The worst part is that the whole server reboots after about 5 to 10
    minutes. Before this reboot happens, the memory usage of the Java
    process doesn’t suddenly increase and the CPU usage is normal for
    all 4 CPUs in the server, so everything looks OK, i.e. there is no
    indication of a tight loop in the code. I just can’t kill the Java
    process at all, and the reboot happens only after I try to kill it!
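
A side note on the download code in item 4: if login or gettextfile raises, ftp.close is never reached, so the FTP control connection stays open until the process exits. The sketch below shows the same steps with the close moved into an ensure clause. The method name download_text_file and the ftp_class parameter are hypothetical (the parameter exists only so a stand-in class can be substituted in tests), and nothing here is claimed to be the cause of the lockup.

```ruby
# Hypothetical helper: fetch one text file over FTP and return a status line.
# ftp_class defaults to Net::FTP; it is a parameter only so that a test
# double can be injected.
def download_text_file(host, user, password, filename, ftp_class = nil)
  ftp = nil
  ftp_class ||= begin
    require 'net/ftp'
    Net::FTP
  end
  ftp = ftp_class.new(host)          # connects to the server
  ftp.login(user, password)
  ftp.gettextfile(filename)          # saves to a local file of the same name
  "Successful Download - #{filename},\n"
rescue StandardError => e
  warn "#{e} (#{e.class})!"
  "Failed Download - #{filename},\n"
ensure
  ftp.close if ftp                   # runs on success and on failure
end
```

Rescuing StandardError rather than Exception is a deliberate choice in this sketch: it leaves Interrupt (Ctrl-C) and similar signals free to propagate.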

Has anyone experienced this, or is there some additional logging or
information I can provide so I can trap this condition and get to the
bottom of it? Is the Net::FTP library not safe to use with JRuby? I
can’t see this problem as an application bug, because I should still be
able to terminate the JRuby application, should I not? This is why I
suspect that I’m dealing not with an application bug but potentially
with a JRuby bug. But why can’t I even kill the underlying Java process
when this error occurs? Is there something in JRuby that can prevent
Windows from killing the Java VM? I have never had a problem killing
any other Java process in the years I have used Java to run my
applications. Again, this is why I think it might be a JRuby issue.

Thanks


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

#2

Interesting to note:
you left out the option of a “Win 2003 idiosyncrasy”.

Couple of questions:

Are you using just the Rails default logger, or something else?
Do you have logging set up to do any type of rolling? If you do, we’ve
seen log files stop logging because Windows “mysteriously” locks
files, which prevents rolling and causes other annoying
things.
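
For reference, size-based rolling with Ruby's standard Logger looks roughly like the sketch below. The helper name build_rolling_logger, the file name and the limits are made-up values for illustration. Rolling works by renaming the current file, which is presumably the step that fails when Windows holds a lock on it.

```ruby
require 'logger'
require 'tmpdir'

# Build a logger that rolls the file once it exceeds max_bytes, keeping
# `keep` old files. The numbers here are illustrative only.
def build_rolling_logger(path, keep = 5, max_bytes = 1_048_576)
  logger = Logger.new(path, keep, max_bytes)
  logger.level = Logger::INFO
  logger
end

log_path = File.join(Dir.tmpdir, "rolling_example_#{Process.pid}.log")
logger = build_rolling_logger(log_path)
logger.info("Connecting to FTP server")
logger.close
```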

Launch your Task Manager and look at your CPU utilization tab; at the
bottom you should see something like kernel memory. What’s
your nonpaged pool count like when you start and when you ultimately
crash? We recently ran into a problem with GlassFish V2.0 where we
were running out of nonpaged pool memory. I realize you aren’t using
GF, but some of the “hang”-based symptoms you describe happen to us
when we run out of nonpaged pool memory.

On the file download stuff: we’ve seen issues with timeout.rb that
totally vaporize threads on us (we haven’t finished our upgrade to
v1.2.0 yet, though).
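
For anyone unfamiliar with timeout.rb: it runs a block and, if the deadline passes first, interrupts it by raising an exception in the thread running the block. A minimal sketch of wrapping a slow call follows. The with_deadline name and the time limits are illustrative only, and whether the FTP code in question actually goes through timeout.rb is not established in this thread.

```ruby
require 'timeout'

# Run the block, giving up after `seconds`. Returns the block's value,
# or :timed_out if the deadline passed first.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

fast = with_deadline(1)   { :done }
slow = with_deadline(0.1) { sleep 2; :done }
```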

Charles… Tom mentioned you and he might be working on/have worked on
fixing the timeout.rb issues… if so is that fix in 1.2.0?

Andrew, what JVM options are you using when you run your service?
Are you using SSL at all? Are you using the -server option?
Without it we’ve been able to crash the JVM pretty regularly; with
-server things run a lot better.

I’m sorry this doesn’t give you the answers you might be looking
for… but I’m trying to give you some of what we’ve seen in the hopes
that something I said proves helpful.

Jay

On Wed, Apr 29, 2009 at 4:09 PM, Andrew W.
removed_email_address@domain.invalid wrote:


#3

How does one find out what is and isn’t safe for use with JRuby? Can I
not assume that if it works, it should be safe? ;) The entire
application “works” including the Net::FTP but I guess I have to agree
that something may not be working safely!


#4

That error is coming out of jruby/lib/ruby/1.8/dl.rb. I don’t actually
know what that does, other than that it looks like it’s related to
trying to do C stuff. The equivalent MRI module certainly involves C
code, and it looks like the JRuby version pulls in a bunch of FFI stuff.
As such, I’d have to guess that something in your code (possibly
Net::FTP) is using C extensions and so isn’t safe for use with JRuby.


#5

LOL. I know. All my development actually takes place on a Linux
workstation!

I’m not using any explicit logger. I’m just noting the standard and
error output in the Windows command prompt window that is normally
output by rails and my puts statements.

The next time this happens I will note the nonpaged pool memory. I’m
using everything “as is”, i.e. not changing any VM settings, and I haven’t
tried the -server option, but I will now. I’m not making any use of SSL.

I should note that I was using JRuby V1.1.4 and had the same reboot
issue. I was hoping that the problem would just go away with the
upgrade to V1.2.0.

Thanks for your feedback!


#6

Jacob K. wrote:

That error is coming out of jruby/lib/ruby/1.8/dl.rb. I don’t actually
know what that does, other than that it looks like it’s related to
trying to do C stuff. The equivalent MRI module certainly involves C
code, and it looks like the JRuby version pulls in a bunch of FFI stuff.
As such, I’d have to guess that something in your code (possibly
Net::FTP) is using C extensions and so isn’t safe for use with JRuby.

I would not expect it to be Net::FTP, but obviously something is pulling
in dl.rb, and that makes it a very strong suspect.

We may want to disable dl.rb in releases and provide a flag to enable it
if you really know what you’re doing…at least until it’s considered
100% working.

Andrew: Can you do a search in the code you’re using for “require ‘dl’”
or similar? I’m guessing there’s probably a gem using it for something.
Or perhaps something uses a win32 library?
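
A rough sketch of such a search in Ruby, for cases where grep isn’t handy on the Windows box. The method name find_dl_requires and the pattern are assumptions made for illustration; the pattern only catches literal require lines, not requires built from variables.

```ruby
require 'find'

# Matches require 'dl', require "dl" and require 'dl/...' style lines.
DL_REQUIRE = /^\s*require\s*\(?\s*['"]dl(\/[^'"]*)?['"]/

# Walk `root` and return [path, line_number, line] for each match in .rb files.
def find_dl_requires(root)
  hits = []
  Find.find(root) do |path|
    next unless File.file?(path) && File.extname(path) == '.rb'
    # Read as binary so files with odd encodings don't raise during matching.
    File.binread(path).each_line.with_index(1) do |line, number|
      hits << [path, number, line.strip] if line =~ DL_REQUIRE
    end
  end
  hits
end

# Example use (directory names are placeholders):
#   find_dl_requires('vendor').each { |path, n, line| puts "#{path}:#{n}: #{line}" }
```

Pointing it at the application directory and at the directory where the gems are installed should cover both possibilities mentioned above.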

  • Charlie


#7

Andrew W. wrote:

I should note that I was using JRuby V1.1.4 and had the same reboot
issue. I was hoping that the problem would just go away with the
upgrade to V1.2.0

Ahh that’s very interesting. In 1.1.4 the dl.rb library was not even
present.

Here’s something to try: pass -J-Djruby.native.enabled=false and see if
it improves. That turns off all external native libraries we use, which
now seem like they could all be suspects.

Also, make sure you’re running the latest Java 6 release; we have run
into bugs with earlier releases.
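
One way to confirm which Java version and which flags are actually in effect is to print them at startup. A rough sketch, assuming JRuby’s ENV_JAVA constant (a hash of Java system properties) is available in the version in use; runtime_summary is a made-up name, and under plain Ruby the Java entries are simply omitted.

```ruby
# Collect a few facts about the running interpreter. ENV_JAVA is only
# defined under JRuby, so it is checked before use.
def runtime_summary
  summary = {
    :ruby_version => RUBY_VERSION,
    :platform     => RUBY_PLATFORM,
    :jruby        => defined?(JRUBY_VERSION) ? JRUBY_VERSION : nil
  }
  if defined?(ENV_JAVA)
    summary[:java_version]   = ENV_JAVA['java.version']
    summary[:jvm_name]       = ENV_JAVA['java.vm.name']
    summary[:native_enabled] = ENV_JAVA['jruby.native.enabled']
  end
  summary
end

runtime_summary.each { |key, value| puts "#{key}: #{value.inspect}" }
```

The java.vm.name property normally distinguishes the client and server VMs, so it is also a quick check that a -server switch took effect.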

  • Charlie


#8

Well, it did warn you that you were using a partial and likely broken
implementation of DL, didn’t it? In general, you should be glad if
anything using C extensions works, and not surprised if it either
doesn’t work or breaks eventually.

I’d guess that the VM unkillability + crash has to do with the FFI that
JRuby sets up to try to make it work becoming unstable and not coming
down cleanly.


#9

2009/4/30 Charles Oliver N. removed_email_address@domain.invalid:

Here’s something to try: pass -J-Djruby.native.enabled=false and see if it
improves. That turns off all external native libraries we use, which now
seem like they could all be suspects.

-J-Djruby.native.enabled=false will only turn off dl.rb in jruby-1.3
and later, since it uses FFI, which gets turned off by passing that
flag. The old dl.rb in 1.2 and earlier used JNA directly, and I don’t
think there was any check in the dl code to turn it off.



#10


I have changed my startup to

jruby --server -J-Djruby.native.enabled=false script\server

I have been using the latest version of Java 6.

This may take a while to fail again. Typically I see it happen after a
week or two. I’ll keep you apprised when/if it happens again or if I
see any error messages that I haven’t seen before.



#11

Wayne M. wrote:

-J-Djruby.native.enabled=false will only turn off dl.rb in jruby-1.3
and later, since it uses FFI, which gets turned off by passing that
flag. The old dl.rb in 1.2 and earlier used JNA directly, and I don’t
think there was any check in the dl code to turn it off.

Perhaps it would be wise for Andrew to log the result of calling
“caller” at the top of dl.rb then, so we can see what’s using it?
Turning off native stuff may still be useful anyway, since there’s other
native code that could be causing problems.

  • Charlie


#12


With the same problem appearing with V1.1.4 I thought the DL issue was
eliminated as a potential source of the issue. I can certainly turn on
the logging but I need a little help to do this. What do I need to do?



#13

It looks like the issue has nothing to do with jruby after all!

I converted the application to run in pure Ruby and the lock-up/reboot
problem still happened. The underlying issue may have to do with the DB2
database.

Thanks for all your help and your suggestions!


#14

Thanks for updating us! Let us know if there’s anything else we can do!


#15

2009/4/29 Andrew W. removed_email_address@domain.invalid:

I apologize for being long and vague with this posting as I’m at a loss
as to what is happening. I’m not sure if I hitting a java bug, jruby
bug or an application bug.

The platform is jruby 1.2.0 running with Java 1.6.03.13 on Windows
server 2003. The jruby application is a rails application run with
mongrel that is used as “web service” for an OpenLaszlo front end.

Only a simple question: why do you use OpenLaszlo rather than Flex?



#16

Andrew W. wrote:

With the same problem appearing with V1.1.4 I thought the DL issue was
eliminated as a potential source of the issue. I can certainly turn on
the logging but I need a little help to do this. What do I need to do?

I would just like to know what’s loading DL, to be honest. Edit
lib/ruby/1.8/dl.rb and just add

p caller

…to the top of the file. If you’re using a “complete” JRuby jar, you
may need to crack it open and re-package it.
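
To show what that prints: Kernel#caller returns the current call stack as an array of "file:line" strings, so at the top of a library file it reveals the chain of requires that loaded it. A small sketch with a hypothetical helper, log_load_origin, that writes the stack to standard error rather than standard output; load_something merely stands in for whatever code ends up requiring the file.

```ruby
# Hypothetical helper: report who triggered the current call. At the top of
# a library file, calling it (or a bare `p caller`) shows the chain of
# require statements that led to the file being loaded.
def log_load_origin(label, out = $stderr)
  stack = caller            # array of "file:line:in ..." strings
  out.puts "#{label} loaded from:"
  stack.each { |frame| out.puts "  #{frame}" }
  stack
end

# Stand-in for the code that triggers the load.
def load_something
  log_load_origin('example')
end

frames = load_something
```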

  • Charlie
