Weird Garbage Collection Behavior

aris · November 2, 2012, 5:28am

I wrote a simple Sinatra app that takes any String from a URL parameter,
stores that value as an attribute of a Java DTO (MessageHolder.java),
and renders an ERB template that displays that value back in an HTML
page. The complete source can be found here:
https://github.com/billkoch/garbage_collection_weirdness

The strange behavior I noticed when profiling the app is the first
invocation:

GET /1

will create a MessageHolder object on the heap (as expected) and when I
force garbage collection, the object is reclaimed (again as expected).

However, any subsequent call:

GET /2

will create a MessageHolder object on the heap, and forcing garbage
collection does NOT reclaim the object. I took a look at a heap
snapshot, and can’t figure out why the org.jruby.RubyThread is
keeping a reference to my Java DTO.

I’m running my app with Tomcat 6 (although I experience the same
problem with 7), Java 1.6, JRuby 1.7.0, and a war built using Warbler.

Attached is a screenshot of the heap snapshot. I’m sure there’s
something obvious I’m missing here, but I cannot figure out what I’m
doing wrong. Any help or insight would be greatly appreciated!

knappsych · November 5, 2012, 1:15am

On Thu, Nov 1, 2012 at 10:28 PM, William K. [email protected]
wrote:

The strange behavior I noticed when profiling the app is the first
invocation:

GET /1

will create a MessageHolder object on the heap (as expected) and when I
force garbage collection, the object is reclaimed (again as expected).

Is there a reason you feel you need to force GC?

However, any subsequent call:

GET /2

will create a MessageHolder object on the heap, and forcing garbage
collection does NOT reclaim the object. I took a look at a heap
snapshot, and can’t figure out why the org.jruby.RubyThread is
keeping a reference to my Java DTO.

Maybe I am confused about your code. It appears to store the
MessageHolder instance into an instance variable, which makes me think
that reference would be alive as long as the Applicaiton object is
alive. The reference chain back to RubyThread is probably SInatra
storing your Application in a thread-local or something.

Charlie

knappsych · November 5, 2012, 3:42pm

Charles Nutter wrote in post #1082812:

On Thu, Nov 1, 2012 at 10:28 PM, William K. [email protected]
wrote:

The strange behavior I noticed when profiling the app is the first
invocation:

GET /1

will create a MessageHolder object on the heap (as expected) and when I
force garbage collection, the object is reclaimed (again as expected).

Is there a reason you feel you need to force GC?

I was doing some profiling of my app with a Java profiler – just trying
to make sure I didn’t have any memory leaks. Admittedly, the app I gave
as an example is a watered down version of what I was actually
profiling.

However, any subsequent call:

GET /2

will create a MessageHolder object on the heap, and forcing garbage
collection does NOT reclaim the object. I took a look at a heap
snapshot, and can’t figure out why the org.jruby.RubyThread is
keeping a reference to my Java DTO.

Maybe I am confused about your code. It appears to store the
MessageHolder instance into an instance variable, which makes me think
that reference would be alive as long as the Applicaiton object is
alive. The reference chain back to RubyThread is probably SInatra
storing your Application in a thread-local or something.

I didn’t know there was another way to pass variables to a view in
Sinatra (other than through instance variables), but I found I could
pass a Hash of locals to the view. (any perhaps there’s still a better
way to do it?) I now switched the Sinatra code to:

get '/:message' do
  message_holder = MessageHolder.new(params[:message])
  erb :message, content_type: :html, locals: { message_holder:

message_holder }
end

but I’m still seeing the same behavior – after the first call and a GC,
the MessageHolder instance is reclaimed. After the second call and a
GC, a MessageHolder instance is left on the heap.

I guess I’m confused as to why the MessageHolder object can be garbage
collected after the first call, but not after the second.

Thanks for all your help – I really appreciate it!

Charlie

knappsych · November 5, 2012, 10:37pm

Try passing -Xji.objectProxyCache=false. This disables the weak
mapping from Java objects to their wrappers, and may clear some things
up.

I passed -Djruby.ji.objectProxyCache=false in Tomcat’s JAVA_OPTS,
however the references still remain on the heap.

Provide a heap dump we can look at to investigate whether there’s
actually a problem. The image you posted earlier did not indicate
anything was wrong to me…a thread local references an Array, which
contains an object whose instance variables point at your object.
Without digging deeper, it’s impossible to tell if one of these
references shouldn’t exist.

I also wrote a JMeter script to put the app under load. After 250K
requests, 54 MessageHolder instances remain on the heap. I’ve uploaded
the heap dump to DropBox:
http://dl.dropbox.com/u/63441574/Tomcat-2012-11-05.hprof.

I’ll be the first to admit, I’m definitely out of my league here – I
can’t make heads or tails of the heap dump. Thank you once again for
all your help.

knappsych · November 6, 2012, 2:12am

On Mon, Nov 5, 2012 at 3:37 PM, William K. [email protected] wrote:

http://dl.dropbox.com/u/63441574/Tomcat-2012-11-05.hprof.

I’ll be the first to admit, I’m definitely out of my league here – I
can’t make heads or tails of the heap dump. Thank you once again for
all your help.

I’ll have a look at this, but 54 instances still alive out of 250k
doesn’t
sound like a problem to me. Maybe I can make the question a bit more
direct: why do you see this as a problem?

Charlie

knappsych · November 5, 2012, 5:30pm

On Mon, Nov 5, 2012 at 8:42 AM, William K. [email protected] wrote:

but I’m still seeing the same behavior – after the first call and a GC,
the MessageHolder instance is reclaimed. After the second call and a
GC, a MessageHolder instance is left on the heap.

I guess I’m confused as to why the MessageHolder object can be garbage
collected after the first call, but not after the second.

Well first of all, just triggering a full GC is never guaranteed to do
anything. The objects in question are also often weakly referenced,
which means the GC may take a couple passes to decide they’re ready to
be collected. This is the case for Java objects floating around Ruby;
we weakly map them to the wrapper around them (this behavior will go
away in JRuby 9k).

It’s also possible there’s something Sinatra is doing here that holds
references to previous-request data in some cases.

Now, unless you see this as a systemic problem (lots of objects or
large objects not getting collected) I wouldn’t worry about it. If you
are still concerned…

Try passing -Xji.objectProxyCache=false. This disables the weak
mapping from Java objects to their wrappers, and may clear some things
up.
Poke around Sinatra to see how it’s storing these values.
Provide a heap dump we can look at to investigate whether there’s
actually a problem. The image you posted earlier did not indicate
anything was wrong to me…a thread local references an Array, which
contains an object whose instance variables point at your object.
Without digging deeper, it’s impossible to tell if one of these
references shouldn’t exist.

Charlie