GC help

I am still running out of memory with my Ruby application. For some reason,
there are objects that are not getting reclaimed long after their use is up.
I understand the mark and sweep GC algorithm. Is there a method in Ruby to
set an object’s memory space to be collected?

Joey M. wrote:

I am still running out of memory with my ruby application. For some reason,
there are objects that are not getting reclaimed long after their use is up.
I understand the mark and sweep GC algorithm. Is there a method in Ruby to
set an object’s memory space to be collected?

Ruby’s GC will eventually collect unreferenced objects without any
intervention on your part. It may not collect them on the first sweep,
but it will collect them. You can force GC to run earlier than normal by
calling GC.start, but that doesn’t necessarily mean that every
unreferenced object will be collected on that sweep.
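
As a rough illustration of that point, the sketch below forces a collection with GC.start and counts the String objects that are still alive afterwards. The helper name live_string_count is made up for this example. The numbers vary by Ruby version and platform, and the collector may keep an object around longer than you expect, so treat the output as a hint rather than proof.

```ruby
# Hypothetical helper: force a GC pass, then count the String objects still alive.
def live_string_count
  GC.start
  ObjectSpace.each_object(String).count
end

before = live_string_count

# Allocate a batch of short-lived strings and drop the only reference to them.
junk = Array.new(10_000) { |i| "image data #{i}" }
junk = nil

after = live_string_count
puts "live strings before: #{before}, after: #{after}"
```

If the "after" count keeps climbing across many iterations of the real download loop, something is probably still holding references to the strings.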

In one of your earlier posts you said you were using a 3rd-party
library? Have you determined that it’s not responsible for the memory
allocations?

Have you determined that no references exist to objects that you think
should have been cleaned up?

I found out it is not the 3rd-party code. I am downloading images (~320,000
ct.) as strings and saving each to a file. The file handles are getting
closed. Using the Dike gem, I think I determined that the bulk of the leaked
memory is in these string variables. But what I don’t understand is that the
variables containing the strings are local to a method of an object and
should be unreferenced when that method is done, right? I’ve invoked GC at
the end of the method and after each iteration that the method gets called.
I’ve even set the strings to nil after they’ve been saved to a file and
before GC is called. The app is using 2G of RAM and 4G of swap before it
runs out of memory and crashes about 1/3rd of the way through. I’m really
starting to doubt Ruby’s ability to do memory-intensive work. Any ideas?

Joey M. wrote:

I found out it is not the 3rd party code. I am downloading images (~320,000
ct.) as strings, and saving each to a file.

Why? If the images come from the Internet and go to disk, why do you
need to read them into Ruby working storage? Can’t you just shell out to
“wget” or “curl”?

Good question, unfortunately this is not a simple HTTP server. It’s an
industry standard client/server communication called RETS. Real Estate
Transaction Standard. It requires a third party library in order to
interface with the server. I wish it were that simple!!

On Sun, Mar 30, 2008 at 8:16 PM, M. Edward (Ed) Borasky wrote:

Joey M. wrote:

starting to doubt Ruby’s ability to do memory intensive work. Any ideas?
Many people have used Ruby for long-running tasks that use a lot of
memory. If Ruby was not collecting unused strings, don’t you think
somebody would have noticed it by now?

Actually, I think you’ve ruled out Ruby being the problem. What else is
running? Could it be using up the memory?

From: “Joey M.”

runs of out memory and crashes about 1/3rd of the way through. I’m really
starting to doubt Ruby’s ability to do memory intensive work. Any ideas?

Do you have a small bit of code that reproduces the problem?

How are you downloading the images? Net::HTTP ? Or…?

If you can provide a small program that exhibits the memory
leak I’m sure others here would be happy to try it on their
systems as well.

Regards,

Bill

On Sun, Mar 30, 2008 at 5:07 PM, Joey M. wrote:

I found out it is not the 3rd party code. I am downloading images (~320,000
ct.) as strings, and saving each to a file. The file handlers are getting
closed. Using the Dike gem, I think I determined the bulk of the leaked
memory is in these string variables. But what I don’t understand is that the
variables containing the strings are local to a method of an object and
should be unreferenced when that method is done, right?

Usually, yes. However, there are ways a reference could be kept live
beyond the end of the method. Obviously, if you pass the string out of the
method to something else that keeps a reference to it, that would do it.
But also, if you pass a closure (e.g., a proc object) that captures the
method’s binding (I think that’s the right terminology) and it gets stored
somewhere else, that would keep the method’s local variables “live” even
after the method exits.
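
A minimal sketch of the closure case, using made-up names ($callbacks, download) that are not from the poster’s code. The block never mentions the large string, but because it closes over the whole method scope, the string stays reachable for as long as the proc is stored.

```ruby
# Holds procs past the lifetime of the method that created them.
$callbacks = []

def download(key)
  data = "x" * 1_000_000   # stands in for a large image string
  # The block never touches `data`, but it captures the method's binding.
  $callbacks << proc { puts "finished #{key}" }
  data.bytesize
end

download("12345")

# The method has returned, yet its local is still reachable through the proc.
kept = $callbacks.first.binding.local_variable_get(:data)
puts kept.bytesize
```

Clearing $callbacks (or never storing the proc) removes the last path to the string, and it becomes collectable again.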

It’s hard to say if something like that might be happening without the
code.

I shouldn’t have criticized Ruby like that; it actually is my second
favorite language, right behind PHP. If I learned RoR, it might be my
favorite. It really does a good job, is easy to write, and has a lot of
features. Not to mention a great community.
I am just really frustrated about this stupid app that I can’t get working
right. I am working on a workaround, since it is only going to do a major
download once and then update daily.
Thanks for all the help though, it feels good that there is a place to turn
to for help.

On 31/03/2008, Tim H. wrote:

the end of the method and after each iteration that the method gets
called.
I’ve even set the strings to nil after they’ve been saved to a file and
before gc is called. The app is using 2G of ram and 4G of swap before it
runs of out memory and crashes about 1/3rd of the way through. I’m really
starting to doubt Ruby’s ability to do memory intensive work. Any ideas?

Many people have used Ruby for long-running tasks that use a lot of memory.
If Ruby was not collecting unused strings, don’t you think somebody would
have noticed it by now?

I have noticed :->

Michal

Ok, I was able to get it all into one class. The problem lies in this class:

require "fileutils" # needed for FileUtils.mkpath / remove_dir below

class Picture

  def initialize(db, rets, rets_class)
    @db = db
    @rets = rets
    @rets_class = rets_class
    @attempts = 0
  end

  def getPic(key)
    begin
      get_object_request = GetObjectRequest.new(@rets_class, "Photo")
      get_object_request.add_all_objects(key)
      get_object_response = @rets.session.get_object(get_object_request)
      content_type_suffixes = { "image/jpeg" => "jpg" }
      makePicDir(key)
      get_object_response.each_object do |object_descriptor|
        object_key = object_descriptor.object_key
        obj_id = object_descriptor.object_id
        content_type = object_descriptor.content_type
        description = object_descriptor.description
        #print "#{object_key} object ##{object_id}"
        #print ", description: #{description}" if !description.empty?
        #puts
        suffix = content_type_suffixes[content_type]
        pic = object_descriptor.data_as_string
        savePic(key, obj_id.to_s, suffix, description, pic)
      end
      get_object_response = nil
    rescue => e
      puts "Error retrieving pictures for #{key}: #{e.message}"
      if @attempts <= 5
        @attempts += 1
        puts "retrying"
        retry
      else
        puts "failed"
        @attempts = 0
      end
    end
    @attempts = 0
  end

  def getThumb(key)
    begin
      get_object_request = GetObjectRequest.new(@rets_class, "Thumbnail")
      get_object_request.add_all_objects(key)
      get_object_response = @rets.session.get_object(get_object_request)
      content_type_suffixes = { "image/jpeg" => "jpg" }
      get_object_response.each_object do |object_descriptor|
        object_key = object_descriptor.object_key
        obj_id = object_descriptor.object_id
        content_type = object_descriptor.content_type
        description = object_descriptor.description
        #print "#{object_key} object ##{object_id}"
        #print ", description: #{description}" if !description.empty?
        #puts
        suffix = content_type_suffixes[content_type]
        pic = object_descriptor.data_as_string
        savePic(key, obj_id.to_s, suffix, description, pic, true)
      end
      get_object_response = nil
    rescue => e
      puts "Error retrieving thumbs for #{key}: #{e.message}"
      if @attempts <= 5
        @attempts += 1
        puts "retrying"
        retry
      else
        puts "failed"
        @attempts = 0
      end
    end
    @attempts = 0
  end

  def makePicDir(key)
    FileUtils.mkpath("#{$pic_dir}#{key}/thumb")
  end

  def savePic(key, id, suffix, desc, pic, thumb_bool = false)
    if thumb_bool
      file_name = $pic_dir + key + "/thumb/" + id + "." + suffix
      location = "/" + key + "/thumb/" + id + "." + suffix
    else
      file_name = $pic_dir + key + "/" + id + "." + suffix
      location = "/" + key + "/" + id + "." + suffix
    end
    self.savePicFile(file_name, pic)
    size = File.size(file_name)
    if thumb_bool
      self.insertThumbDB(key, id, location)
    else
      self.insertPicDB(key, id, desc, size, location)
    end
  end

  def savePicFile(file_name, pic)
    f = File.open(file_name, "wb")
    f << pic
    f.close
  end

  def insertPicDB(key, id, desc, size, location)
    description = @db.database.escape_string(desc)
    if @db.DBinsert("PICS", "pkey,id,description,size,location", "#{key},#{id},'#{description}','#{size}','#{location}'")
      puts "#{key} #{id} pic added"
      print ":"
    end
  end

  def insertThumbDB(key, id, location)
    if @db.DBupdate("PICS", "thumb = '#{location}'", " pkey = #{key} and id = #{id}")
      # puts "#{key} #{id} thumb added"
      print "."
    end
  end

  def deletePic(key)
    self.deletePicDir(key)
    self.deletePicDB(key)
  end

  def deletePicDir(key)
    if File.exists?("#{$pic_dir}#{key}")
      FileUtils.remove_dir("#{$pic_dir}#{key}")
    end
  end

  def deletePicDB(key)
    if @db.DBdelete("PICS", "pkey = #{key}")
      print "-"
      puts "#{key} pics deleted from db"
    end
  end

end #end class

A few suggestions, some not really related to the problem:

On Wed, Apr 2, 2008 at 6:52 PM, Joey M. wrote:

 def getPic(key)
   begin
     get_object_request = GetObjectRequest.new(@rets_class, "Photo")
     get_object_request.add_all_objects(key)
     get_object_response = @rets.session.get_object(get_object_request)
     content_type_suffixes = { "image/jpeg" => "jpg" }

Make content_type_suffixes a class constant, or member if you need to
append to it.
Now you are constructing and destructing the object on each method call.
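
Something along these lines, as a cut-down sketch rather than the full class. The helper suffix_for is made up for the example; the point is only that the hash is built once, when the class is loaded.

```ruby
class Picture
  # Built once when the class is loaded, then shared by every call.
  CONTENT_TYPE_SUFFIXES = { "image/jpeg" => "jpg" }.freeze

  # Hypothetical helper, not part of the original class.
  def suffix_for(content_type)
    CONTENT_TYPE_SUFFIXES[content_type]
  end
end

puts Picture.new.suffix_for("image/jpeg")
```

This will not fix the leak by itself; it just avoids allocating a throwaway hash on every call.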

 makePicDir(key)
 get_object_response.each_object do |object_descriptor|
   object_key =  object_descriptor.object_key
   obj_id = object_descriptor.object_id
   content_type = object_descriptor.content_type
   description = object_descriptor.description
   #print "#{object_key} object \##{object_id}"
-  #print ", description: #{description}" if !description.empty?
+  #print ", description: #{description}" unless description.empty? # a matter of taste/style


  puts "retrying"
  retry
else
  puts "failed"
  @attempts = 0
end

end
@attempts = 0
end

It seems that you could refactor the common code of these two methods
into a new one.
The benefit would be shorter, more readable code and a better split of
responsibilities; the drawback would be slightly slower execution.
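
For the retry part, a rough sketch of what the shared piece might look like. The name with_retries and its parameters are made up here, and the counter is a local variable instead of @attempts, so the retry limit is not exactly the same as in the posted code.

```ruby
# Hypothetical helper: run the block, retrying up to max_retries times on error.
# Returns the block's value, or nil once the retries are used up.
def with_retries(label, max_retries = 5)
  attempts = 0
  begin
    yield
  rescue StandardError => e
    puts "Error retrieving #{label}: #{e.message}"
    if attempts < max_retries
      attempts += 1
      puts "retrying"
      retry
    end
    puts "failed"
    nil
  end
end

# Usage with a stand-in for the RETS call that fails twice, then succeeds.
calls = 0
result = with_retries("pictures for 12345") do
  calls += 1
  raise "timeout" if calls < 3
  "ok"
end
puts result
```

getPic and getThumb could then both wrap their request code in with_retries and differ only in the object type ("Photo" or "Thumbnail") and the thumb flag passed to savePic.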

   description = object_descriptor.description
if @attempts <= 5

   file_name = $pic_dir + key + "/thumb/" + id + "." + suffix
   location = "/" + key + "/thumb/" + id + "." + suffix
 else
   file_name = $pic_dir + key + "/" + id + "." + suffix
   location = "/" + key + "/" + id + "." + suffix
 end
 self.savePicFile(file_name,pic)

self is not necessary, the same below

def savePicFile(file_name,pic)

-  f = File.open(file_name, "wb")
+  File.open(file_name, "wb") do |f|
     f << pic
-  f.close
+  end # automatic close on exceptions

 print ":"

def deletePic(key)
self.deletePicDir(key)
self.deletePicDB(key)
end

def deletePicDir(key)

-  if File.exists?("#{$pic_dir}#{key}")
-    FileUtils.remove_dir("#{$pic_dir}#{key}")
-  end
+  pic_dir = $pic_dir + key # save one allocation
+  FileUtils.remove_dir pic_dir if File.directory? pic_dir

end #end class

I’d suspect the database code is keeping some cache. This code seems
fine.

I see you are using libRETS. Did you try to rule it out by replacing
calls to libRETS (especially to data_as_string) with some stubs
(create a random long string on the fly)?
If you are on Unix, you can use IO.read("/dev/random", 100000). If you
are on Windows, choose another long enough file to read.
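
A sketch of such a stub, assuming only the content_type, description and data_as_string calls need to be faked. The class name is made up, and SecureRandom from the standard library is swapped in for /dev/random so the same code runs on Windows and does not wait on the kernel’s entropy pool.

```ruby
require "securerandom"

# Hypothetical stand-in for a libRETS object descriptor. It never touches
# the network; data_as_string returns fresh random bytes on every call.
class StubObjectDescriptor
  def initialize(size = 100_000)
    @size = size
  end

  def content_type
    "image/jpeg"
  end

  def description
    ""
  end

  def data_as_string
    SecureRandom.random_bytes(@size)
  end
end

pic = StubObjectDescriptor.new.data_as_string
puts pic.bytesize
```

If memory stays flat when the loop is fed stub descriptors but grows with the real ones, that points at the library or its wrapper and not at the Ruby code.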

The documentation says that GetData() abandons ownership to the object
it returns. It’s possible that SWIG-generated wrapper doesn’t handle
this properly.

Jano

Many people have used Ruby for long-running tasks that use a lot of memory.
If Ruby was not collecting unused strings, don’t you think somebody would
have noticed it by now?

I have noticed :->

Michal

I have too, and it drives me crazy when my mongrel instances eat up
600MB of memory.
I’d be willing to offer a bounty of $150 to anyone able to clear this
up. It happens especially often when you run multiple threads, it
seems.
Probably a rails thing, but anyway.
Enough ranting.
Have a good one.
-R

On 4/5/08, Roger P. wrote:

I’d be willing to offer a bounty of $150 to anyone able to clear this
up. It happens especially often when you run multiple threads, it
seems.

Send me the 150 and I’ll send you a copy of Ramaze ;)

What version of ruby are you running? I recently saw an issue using
1.8.6 with a low patch level. Upgrading to p114 might solve things for
you.

On Fri, Apr 4, 2008 at 7:26 PM, Roger P. wrote:

I’d be willing to offer a bounty of $150 to anyone able to clear this
up. It happens especially often when you run multiple threads, it
seems.
Probably a rails thing, but anyway.
Enough ranting.
Have a good one.
-R

Posted via http://www.ruby-forum.com/.
