Memory leak

Well, it seems the error is not in my code. I made a simplified version
of the crawler where there just cannot be a mistake of mine, but the
program still leaks memory :( You can check for yourself; here is the
code (after 2000 random URLs, memory usage is over 200 MB):

require 'rubygems'
require 'mechanize'
require 'thread'

mutex = Mutex.new

threads = []
$n = 0

THREADS = 50
q = SizedQueue.new(THREADS * 2)

threads = (1..THREADS).map do
  Thread.new q do |qq|
    # The queue object itself serves as the end-of-input marker (see q.enq q below).
    until qq.equal?(myLink = qq.deq)
      mutex.synchronize do
        puts(($n += 1).to_s) # + " : " + print_class_counts.to_s
      end
      begin
        agent = WWW::Mechanize.new { |agent|
          agent.history.max_size = 1
          agent.open_timeout = 20
          agent.read_timeout = 40
          agent.user_agent_alias = 'Windows IE 7'
          agent.keep_alive = false
        }
        page = agent.get(myLink)
        puts myLink
        puts page.forms.length

        page.forms.each do |form|
        end
      rescue
        # any error for this URL is silently ignored
      end
    end
  end
end

File.foreach("bases/base.txt") do |line|
  line.chomp!
  q.enq(line)
end

# One end marker per thread.
threads.size.times { q.enq q }
sleep(120)

threads.each { |t| t.join }

It's very sad, because I like Ruby very much, but it seems it does not
fit my projects :(

Btw, what version of Ruby is this? IIRC there was a bug with
Array#shift up to 1.8.6 which could also cause these effects.

ruby 1.8.6 (2008-08-11 patchlevel 287). It's strange, because I
updated it a week ago :)

Wow, when I used Ruby 1.8.6, the maximum memory usage of the program was
500-600 MB… with Ruby 1.8.7 it can easily get above 1 GB.

On Wed, Oct 21, 2009 at 10:52 PM, Rob D. wrote:

I retested my script on a database of 10k+ entries, and it seems memory
stops growing once it reaches about 550 MB… I'm running 50 threads, so
that is about 10 MB per thread. As I understand it, this is still not
good?

http://www.mikeperham.com/2009/05/25/memory-hungry-ruby-daemons/
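
One way to see what is actually piling up is to count live objects per
class between batches of URLs. Below is a minimal sketch of such a
helper, in the spirit of the print_class_counts call that is commented
out in the script above. The name class_counts and its body are
assumptions for illustration, not the original poster's code.

```ruby
# Hypothetical helper (not from the original script): returns the
# `top` classes with the most live instances as [class, count] pairs.
def class_counts(top = 10)
  GC.start # run a GC pass first so garbage is less likely to be counted
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts.sort_by { |_, n| -n }.first(top)
end

class_counts(5).each { |klass, n| puts "#{klass}: #{n}" }
```

Calling it every few hundred URLs and comparing the output shows which
classes keep growing. It does not show memory held outside the Ruby heap.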

On 10/23/2009 11:28 PM, Rob D. wrote:

    puts ($n +=1).to_s # + " : " + print_class_counts.to_s
    }

    end
    end

You create threads and fork a process for every single item to process.
This has some consequences:

  • your threads will eat all the entries in the queue very quickly
  • you will get a large number of processes immediately

In this setup you need neither threads nor a queue. Basically you
just need to iterate over the input list and fork off a process for
every item you encounter. However, you would then have no control over
concurrency, and your CPU will suffer. With the setup you presented you
should at least have each thread wait for its process to return, so that
a single thread never forks off more than one process at a time.

Kind regards

robert
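
The last suggestion above (each thread waits for its own child) can be
sketched roughly as follows. This is only an illustration under stated
assumptions: run_pool and process are hypothetical names, process is a
stand-in for the real Mechanize fetch, and fork must be available on the
platform (it is not on Windows).

```ruby
require 'thread'

# Hypothetical stand-in for the per-URL work (the Mechanize fetch in the thread).
def process(url)
  url.length
end

# Runs `process` for each URL in a forked child, with at most `workers`
# children alive at any time. Returns [url, success?] pairs.
def run_pool(urls, workers = 4)
  queue    = SizedQueue.new(workers * 2)
  done     = Object.new # end-of-input marker
  statuses = Queue.new

  threads = (1..workers).map do
    Thread.new do
      until (url = queue.deq).equal?(done)
        pid = fork do
          process(url) # runs in the child; its memory is freed when it exits
        end
        _, status = Process.wait2(pid) # block this thread until its child is done
        statuses << [url, status.success?]
      end
    end
  end

  urls.each { |u| queue.enq(u) }
  workers.times { queue.enq(done) }
  threads.each { |t| t.join }
  Array.new(statuses.size) { statuses.deq }
end

results = run_pool(%w[http://example.com/a http://example.com/b], 2)
```

Because every thread blocks in Process.wait2 before taking the next
item, the number of live child processes never exceeds the number of
threads, and the queue is drained only as fast as the children finish.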

Well, it seems I found a solution…
I ran some tests in Python as well. A simple script like the one I
posted earlier eats memory in Python too, and the only way around it
that I found is to use forks. I checked out forkoff, but it produced
some strange bugs. This is the working code:

threads = (1..THREADS).map do
  Thread.new q do |qq|
    until qq.equal?(myLink = qq.deq)
      mutex.synchronize do
        puts(($n += 1).to_s) # + " : " + print_class_counts.to_s
      end
      fork do # <----- You need to fork it; when the forked process exits, its memory is released
        begin
          agent = WWW::Mechanize.new { |agent|
            agent.history.max_size = 1
            agent.open_timeout = 20
            agent.read_timeout = 40
            agent.user_agent_alias = 'Windows IE 7'
            agent.keep_alive = false
          }
          page = agent.get(myLink)
          puts myLink
          puts page.forms.length

          page.forms.each do |form|
          end
        rescue
        end
      end
    end
  end
end

Sure, you're right. I forgot to write it here, but in my own code there
is a Process.wait after each fork :)

In a .NET Windows application, a form can leak resources even though it
runs as managed code. The procedure below checks whether a form has a
resource leak:

1. Open Windows Task Manager.
2. Click the Processes tab.
3. Select “View” in the menu, then the “Select Columns” menu item.
4. Check the USER Objects and GDI Objects check boxes so that they
   appear as columns in the process list.
5. In the code in your project that launches the WinForm, make sure you
   call the Dispose method. Forms implement the IDisposable interface,
   so their Dispose method must be called the moment they are no longer
   needed (Test 1). We can call Dispose explicitly or, even better, have
   it invoked implicitly with the help of a using clause (Test 2).
6. For troubleshooting purposes, we can call GC.Collect() after the
   using statement or after the call to Dispose.
7. Now launch the form and note the USER Objects and GDI Objects values
   in Task Manager. Close the form after some time and note the values
   again. The values should have decreased; if they have increased
   instead, there is a leak in the form.
8. Fix the resource leak and remove the call to GC.Collect(). It is
   generally unnecessary to call GC.Collect() explicitly.

http://www.mindfiresolutions.com/Checking-resource-leaks-in-a-form-272.php
