Hi,
I’ve written a script in Ruby 1.8 that uses the mechanize, nokogiri and
open-uri gems and runs under Linux.
The script is a web crawler that scans a large site containing a big
amount of data on international firms.
I’m only interested in building a database with some of the data, not
all of it.
My script runs perfectly and grabs all the data in the correct order,
but after running for 6-8 hours it uses an enormous amount of memory
(1 GB).
I save the scraping results to a file and empty the data buffer after
every 10 firms collected.
I followed this post to get that far, because before that the script
used the same amount of memory after only 4 hours: Ruby Memory Management - Stack Overflow
Can someone help me reduce this problem and optimize the script?
Is there an IDE with an efficient debugger for Ruby?
I think there is something I’ve missed.
> after running for 6-8 hours the script uses an enormous amount of
> memory (1 GB).
> I save the scraping results to a file and empty the data buffer after
> every 10 firms collected.
It seems either you do not free the memory (and thus have created a
leak yourself) or you suffer from the mentioned bug.
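Ruby’s garbage collector can only reclaim objects that nothing references any more, so “emptying the buffer” must really drop the references, not just write them out. A minimal sketch of that batch-and-flush pattern (the file name, batch size, URLs and the `scrape_firm` stub are made-up placeholders):

```ruby
# Batch-and-flush sketch of the pattern described above. Only the
# clear-after-flush idea matters; everything else is a placeholder.
BATCH_SIZE = 10
OUTPUT     = "firms.txt"

def scrape_firm(url)
  "data for #{url}"   # stand-in for the real mechanize/nokogiri work
end

def flush(results)
  File.open(OUTPUT, "a") { |f| results.each { |r| f.puts(r) } }
  results.clear         # drop the references so the GC can reclaim them
end

File.delete(OUTPUT) if File.exist?(OUTPUT)   # start from a clean file

results = []
(1..25).map { |i| "http://example.com/firm/#{i}" }.each do |url|
  results << scrape_firm(url)
  flush(results) if results.size >= BATCH_SIZE
end
flush(results) unless results.empty?         # write the final partial batch
```

If the Mechanize agent itself lives for the whole run, also note that it keeps every visited page in its history by default; capping it with `agent.max_history = 1` avoids that growth.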
> I followed this post to get that far, because before that the script
> used the same amount of memory after only 4 hours: Ruby Memory Management - Stack Overflow
> Can someone help me reduce this problem and optimize the script?
> Is there an IDE with an efficient debugger for Ruby?
> I think there is something I’ve missed.
The first thing I’d do is update to a more recent Ruby 1.9.* version.
That will also be faster, and it likely has a fix for the leakage bug
mentioned on the Stack Overflow page. If your problem persists, you
need to look into your own code.
A simple test would be to write out statistics per class on a regular
basis, e.g.:
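One way to do that in MRI is a per-class object census via `ObjectSpace`; the class whose count keeps climbing between dumps is the likely leak. A sketch (the helper name and the top-10 cut-off are arbitrary choices):

```ruby
# Print the ten most numerous live object classes, highest count first.
# Call this periodically (e.g. after every batch of firms) and compare
# successive dumps: a class whose count grows without bound is suspect.
def dump_object_stats(io = $stderr)
  counts = Hash.new(0)
  ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
  counts.sort_by { |_klass, n| -n }.first(10).each do |klass, n|
    io.printf("%-40s %8d\n", klass, n)
  end
end

dump_object_stats   # writes the census to stderr
```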