Good Afternoon Rubyists (sp?),
I've been playing around with the language for about a month now,
and have just completed my first “production” project involving a
web-crawler using REST.
I'm using it to run around the internet grabbing prices for Yugioh
cards as part of my graduate-level CS courses in algorithm design and
implementation. So far, the code runs just fine.
The problem I'm encountering is that after the script has been
running for about 6 hours, my Windows virtual memory has maxed out
(we're talking about 51 GB of virtual memory). At this point, I get
a message from Windows 8.1 saying that Cygwin (through Ruby) needs to close
because the system is running low on resources.
"Okay," I think, "This must be a memory leak issue." So I start to
dig around the forums and Google looking for answers. It would seem
that the GC is the culprit here, especially when it comes
to strings and hashes.
As luck would have it, my script uses a lot of strings and NESTED
hashes! I use symbols where possible, but I don’t know of any way to
make a string such as “Happy Card Store” into a symbol while preserving
the spaces. Thus, for the moment, I'm kind of stuck trying to
figure out a way to keep the script from going kaboom.
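A minimal sketch of what such a conversion could look like, assuming String#to_sym and the quoted symbol literal are acceptable here. The store name is the example from above; the `prices` hash is only a hypothetical stand-in for the real nested structure. Note that on Ruby versions before 2.2, symbols created this way are never garbage collected, so this may not reduce memory use at all.

```ruby
store_name = "Happy Card Store"

# String#to_sym keeps the spaces intact.
store_key = store_name.to_sym

# The quoted literal form yields the very same symbol.
literal_key = :"Happy Card Store"

# Hypothetical nested hash keyed by store symbol (illustration only).
prices = { store_key => {} }
```

Whether this helps depends on how many distinct store names the crawler sees: each distinct name becomes one symbol, however many times it is repeated.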
One of the things that I've already done is set string variables to
nil in methods when I am done using them. I'm not sure how much of a
difference this is making because I’m in preliminary load testing, but
thus far, it seems to hold promise.
Outside of calling GC.start every pre-determined number of
iterations, what other things might help with controlling memory usage?
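For reference, the periodic GC.start approach mentioned above can be sketched roughly as follows. The names `process_all`, `live_string_count`, and `GC_INTERVAL`, and the interval of 500, are assumptions for illustration only, not taken from the actual script.

```ruby
# Hypothetical interval; the right value would need to be found by testing.
GC_INTERVAL = 500

# Rough count of String objects currently on the Ruby heap.
def live_string_count
  ObjectSpace.count_objects[:T_STRING]
end

# Yields each item to the caller's block and forces a full GC run
# after every `gc_interval` items. Returns the number of forced runs.
def process_all(items, gc_interval: GC_INTERVAL)
  forced_runs = 0
  items.each_with_index do |item, index|
    yield item
    next unless ((index + 1) % gc_interval).zero?

    GC.start
    forced_runs += 1
  end
  forced_runs
end
```

Logging `live_string_count` before and after a batch is one way to check whether the forced runs actually reclaim anything. If the count keeps climbing, something is probably still holding references to the strings.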
Here is a list of the libraries (gems and standard library) this script makes use of:
require 'json'
require 'watir'
require 'nokogiri'
require 'csv'
require 'pp'
require 'net/http'
require 'uri'
require 'fileutils'
I look forward to the response from the community!
Thank you,
JK