I’m having a really odd memory problem with a small ruby program I’ve
written. It basically takes in lines from input files (which represent
router flows), deduplicates them (based on elements of the line) and
outputs the unique flows to file. The input file often contains over
300,000 lines of which about 25-30% are duplicates. The trouble I’m
having is that the program (which is intended to be long running) does
not seem to release any memory back to the system and in fact just
increases in memory footprint from iteration to iteration. It should
use about 150 MB by my estimate, but it sails well past this and
yesterday slowed to a halt at about 1.6 GB (GC thrashing, at my
guess). This doesn't make any sense to me, as at times I am deleting
data structures that occupy at least 50 MB of memory.
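For comparison, here is a minimal sketch of the kind of streaming dedup I mean (the field positions and flow format are assumptions on my part, not from my actual code): it keeps only a Set of keys rather than whole lines, so the retained working set stays small even across many iterations.

```ruby
require 'set'

# Deduplicate flow lines by a key built from selected whitespace-separated
# fields (here: source IP, destination IP, port -- assumed positions).
# Streaming line-by-line means only the key strings are retained.
def deduplicate_flows(lines, key_fields: [0, 1, 2])
  seen = Set.new
  lines.each_with_object([]) do |line, unique|
    key = line.split.values_at(*key_fields).join('|')
    # Set#add? returns nil when the key was already present
    unique << line if seen.add?(key)
  end
end

flows = [
  '10.0.0.1 10.0.0.2 80 1024',
  '10.0.0.1 10.0.0.2 80 2048',  # duplicate of the first by key fields
  '10.0.0.3 10.0.0.2 80 512'
]
deduplicate_flows(flows)  # keeps the first and third lines
```

If your real code accumulates the unique lines in a long-lived Hash or Array across iterations, that would explain growth like this: MRI's GC frees objects but rarely returns heap pages to the OS, so clearing a structure shrinks Ruby's live set without shrinking the process footprint.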
The codebase is slightly too big to pastie but it is available
here: http://svn.tobyclemson.co.uk/public/trunk/flow_deduplicator .
There are really only 2 classes of importance and 1 script, but I
don't know if pastie can handle that.
Any help would be greatly appreciated, as the alternative (pressure
from above) is to rewrite in Python (which involves me learning
Python).
Thanks in advance,