BIG memory problem

Hi all,

I’m having a really odd memory problem with a small Ruby program I’ve
written. It basically takes in lines from input files (which represent
router flows), deduplicates them (based on elements of the line) and
outputs the unique flows to file. The input file often contains over
300,000 lines of which about 25-30% are duplicates. The trouble I’m
having is that the program (which is intended to be long running) does
not seem to release any memory back to the system and in fact just
increases in memory footprint from iteration to iteration. It should
use about 150 MB by my estimates but sails through this and yesterday
slowed to a halt at about 1.6 GB (due to the GC by my guess). This
doesn’t make any sense to me as at times I am deleting data structures
that occupy at least 50MB of memory.

The codebase is slightly too big to pastie but it is available
here http://svn.tobyclemson.co.uk/public/trunk/flow_deduplicator .
There are actually only 2 classes of importance and 1 script but I
don’t know if pastie can handle that.

Any help would be greatly appreciated as the alternative (pressures
from above) is to rewrite in Python (which involves me learning
Python).

Thanks in advance,
Toby

2008/8/8 [email protected] [email protected]:

I think I have found a problem. In the main loop (in bin/dedupe),
you use a single Timestamp instance, which is destructively
modified by calling advance.

Now this single Timestamp instance is used as a key for all
calls to checksum_buffer.add(). As a result, the @buffers hash
will always have only one entry and this single entry will hold all
flow.checksum/flow.timestamp pairs ever. Since the retention threshold
is 1, this single @buffers entry that holds all data will never be
deleted.
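
To make the effect concrete, here is a minimal sketch. MutableTimestamp and the buffers hash are made-up stand-ins, not the classes from the repository, and the integer time is an assumption. It shows that a key object which is modified in place still counts as the same hash key, so everything piles up under one entry:

```ruby
# Hypothetical stand-in for the Timestamp class; the real one may differ.
class MutableTimestamp
  attr_reader :time

  def initialize(time)
    @time = time
  end

  # Destructive: changes this object in place and returns it.
  def advance
    @time += 60
    self
  end
end

# Each new key gets its own empty array, like a per-timestamp buffer.
buffers = Hash.new { |hash, key| hash[key] = [] }
timestamp = MutableTimestamp.new(0)

3.times do |i|
  buffers[timestamp] << "checksum-#{i}"
  timestamp.advance # still the same object, so the next add reuses the key
end

puts buffers.size              # 1
puts buffers.values.first.size # 3
```

Because the default Object#hash and Object#eql? are based on object identity, changing @time does not produce a new key.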

The solution should be to make Timestamp#advance nondestructive
and change the line

timestamp.advance

in the main loop to

timestamp = timestamp.advance

Stefan

Sorry I don’t quite understand the problem - I can see that it
probably is one but I think it’s a matter of terminology. What do you
mean when you say destructively modified? I am modifying the value of
the timestamp in place? So that any reference to that timestamp will
be modified too? Should I be doing a duplication on the string that is
used to key the buffer in the buffers hash? I didn’t think that the
actual object was passed in when an argument is supplied, I thought a
copy of it was passed in…

How would I make Timestamp#advance nondestructive?
If it is easier than pasting here I can give you commit privileges on
that repository?

Thanks very much for your help,
Toby

2008/8/8 [email protected] [email protected]:

Arguments are not copied. The method receives a reference to the same
object (a reference to the object, not to the caller’s variable).
That’s how most OO languages work.
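
A small sketch of what that means in practice. The method names are made up for illustration; String#<< is the standard in-place append:

```ruby
# Modifies the object the caller passed in; no copy is made.
def append_suffix(text)
  text << "-changed" # String#<< changes the receiver in place
end

# Rebinds only the method's local variable; the caller's object is untouched.
def reassign(text)
  text = "something else"
  text
end

original = String.new("flow")

append_suffix(original)
puts original # flow-changed

reassign(original)
puts original # flow-changed
```

The first call changes the caller's string because both names refer to one object. The second call has no visible effect outside the method because assignment only points the local variable at a different object.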

Regarding your program: Add an accessor for :time to the Timestamp
class (for example attr_accessor :time), then change the advance
definition to this:

def advance
  ts = self.dup
  ts.time += 60
  ts
end

Instead of modifying the instance, we create a new one with the
desired change.

Now in the main loop in dedupe change this line:

timestamp.advance

to

timestamp = timestamp.advance

This way ChecksumBuffer#add will actually get a different
timestamp object on each call.

Since you also use Enumerable#min on an array of Timestamp
objects, you need to add Timestamp#<=>:

def <=>(other)
  self.time <=> other.time
end

That should do it.
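
Put together, the changes might look roughly like this. It is only a sketch: the constructor and the integer time are assumptions, and the real Timestamp class in the repository may store its time differently.

```ruby
# Sketch of a Timestamp with a nondestructive advance (assumed layout).
class Timestamp
  attr_accessor :time

  def initialize(time)
    @time = time
  end

  # Returns a new Timestamp one minute later; self is left unchanged.
  def advance
    ts = dup
    ts.time += 60
    ts
  end

  # Needed so Enumerable#min can order Timestamp objects.
  def <=>(other)
    time <=> other.time
  end
end

timestamp = Timestamp.new(0)
buffers = {}

3.times do
  buffers[timestamp] = [] # a distinct object, hence a distinct key, each time
  timestamp = timestamp.advance
end

oldest = buffers.keys.min
puts buffers.size # 3
puts oldest.time  # 0
```

Each pass through the loop now stores a different object as the key, so old entries can be found with min and deleted.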

Thanks very much for your help,

You’re welcome!

Stefan

Ok I’ve gone and had a little play and yes the memory problem was
completely my fault. I was passing in the timestamp to use as the key
for the buffer rather than the current value of the timestamp. By
changing the line checksum_buffer.add(flow, timestamp) to
checksum_buffer.add(flow, timestamp.current) the problems are solved!
It’s just a shame it took me nearly a day of debugging and attempting
to learn Python and help from you guys to work that out!
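
For anyone reading later, a rough sketch of why keying by the value works. The Timestamp class below is a made-up stand-in, and it assumes that current returns the present value as a String; the real method may differ:

```ruby
# Hypothetical stand-in; `current` is assumed to return a String value.
class Timestamp
  def initialize(time)
    @time = time
  end

  def current
    @time.to_s
  end

  # Destructive advance, as in the original program.
  def advance
    @time += 60
    self
  end
end

buffers = Hash.new { |hash, key| hash[key] = [] }
timestamp = Timestamp.new(0)

3.times do |i|
  buffers[timestamp.current] << "checksum-#{i}" # key is the value, not the object
  timestamp.advance
end

puts buffers.keys.inspect # ["0", "60", "120"]

# Old buffers can now be found and dropped individually.
buffers.delete(buffers.keys.min_by(&:to_i))
puts buffers.size # 2
```

Strings are compared by content when used as hash keys, so each new value creates a new entry even though the Timestamp object itself is still modified in place.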

Stefan, Edward, thanks again for your help. I would never have noticed
that bug without your help, Stefan.
Thanks,
Toby
