Ruby, memory and speed

I have a script that aggregates data from multiple files, stores it all
in a hash, and then emits a summary on standard output. The input files
(text files) are fairly big: 4 of about 50 MB and 4 of about 350 MB.
The hash grows to about 500,000 keys. The memory footprint of the Ruby
process as reported by top is above 2 GB.

When the script starts, it processes the files at a speed of 10 KB/s or
so. Not lightning fast, but it will get the job done. As time goes on,
the speed drops to 100 bytes/s or less, while still taking 100% CPU
time. Unbearable. The machine it is running on is pretty good:
4x AMD Opteron 64-bit, 32 GB memory, a local SCSI RAID drive.

Does the performance of Ruby collapse past a certain memory usage? Like
the GC kicks in all the time.

Any clue on how to speed this up? Any help appreciated.

Guillaume.

The code is as follows:

delta and snps are IOs. reads is a hash. max is an integer (4 in my case).
It expects a line starting with a '>' on delta. Then it reads some
information from delta (and discards the rest) and some more information
from snps (if present). All this is then recorded in the reads hash.
Each entry in the hash is an array of the 4 best matches found so far.

def delta_reorder(delta, snps, reads, max = nil)
  l = delta.gets or return
  snps_a = nil
  loop do
    # Record header: ">contig_name read_name ..."
    l =~ /^>(\S+)\s+(\S+)/ or break
    contig_name, read_name = $1, $2
    read = (reads[read_name] ||= [])
    loop do
      l = delta.gets or break
      l[0] == ?> and break            # next record header: back to outer loop
      # Match line: contig start/end, read start/end, error count
      cs, ce, rs, re, er = l.scan(/\d+/)
      er_i = er.to_i
      cs && ce && rs && re && er or break
      l = delta.gets while l && l != "0\n"   # skip the offset lines up to "0"
      if snps
        snps_a = []
        er_i.times { l = snps.gets or break; snps_a << l.split[-1] }
      end
      score = (re.to_i - rs.to_i).abs - 6 * er_i
      if max
        # Earlier version kept `read` sorted with a binary search; removed
        # after profiling (see discussion below):
        #   i = read.bsearch_upper_boundary { |x| score <=> x[1] }
        #   read.insert(i, [contig_name, score, cs, ce, rs, re, er, snps_a])
        #   read.slice!(max..-1) if read.size > max
        if read.size >= max
          # Replace the current worst match if this one scores higher.
          min = read.min { |x, y| x[1] <=> y[1] }
          if score > min[1]
            min.replace([contig_name, score, cs, ce, rs, re, er, snps_a])
          end
        else
          read << [contig_name, score, cs, ce, rs, re, er, snps_a]
        end
      else
        # max == nil: keep only the single best match.
        if !read[0] || score > read[0][1]
          read[0] = [contig_name, score, cs, ce, rs, re, er, snps_a]
        end
      end
    end
  end
end
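For context, here is a hypothetical driver showing how the routine is meant
to be used; the file naming, the delta/snps pairing, and the output format
are assumptions for illustration, not part of the original script:

  reads = {}

  # Hypothetical file discovery; one snps file per delta file is assumed.
  delta_paths = Dir['*.delta'].sort
  snps_paths  = delta_paths.map { |path| path.sub(/\.delta$/, '.snps') }

  delta_paths.zip(snps_paths) do |dpath, spath|
    File.open(dpath) do |delta|
      File.open(spath) do |snps|
        # delta_reorder expects the IO to be positioned at a '>' record line.
        delta_reorder(delta, snps, reads, 4)  # same hash reused for every file
      end
    end
  end

  # Emit the summary: read name, contig name and score of each kept match.
  reads.each do |read_name, matches|
    matches.each do |m|
      contig_name, score = m[0], m[1]
      puts [read_name, contig_name, score].join("\t")
    end
  end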

Example of data (comments after # are mine, not present in file):

read_name (hash key) is gnl|ti|379331986

gi|56411835|ref|NC_004353.2| gnl|ti|379331986 1281640 769
246697 246940 722 479 22 22 0 # Keep this info. Collect 22 lines from snps IO
0 # Skip
440272 440723 156 617 41 41 0 # Keep this info. Collect 41 lines from snps IO
147 # Skip 'til 0
-22
-206
-1
-1
-1
-1
-1
-1
-1
-1
-1
0
441263 441492 384 152 17 17 0 # Keep. Collect lines from snps.
-44 # Skip 'til 0
-1
-1
-1
37
0
gi|56411835|ref|NC_004353.2| gnl|ti|379331989 1281640 745 # and so forth…
453805 453934 130 1 8 8 0
0

On Aug 17, 2006, at 9:54 AM, Guillaume M. wrote:

Does the performance of Ruby collapse past a certain memory usage?
Like the GC kicks in all the time.

I think you got it.

Regards, Morton

On 17.08.2006 15:54, Guillaume M. wrote:


l =~ /^>(\S+)\s+(\S+)/ or break
contig_name, read_name = $1, $2

Small optimization, which will help only if delta_reorder is called often:

 read = (reads[read_name.freeze] ||= [])

Background: a Hash will dup a non-frozen String key to avoid nasty effects
if the original changes.
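A quick way to see the effect (minimal sketch, core Ruby only; the keys just
mimic the read names above):

  h = {}
  k = "gnl|ti|379331986"
  h[k] = []
  p h.keys[0].equal?(k)     # => false: the Hash stored a duplicate of the key

  h2 = {}
  fk = "gnl|ti|379331989".freeze
  h2[fk] = []
  p h2.keys[0].equal?(fk)   # => true: the frozen key is stored as-is, no copy

With roughly 500,000 distinct keys, that is one String allocation (and later
collection) avoided per new key.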

To make life easier for people who want to play with this, you could
provide a complete test set (original script + data files).

I don't fully understand your processing, but maybe there's room to
improve this algorithm-wise.

Kind regards

robert

From: “Francis C.” [email protected]

This is a perfect example of what I’ve noticed many times: Ruby’s
performance is perfectly fast and acceptable until the working set gets a
certain (not terribly large) size, then it falls off a cliff. GC perhaps has
something to do with it, but I suspect that’s only a small part of the
problem.

Can anyone speculate as to what other subsystems or facets of Ruby’s
architecture, besides GC, might possibly cause this?

Regards,

Bill

On Aug 17, 2006, at 10:45, Robert K. wrote:


read = (reads[read_name.freeze] ||= [])

Background: a Hash will dup a non-frozen String key to avoid nasty effects
if the original changes.

To make life easier for people who want to play with this, you could
provide a complete test set (original script + data files).

Will do, when I get to my office.

Guillaume.

On 8/17/06, Guillaume M. [email protected] wrote:

I have a script that aggregates data from multiple files, stores it all
in a hash, and then emits a summary on standard output. The input files
(text files) are fairly big: 4 of about 50 MB and 4 of about 350 MB.
The hash grows to about 500,000 keys. The memory footprint of the Ruby
process as reported by top is above 2 GB.

This is a perfect example of what I've noticed many times: Ruby's
performance is perfectly fast and acceptable until the working set gets a
certain (not terribly large) size, then it falls off a cliff. GC perhaps
has something to do with it, but I suspect that's only a small part of the
problem.

Before I spend a lot of time understanding your algorithm: is there a
specific reason why you need to keep the entire set in memory? You say
you’re generating summary output at the end of the run, but can you
accumulate your summary in an object of fixed size?

Or does your summary depend on some kind of transformation on the entire
set (like a numeric sort)? If so, there are things you can do to improve
the algorithm so you're not keeping a large working set in memory.

Hi,

In message “Re: Ruby, memory and speed”
on Fri, 18 Aug 2006 00:16:33 +0900, “Francis C.”
[email protected] writes:

|This is a perfect example of what I’ve noticed many times: Ruby’s
|performance is perfectly fast and acceptable until the working set gets a
|certain (not terribly large) size, then it falls off a cliff. GC perhaps has
|something to do with it, but I suspect that’s only a small part of the
|problem.

This may be caused by the problem addressed in 1.9:

Mon Jul 10 19:22:19 2006 Tanaka A. [email protected]

* gc.c (gc_sweep): expand heap earlier.
  reported by MORITA Naoyuki.  [ruby-dev:28960]

						matz.

On 8/17/06, Bill K. [email protected] wrote:

Total, rank speculation here, but I don’t suspect GC because I
consistently see this kind of problem even when a lot of objects are
being created and not many are becoming garbage. My wild
unsubstantiated guess is that Ruby creates data structures in memory
that are not designed to maximize processor- and L2-cache
utilization. Perhaps data objects overflow cache lines, or they are
heavily pointer-driven (requiring more memory-bus cycles than is
optimal), or they exhibit poor locality of reference.

On 8/17/06, Yukihiro M. [email protected] wrote:

This may be caused by the problem addressed in 1.9:

Mon Jul 10 19:22:19 2006 Tanaka A. [email protected]

    * gc.c (gc_sweep): expand heap earlier.
      reported by MORITA Naoyuki.  [ruby-dev:28960]

                                                    matz.

Any plans to back port it to ruby_1_8 branch?

Hi,

In message “Re: Ruby, memory and speed”
on Fri, 18 Aug 2006 02:52:33 +0900, “Kent S.”
[email protected] writes:

|> This may be caused by the problem addressed in 1.9:
|>
|> Mon Jul 10 19:22:19 2006 Tanaka A. [email protected]
|>
|> * gc.c (gc_sweep): expand heap earlier.
|> reported by MORITA Naoyuki. [ruby-dev:28960]

|Any plans to back port it to ruby_1_8 branch?

Not for 1.8.5. Maybe later.

						matz.

On Aug 17, 2006, at 11:16, Francis C. wrote:

This is a perfect example of what I've noticed many times: Ruby's
performance is perfectly fast and acceptable until the working set gets a
certain (not terribly large) size, then it falls off a cliff. GC perhaps
has something to do with it, but I suspect that's only a small part of the
problem.

Is there any tuning of the GC so it kicks in less frequently when the
memory consumption is large? Or could it be that the Hash algorithm
chokes when the number of keys gets large (like > 100_000)?
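A cheap way to test the Hash part of that question in isolation (rough
sketch; the key format just mimics the read names, and the absolute
timings depend on the machine):

  require 'benchmark'

  h = {}
  4.times do |round|
    t = Benchmark.realtime do
      125_000.times { |i| h["gnl|ti|#{round}_#{i}"] = [] }
    end
    printf("after %7d keys: %.2fs for the last 125000 inserts\n", h.size, t)
  end

If the time per batch stays roughly flat up to 500,000 keys, the Hash
itself is probably not what is choking.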

I guess I could re-implement it in C or C++, but it saddens me because it
is a quick script for (probably) a one-time task, and string processing
is so much easier in Ruby. I'll try Perl first.

Before I spend a lot of time understanding your algorithm: is there a
specific reason why you need to keep the entire set in memory? You say
you’re generating summary output at the end of the run, but can you
accumulate your summary in an object of fixed size?

No, I can't (at least I don't see how). I have n reads split into ns
files and m contigs split into ms files. Then all ns read files are
matched against all ms contig files, which leads to ns * ms match (or
delta) files. And each read may have one or many matches with any of the
contigs.

I want to extract, for each read, the 4 "best" matches. (The score is
defined as the length of the match minus 6 times the number of errors in
that match; we are talking about DNA alignment, so a match can have a
small number of mismatches due to sequencing errors, mutations, etc.)
The snps here are just added information, not crucial, but they probably
add to the problem because they increase the size of the data set. They
give the exact position of all the single mismatches inside a match. The
number of lines (one mismatch per line) is given by the sub-header in
the delta file.
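As a quick check of that score formula against the first kept line of the
sample data above (mirroring the score line in delta_reorder):

  cs, ce, rs, re, er = "246697 246940 722 479 22 22 0".scan(/\d+/)
  score = (re.to_i - rs.to_i).abs - 6 * er.to_i
  p score   # => 111  (alignment length |479 - 722| = 243, minus 6 * 22 errors)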

So for every set of reads, I parse the corresponding ms delta files and
extract the best 4 matches for each read. The routine shown before
parses only one file. The same hash is used for all ms files.

Or does your summary depend on some kind of transformation on the entire
set (like a numeric sort)? If so, there are things you can do to improve
the algorithm so you're not keeping a large working set in memory.

Maybe. I haven't seen it yet. And I went first with the dumb algorithm,
as it is an ad-hoc script, not part of some great piece of software.

Thanks,
Guillaume.

On Thu, 17 Aug 2006 19:49:03 +0200, Guillaume M. [email protected]
wrote:

Is there any tuning of the GC so it kicks in less frequently when the
memory consumption is large? Or could it be that the Hash algorithm
chokes when the number of keys gets large (like > 100_000)?

If I remember a past thread correctly, the Ruby GC is set to keep memory
usage below 8 MB by default. This is determined by a #define in the
interpreter source code, and I don't think it's really an adaptive
algorithm. If GC is the performance problem, you could try changing that
constant and rebuilding the interpreter.

As for the Hash problem, profiling could probably tell you that. Of
course, profiling with such a huge dataset when you already know the code
gets anomalously slow might take really, really long.
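One minimal way to do that on a cut-down run is the profiler in the
standard library (sketch; sample.delta, sample.snps and delta_reorder.rb
are hypothetical names for a small test pair and for the file defining
delta_reorder):

  require 'profile'        # stdlib profiler: prints a flat profile on exit
  load 'delta_reorder.rb'  # hypothetical: wherever delta_reorder is defined

  reads = {}
  File.open('sample.delta') do |delta|
    File.open('sample.snps') do |snps|
      delta_reorder(delta, snps, reads, 4)
    end
  end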

David V.

In article [email protected],
Yukihiro M. [email protected] writes:

|> Any plans to back port it to ruby_1_8 branch?
|
|Not for 1.8.5. Maybe later.

Do you withdraw [ruby-dev:29004]?

On Thu, Aug 17, 2006 at 10:54:04PM +0900, Guillaume M. wrote:
[…]

When the script starts, it processes the files at a speed of 10 KB/s or
so. Not lightning fast, but it will get the job done. As time goes on,
the speed drops to 100 bytes/s or less, while still taking 100% CPU
time. Unbearable. The machine it is running on is pretty good:
4x AMD Opteron 64-bit, 32 GB memory, a local SCSI RAID drive.

Does the performance of Ruby collapse past a certain memory usage? Like
the GC kicks in all the time.

Any clue on how to speed this up? Any help appreciated.
[…]

Have you tried increasing GC_MALLOC_LIMIT in gc.c?

Mauricio F. wrote:

Any clue on how to speed this up? Any help appreciated.
[…]

Have you tried increasing GC_MALLOC_LIMIT in gc.c?

No. Any hint on what value I can raise it to? Any upper limit that I
should not cross?

Guillaume.

Hi,

In message “Re: Ruby, memory and speed”
on Fri, 18 Aug 2006 09:02:02 +0900, Tanaka A. [email protected]
writes:

|> |Any plans to back port it to ruby_1_8 branch?
|>
|> Not for 1.8.5. Maybe later.
|
|Do you withdraw [ruby-dev:29004]?

Ah, I am terribly sorry to forget. It’s checked in to 1.8.

						matz.

David V. wrote:


As for the Hash problem, profiling could probably tell you that. Of
course, profiling with such a huge dataset when you already know the code
gets anomalously slow might take really, really long.

Yeah, that's my problem. I did some profiling on a smaller subset, and
that's why I removed the binary search (see posted code). A binary
search on a sorted array of only 4 elements doesn't buy you anything; it
was even slower. But I never tried with a dataset that would push the
memory consumption, so I got no insight on that.

Guillaume.

Yukihiro M. wrote:

Ah, I am terribly sorry to forget. It’s checked in to 1.8.

  					matz.

Great. I’ll compile ruby from CVS and give it a try.

Guillaume.

On Thu, 17 Aug 2006, Robert K. wrote:

Small optimization, which will help only if delta_reorder is called often:

read = (reads[read_name.freeze] ||= [])

Background: a Hash will dup a non-frozen String key to avoid nasty effects
if the original changes.

Stole that for
http://rubygarden.org:3000/Ruby/page/show/RubyOptimization

Suggestions for the Original Poster…

  • Browse that Wiki page, it may have something for you. (Alternatively,
    once you solve your problem add the solution to that page!)

  • If you are on Linux, use “vmstat 5”
    e.g.
    vmstat 5
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
     0  0 127628  14308   4548 198096    1    1    28     7   27    20  8  1 86  5
     0  0 127628  14308   4548 198096    0    0     0     0  421   835  0  0 100  0

Watch the “si” and “so” columns (Swap In / Swap Out). If you are swapping
2 or more times every 5 seconds, then you don't have Ruby GC problems, you
have memory problems, i.e. tweaking the GC won't help. You have to store
less in RAM, full stop. Remember to set any dangling references that you
won't use again to nil, especially from class variables and globals.
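For example (hypothetical names, just to make the last point concrete):

  $score_cache = nil   # a global that is no longer needed after this stage
  big_buffer = nil     # likewise for long-lived locals holding large strings
  GC.start             # once nothing references the objects, they can be reclaimed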

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

Carter’s Clarification of Murphy’s Law.

“Things only ever go right so that they may go more spectacularly wrong
later.”

From this principle, all of life and physics may be deduced.

Bill K. wrote:

Can anyone speculate as to what other subsystems or facets of Ruby's
architecture, besides GC, might possibly cause this?

Regards,

Bill

I’ve posted a script for building a “gprof-enabled” Ruby on GNU/Linux
systems at

http://rubyforge.org/cgi-bin/viewvc.cgi/MatrixBenchmark/?root=cougar

It’s a pretty simple process to dig into at least the CPU portion of the
equation – where is the Ruby interpreter spending cycles interpreting
your script? Some more sophisticated tools are required to profile
memory usage, but it’s more or less impossible for a normally-designed
program to abuse memory without spending CPU cycles doing it. :-)

Another simple thing you can do is run “top” while your Ruby script is
executing. On Windows, the Task Manager will do the same thing in the
“Process” tab. Sort in descending order on CPU and if your script is at
the “top”, it’s spending CPU time. Sort in descending order on memory
size and if your script is at the top without a lot of CPU usage, it’s a
memory management bottleneck.