On 18 Aug 06, at 00:07, John C. wrote:
Stole that for http://rubygarden.org:3000/Ruby/page/show/RubyOptimizationSuggestions
For the Original Poster…
- Browse that wiki page; it may have something for you. (Alternatively,
once you solve your problem, add the solution to that page!)
Thanks for the pointer.
I also used Mmap#scan. It is pretty elegant compared to the usual:
io.each { |l|
  l.chomp!
  next unless l =~ /blah/
  # …
}
I would expect it to be faster too, though I have not benchmarked it.
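The speed claim above is untested, so here is a minimal sketch for checking it: the line-by-line loop next to a single `String#scan` over the whole file, timed with a monotonic clock. It does not use the mmap extension; `File.read` plus `String#scan` stands in for `Mmap#scan`, so it compares loop versus scan only, not the effect of memory mapping. The helper names (`matches_by_line`, `matches_by_scan`, `elapsed`) and the generated input are hypothetical.

```ruby
require "tempfile"

# Line-by-line style: chomp each line and skip the ones that do not match.
def matches_by_line(path, pattern)
  found = []
  File.open(path) do |io|
    io.each_line do |l|
      l.chomp!
      next unless l =~ pattern
      found << l
    end
  end
  found
end

# Whole-buffer style: one String#scan over the file contents.
# Stand-in for Mmap#scan: here the file is read into an ordinary String.
def matches_by_scan(path, pattern)
  File.read(path).scan(/^.*#{pattern}.*$/)
end

# Wall-clock time of the given block, in seconds.
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

Tempfile.create("scan-demo") do |f|
  # Hypothetical input: every third line contains "blah".
  30_000.times { |i| f.puts(i % 3 == 0 ? "blah #{i}" : "other #{i}") }
  f.flush
  by_line = by_scan = nil
  t_line = elapsed { by_line = matches_by_line(f.path, /blah/) }
  t_scan = elapsed { by_scan = matches_by_scan(f.path, /blah/) }
  puts "same result: #{by_line == by_scan}"
  puts format("each_line: %.3fs  scan: %.3fs", t_line, t_scan)
end
```

The timings depend on the machine and the data, so run it against a representative file before drawing conclusions.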
Watch the “si” and “so” columns (swap in / swap out, as reported by
vmstat). If you are seeing two or more swaps every 5 seconds, then you
don’t have a Ruby GC problem; you have a memory problem, i.e. tweaking
the GC won’t help. You have to store less in RAM, full stop. Remember
to set any dangling references that you won’t use again to nil,
especially those held in class variables and globals.
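A minimal sketch of that last piece of advice. An object reachable from a class variable or a global stays alive as long as that reference exists; assigning nil drops the reference, so the garbage collector may reclaim the object once nothing else points to it. `RecordCache`, `@@records` and `$all_records` are hypothetical names.

```ruby
# Hypothetical cache that keeps parsed records in a class variable.
class RecordCache
  @@records = nil

  # Store records (a Hash of key => value); returns how many are cached.
  def self.store(records)
    @@records ||= {}
    @@records.merge!(records)
    @@records.size
  end

  # Drop the class variable's reference so the GC is free to reclaim the Hash.
  def self.release
    @@records = nil
  end

  def self.loaded?
    !@@records.nil?
  end
end

# The same applies to globals: reassign them once the data is no longer needed.
$all_records = Array.new(1_000) { |i| "record #{i}" }  # hypothetical global
# ... work with $all_records ...
$all_records = nil

RecordCache.store({ "a" => 1, "b" => 2 })  # => 2
RecordCache.release
RecordCache.loaded?                          # => false
```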
I checked that, but no: the machine has plenty of memory and si/so were
always 0.
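The si/so check can be scripted. A minimal sketch that finds the `si` and `so` columns of `vmstat` output by their header names and reports whether any sample shows swapping. It assumes the usual procps `vmstat` column layout; the helper names are hypothetical and the sample text is invented for illustration, not a real measurement. On a real machine the text would come from running, for example, `vmstat 5 3`.

```ruby
# Return [si, so] for every sample line of vmstat-style output.
# Assumption: a header line names the columns and includes "si" and "so".
def swap_samples(text)
  header = nil
  samples = []
  text.each_line do |line|
    fields = line.split
    if fields.include?("si") && fields.include?("so")
      header = fields
    elsif header && fields.size == header.size && fields.all? { |f| f.match?(/\A\d+\z/) }
      samples << [fields[header.index("si")].to_i, fields[header.index("so")].to_i]
    end
  end
  samples
end

# True if any sample shows memory being swapped in or out.
def swapping?(text)
  swap_samples(text).any? { |si, so| si > 0 || so > 0 }
end

# Made-up sample in the procps vmstat layout.
QUIET = <<~VMSTAT
  procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
   1  0      0 812344  10240 204800    0    0     5     3   50   80  2  1 97  0  0
   0  0      0 812100  10240 204812    0    0     0     2   48   77  1  0 99  0  0
VMSTAT

p swap_samples(QUIET)  # => [[0, 0], [0, 0]]
p swapping?(QUIET)     # => false
```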
I finally went back to stream parsing. Instead of aggregating the
information from many files into one big hash, I read and write one
record at a time in a format suitable for sort (the UNIX command) and
pipe it to sort. Finally, I merge all the relevant files together with
‘sort -m’. This produced the result in only a few hours.
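A minimal sketch of the same idea, not Guillaume's actual code, assuming a Unix `sort` on the PATH. Each input is turned into tab-separated `key<TAB>value` lines that are streamed through `sort` one record at a time; the sorted outputs are then combined with `sort -m`, which merges already-sorted files without re-sorting them. The record format, the `emit_sorted` / `merge_sorted` helpers and the file names are hypothetical. `LC_ALL=C` is set so both steps use the same byte-wise ordering.

```ruby
require "tmpdir"

# Use byte-wise collation so `sort` and `sort -m` agree on the order.
SORT_ENV = { "LC_ALL" => "C" }.freeze

# Stream [key, value] records, one tab-separated line at a time, through the
# external sort command and write its output to out_path.
def emit_sorted(records, out_path)
  IO.popen(SORT_ENV, ["sort", "-o", out_path], "w") do |sort|
    records.each { |key, value| sort.puts("#{key}\t#{value}") }
  end
  raise "sort failed" unless $?.success?
end

# Merge files that are already sorted with `sort -m` and return the lines.
def merge_sorted(paths)
  IO.popen(SORT_ENV, ["sort", "-m", *paths], "r") { |io| io.readlines(chomp: true) }
end

Dir.mktmpdir do |dir|
  a = File.join(dir, "a.sorted")
  b = File.join(dir, "b.sorted")
  emit_sorted([["pear", 3], ["apple", 1]], a)
  emit_sorted([["fig", 2], ["banana", 5]], b)
  p merge_sorted([a, b])
  # => ["apple\t1", "banana\t5", "fig\t2", "pear\t3"]
end
```

`records` can be any Enumerable that yields pairs (for example a lazy reader over an input file), so Ruby never holds the whole data set in a hash; GNU sort spills to temporary files on its own when its input is large. `sort -m` assumes every input was sorted with the same options, which is why both calls share `SORT_ENV`.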
Thank you all for your suggestions,
Guillaume.