Forum: Ruby Looking for information on string processing performance

Tuwewe (Guest)
on 2006-01-24 17:43
(Received via mailing list)
Hi,

complete newbie here. I love Ruby and would like to use it in my new
project, which deals heavily with manipulating large files and
string processing.

Can you give me advice on how to write decent (with an emphasis on
performance) string processing scripts?

Things I am interested in are best practices for concatenating strings,
searching for substrings, regex operations, reading and writing files,
etc.

Also, links to Ruby performance-related sites would be greatly appreciated.

Thanks in advance.
Robert Klemme (Guest)
on 2006-01-24 17:43
(Received via mailing list)
Tuwewe wrote:
> searching for substrings, regex operations, reading and writing files,
> etc.
>
> Also, links to Ruby performance-related sites would be greatly appreciated.
>
> Thanks in advance.

Some general remarks:

 - keep the number of created objects as small as possible

 - hold on to objects only as long as they are needed

 - if possible, freeze strings that you use as hash keys (this avoids the
overhead of a new string instance being created as the key)

 - use a_string << other_string rather than a_string += other_string, or
use StringIO

 - where possible, use in-place replacements (sub! and gsub! instead of
sub and gsub)

 - when processing large files, process them in streaming mode if possible
instead of slurping them into memory as a whole

 - start with the default IO methods and rely on their buffering and
line-end parsing before switching to more complex scenarios (sysread,
syswrite)
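
A few of the points above could look roughly like this in code. This is only a minimal sketch: the variable names and sample strings are made up for illustration, and it shows the mechanics rather than measuring anything.

```ruby
require 'stringio'

# << appends to the existing String, while += would build a new one each time.
buf = String.new("")
original_id = buf.object_id
%w[alpha beta gamma].each { |word| buf << word << "\n" }
same_object = (buf.object_id == original_id)  # buf is still the same object

# StringIO is an alternative when the code wants an IO-like object to write to.
io = StringIO.new
%w[alpha beta gamma].each { |word| io.puts(word) }
joined = io.string

# gsub! changes the receiver in place (and returns nil if nothing matched),
# so no second String is created for the result.
line = String.new("foo bar foo")
line.gsub!(/foo/, "baz")

# A frozen String used as a hash key (see the remark about key copies above).
key = String.new("user_id").freeze
table = {}
table[key] = 42

puts buf
puts joined
puts line
puts table[key]
```

Note that sub! and gsub! return nil when no replacement was made, so their return value should not be chained blindly.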

To give more precise info we would need to know more about your
application.  And when it comes to optimization you'll have to
measure your app anyway.  Tools that can help there are "ruby -r profile"
and the Benchmark module.
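
As a rough sketch of measuring with Benchmark, the following compares += with << when building a string. The method names and the iteration count N are assumptions chosen for the example, and the timings will differ between machines and Ruby versions.

```ruby
require 'benchmark'

N = 20_000  # arbitrary iteration count, chosen only for this illustration

# Builds the result with +=, which creates a new String on every iteration.
def build_with_plus(n)
  s = String.new("")
  n.times { s += "x" }
  s
end

# Builds the result with <<, which appends to the same String object.
def build_with_append(n)
  s = String.new("")
  n.times { s << "x" }
  s
end

# Prints user, system and real time for each labelled block.
Benchmark.bm(8) do |bm|
  bm.report("+=") { build_with_plus(N) }
  bm.report("<<") { build_with_append(N) }
end
```

Both methods return the same string, so only the cost of building it differs.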

Kind regards

    robert
Tuwewe (Guest)
on 2006-01-24 17:52
(Received via mailing list)
Thank you, Robert. That's great advice! I will keep it in mind while
learning Ruby. If files become ridiculously large, there is always
C/C++. But I really want to do as much work in Ruby as possible.

I tried in-place replacement and the use of << instead of +, and my
little test scripts run much faster (300% better).

Could you elaborate on "when processing large files process them
in streaming mode"? Do you mean using an iterator such as
IO.foreach("testfile") { |line| ........ }
instead of something like
arr = IO.readlines("testfile")? Or is there another library I should
look into?

Thanks again for your help.
Robert Klemme (Guest)
on 2006-01-24 17:55
(Received via mailing list)
Tuwewe wrote:
> Thank you, Robert. That's great advice! I will keep it in mind while
> learning Ruby. If files become ridiculously large, there is always
> C/C++. But I really want to do as much work in Ruby as possible.
>
> I tried in-place replacement and the use of << instead of +, and my
> little test scripts run much faster (300% better).

:-)

> Could you elaborate on "when processing large files process them
> in streaming mode"? Do you mean using an iterator such as
> IO.foreach("testfile") { |line| ........ }
> instead of something like
> arr = IO.readlines("testfile")? Or is there another library I should
> look into?

No, that's exactly what I meant: if possible, do it line by line instead
of slurping in the whole file and then doing the work.  (Of course there
are libs out there for specialized tasks, e.g. CSV parsing, but that's a
different story.)
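
To make the difference concrete, here is a small sketch of both styles. It writes its own temporary sample file so that it can be run as is; the file contents and variable names are invented for the example, and File.foreach is used here as the File-level equivalent of the IO.foreach call quoted above.

```ruby
require 'tempfile'

# Create a small sample file so the example is self-contained.
sample = Tempfile.new("testfile")
sample.write("first line\nsecond line\nthird line\n")
sample.close

# Streaming: the block sees one line at a time, so memory use stays small
# no matter how large the file is.
line_count = 0
longest = 0
File.foreach(sample.path) do |line|
  line_count += 1
  length = line.chomp.length
  longest = length if length > longest
end

# Slurping: the whole file is loaded into an Array of lines at once.
all_lines = File.readlines(sample.path)

puts "streamed #{line_count} lines, longest has #{longest} characters"
puts "slurped  #{all_lines.size} lines"

sample.unlink  # remove the temporary file
```

The streaming form only works when each line can be handled on its own or with a small amount of running state, as with the counters here.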

> Thanks again for your help.

You're welcome!

Kind regards

    robert
Christian Neukirchen (Guest)
on 2006-01-24 19:33
(Received via mailing list)
"Robert Klemme" <bob.news@gmx.net> writes:

> Tuwewe wrote:
>> Can you give me advice on how to write decent (with an emphasis on
>> performance) string processing scripts?
>>
>> Things I am interested in are best practices for concatenating strings,
>> searching for substrings, regex operations, reading and writing files,
>> etc.
[great hints snipped]

Also, consider using ruby-mmap if appropriate.