Forum: Ruby-core A Proposal to Improve the Method String.concat

Posted by Martin Vahi (martin_v)
on 2012-06-18 15:16
(Received via mailing list)
I'm sorry, if I'm double-posting or even triple-posting.
It seems that one has to perform mail based registration
before the posts from the http://www.ruby-forum.com/ reach the list.

What regards to performance optimizations, then
I have Ruby and PHP demo scripts at

https://github.com/martinvahi/mmmv_notes/tree/mast...

that demonstrates at least 20% performance difference
between plain string concatenation algorithm and
the versions that can be downloaded from there.

Download command:

git clone https://github.com/martinvahi/mmmv_notes.git
# then on Linux/BSD
cd mmmv_notes/mmmv_notes/phenomenon_scrutinization/string_concatenation

The Linux/BSD specific performance comparison scripts are
also there, so people can easily verify my claims
themselves, modify test parameters, etc.

Given that a lot of applications, specially in the web era,
concatenate a lot of strings, generate JSON, YAML, etc,
the fundamental, programming language independent,
performance issues are worth considering.

At the API side, the String.concat might be modified to
accept arrays of strings and an arbitrary number of
string/array arguments. The initial improvements do not
need any C programming, because the performance gains
can be attained by embedding the Ruby code to the String.concat.

Direct link to the proposed string concatenation implementation:

https://raw.github.com/martinvahi/mmmv_notes/maste...

Thank You for reading this letter.
Martin.Vahi@softf1.com
Posted by Benoit Daloze (Guest)
on 2012-06-18 17:55
(Received via mailing list)
Hello,

On 18 June 2012 15:16, Martin Vahi <martin.vahi@softf1.com> wrote:
> that demonstrates at least 20% performance difference
> between plain string concatenation algorithm and
> the versions that can be downloaded from there.

I believe this is not worth it.
It seems you are not aware of String#<<, and that Ruby has mutable 
strings.
Which means there is no need for optimized repeated String#+, one can
just #dup the original (if needed) and use #<<(aliased to #concat)
afterwards.

In numbers:

n = 100000
String#concat: 0.11s user 0.01s system 97% cpu 0.115 total
watershed concatenation: 0.18s user 0.03s system 97% cpu 0.215 total
pseudo-watershed concatenation: 4.00s user 2.98s system 98% cpu 7.083 
total
plain loop concatenation: 7.33s user 7.54s system 98% cpu 15.081 total

n = 1000000,
String#concat: 0.88s user 0.01s system 99% cpu 0.896 total
watershed concatenation: 2.04s user 0.43s system 99% cpu 2.471 total

But maybe I missed something.
Posted by Martin Vahi (martin_v)
on 2012-06-18 20:55
Posted by Benoit Daloze (Guest) on 2012-06-18 17:55
> ...
> Which means there is no need for optimized repeated String#+, one can
> just #dup the original (if needed) and use #<<(aliased to #concat)
> afterwards.
>
> In numbers:
> ...
> n = 1000000,
> String#concat: 0.88s user 0.01s system 99% cpu 0.896 total
> watershed concatenation: 2.04s user 0.43s system 99% cpu 2.471 total
> ...

Thank You (Benoit Daloze and possibly others) for the answer.

I improved the Ruby version of the tests set
according to Your (Benoit Daloze) observations.
https://github.com/martinvahi/mmmv_notes/tree/mast...

The measurement results of the version 2 of the tests set:
n=150k, CPU equiv {3,16GHz, 1MiB cache per core}

Ruby test with the plain loop concatenation, version 1.
real    0m14.694s
user    0m13.797s
sys     0m0.844s

Ruby test with the watershed concatenation, version 1
real    0m0.418s
user    0m0.316s
sys     0m0.028s


Ruby test with the watershed concatenation,  version 2 (improved
according to the comments of Benoit Daloze).
real    0m0.403s
user    0m0.320s
sys     0m0.008s


Ruby test with the plain loop concatenation, version 2 (improved
according to the comments of Benoit Daloze).
real    0m0.287s
user    0m0.192s
sys     0m0.016s

I conclude that You (Benoit Daloze) were right:
there's no point of improving the
implementation of the String.concat .

Unfortunately one can still improve the documentation of
the String.+ to save others from what I have just done.

Thank You for the feedback. I found it very helpful.
Martin.Vahi@softf1.com
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.