Performance improvement possible?

People,

I have asked before about Ruby to C conversion programs and other
alternatives with no really satisfactory solution for my particular
situation. I decided to test out a conversion of one of the C programs
and see what sort of results I get. This particular small C program is
called 32,000 times from loops within a shell script. The program then
processes a text file, writes a text file and exits. The same shell
script with a Ruby program replacing the C program does exactly the same
thing but takes 8.5 times as long (27m/227m).

The profile on ONE execution of the Ruby program produced:

     %  cumulative     self               self     total
  time     seconds  seconds    calls   ms/call   ms/call  name
 65.18       14.15    14.15        1  14150.00  21700.00  Array#each
 12.44       16.85     2.70    93792      0.03      0.03  Array#[]
  8.15       18.62     1.77    62108      0.03      0.03  String#split
  6.73       20.08     1.46    53215      0.03      0.03  String#==
  2.58       20.64     0.56       66      8.48     12.73  Range#each

It would be SO much nicer to rewrite some stuff that needs rewriting and
write ALL new stuff in Ruby but this looks impossible with these times . .

Any suggestions for performance improvements?

I am using F9.

Thanks,

Phil.

Philip R.

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: [email protected]

On Jun 24, 2008, at 12:23 PM, Philip R. wrote:

People,

I have asked before about Ruby to C conversion programs and other
alternatives with no really satisfactory solution for my particular
situation.

Are you looking at all solutions?

I decided to test out a conversion of one of the C programs and see
what sort of results I get. This particular small C program is
called 32,000 times from loops within a shell script. The program
then processes a text file, writes a text file and exits. The same
shell script with a Ruby program replacing the C program does
exactly the same thing but takes 8.5 times as long (27m/227m).

Can you replace the whole shell script with a ruby program? Then
whatever startup cost you have for the Ruby interpreter is paid once
rather than 32_000 times.

It would be SO much nicer to rewrite some stuff that needs rewriting
and write ALL new stuff in Ruby but this looks impossible with these
times . .

Any suggestions for performance improvements?

I am using F9.

F9? Is that some key in an IDE that runs your ruby code?

Depending on what your C program has to do and why a shell script is
calling it in the first place, you might have other benefits from
using Ruby in place of the shell script. A couple years back, I
worked on a similar kind of project that was replacing shell scripts
with Perl and there were many benefits that Perl could exploit that
just could not be managed by constructs in the shell.
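
A minimal sketch of the "pay the interpreter startup once" idea: the per-file work becomes a method, and a single Ruby process calls it in a loop instead of the shell starting a new process each time. The names process_file and run_all, the file naming scheme and the upcase transformation are placeholders for illustration, not the original program's logic.

```ruby
# Hypothetical stand-in for the work the small C program did:
# read a text file, transform each line, write a text file.
def process_file(input_path, output_path)
  File.open(output_path, 'w') do |out|
    File.foreach(input_path) { |line| out.puts line.chomp.upcase }
  end
end

# One interpreter start-up and many iterations, instead of one
# `ruby script.rb` process per iteration of a shell loop.
def run_all(dir, count)
  (1..count).each do |i|
    input  = File.join(dir, format('in_%05d.txt', i))
    output = File.join(dir, format('out_%05d.txt', i))
    process_file(input, output)
  end
end
```

With 32,000 iterations, run_all(dir, 32_000) would start the interpreter once rather than 32,000 times.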

-Rob

Rob B. http://agileconsultingllc.com
[email protected]

On Jun 24, 2008, at 10:23 AM, Philip R. wrote:

Any suggestions for performance improvements?

post your code.

a @ http://codeforpeople.com/

On Jun 24, 2008, at 10:23 AM, Philip R. wrote:


It would be SO much nicer to rewrite some stuff that needs rewriting
and write ALL new stuff in Ruby but this looks impossible with these
times . .

Any suggestions for performance improvements?

The general approach I’ve followed is to write the code in Ruby, see
where the performance bottlenecks are, and then refactor those parts
of the code into a C-extension for Ruby.

Most recently, refactoring a regexp based string tokenizer into a
Ragel based C-extension gave about 13x performance improvement.
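
Before moving anything into C, it helps to measure the suspected hot spot in isolation. The following is only a sketch using the standard Benchmark module; the tokenize method and the sample input are hypothetical and stand in for whatever code is being examined.

```ruby
require 'benchmark'

# Hypothetical regexp-based tokenizer standing in for a suspected hot spot.
def tokenize(line)
  line.scan(/\w+/)
end

# Returns the elapsed wall-clock time in seconds (a Float) for
# tokenizing every line once.
def time_tokenizer(lines)
  Benchmark.realtime do
    lines.each { |line| tokenize(line) }
  end
end

lines = Array.new(10_000) { 'alpha beta gamma delta' }
puts format('tokenize: %.4f s for %d lines', time_tokenizer(lines), lines.size)
```

Only the parts that dominate such measurements are worth considering for a C-extension.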

Add salt to taste.

Blessings,
TwP

On 24.06.2008 18:23, Philip R. wrote:

It would be SO much nicer to rewrite some stuff that needs rewriting and
write ALL new stuff in Ruby but this looks impossible with these times . .

Any suggestions for performance improvements?

Difficult without seeing the Ruby program. But I’d also start with
rewriting the script in Ruby so you do not have multiple processes.

Kind regards

robert

ara,

ara.t.howard wrote:

On Jun 24, 2008, at 10:23 AM, Philip R. wrote:

Any suggestions for performance improvements?

post your code.

I will convert the shell script as well (as others have suggested) and
then post something.

Thanks,

Phil.


Rob,

Rob B. wrote:

On Jun 24, 2008, at 12:23 PM, Philip R. wrote:

People,

I have asked before about Ruby to C conversion programs and other
alternatives with no really satisfactory solution for my particular
situation.

Are you looking at all solutions?

I guess I was mostly interested in the Ruby to C conversion program but
it is not really ready . .

I decided to test out a conversion of one of the C programs and see
what sort of results I get. This particular small C program is called
32,000 times from loops within a shell script. The program then
processes a text file, writes a text file and exits. The same shell
script with a Ruby program replacing the C program does exactly the
same thing but takes 8.5 times as long (27m/227m).

Can you replace the whole shell script with a ruby program? Then
whatever startup cost you have for the Ruby interpreter is paid once
rather than 32_000 times.

Yes, good point - I should have thought of that - the script is bigger
than the program but that is the next step I guess . .

It would be SO much nicer to rewrite some stuff that needs rewriting
and write ALL new stuff in Ruby but this looks impossible with these
times . .

Any suggestions for performance improvements?

I am using F9.

F9? Is that some key in an IDE that runs your ruby code?

Sorry, Fedora 9.

Depending on what your C program has to do and why a shell script is
calling it in the first place, you might have other benefits from using
Ruby in place of the shell script. A couple years back, I worked on a
similar kind of project that was replacing shell scripts with Perl and
there were many benefits that Perl could exploit that just could not be
managed by constructs in the shell.

I would be surprised if there was much improvement in the shell script
part itself but the reduced processes might have a big impact.

Thanks,

Phil.


Tim,

Tim P. wrote:

The general approach I’ve followed is to write the code in Ruby, see
where the performance bottlenecks are, and then refactor those parts of
the code into a C-extension for Ruby.

I looked at that but it sort of defeats the purpose of rewriting in
Ruby . .

Most recently, refactoring a regexp based string tokenizer into a Ragel
based C-extension gave about 13x performance improvement.

Do you mean it was originally a Ruby regexp based string tokenizer?

Thanks,

Phil.


Robert,

Robert K. wrote:

exactly the same thing but takes 8.5 times as long (27m/227m).

It would be SO much nicer to rewrite some stuff that needs rewriting
and write ALL new stuff in Ruby but this looks impossible with these
times . .

Any suggestions for performance improvements?

Difficult without seeing the Ruby program. But I’d also start with
rewriting the script in Ruby so you do not have multiple processes.

Yes, that seems like a good start.

Thanks,

Phil.


On Jun 24, 2008, at 11:41 AM, Philip R. wrote:

Tim,

Do you mean it was originally a Ruby regexp based string tokenizer?

Yes

pure ruby regexp => ragel / ruby c-extension

TwP

Rob,

Rob B. wrote:

Depending on what your C program has to do and why a shell script is
calling it in the first place, you might have other benefits from using
Ruby in place of the shell script. A couple years back, I worked on a
similar kind of project that was replacing shell scripts with Perl and
there were many benefits that Perl could exploit that just could not be
managed by constructs in the shell.

I tried:

  • Using v1.9: This was actually a little worse than v1.8, which was
    already 8.4 times slower than the sh/C version

  • Replacing the shell scripts as well and having one consolidated Ruby
    script: This reduced the slowdown from a factor of 8.4 to 7.4

  • Replacing the array in the Ruby script (I was reading a large file
    into an array for processing): This reduced the slowdown from a
    factor of 7.4 to 4.0
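
The last change in the list above, dropping the array of lines, can be sketched as follows. This is an illustration only: the method names and the "count lines whose first field matches a tag" task are made up, since the real script is not shown here.

```ruby
# Streaming version: File.foreach yields one line at a time, so no
# array holding the whole file is ever built.
def count_tagged_streaming(path, tag)
  count = 0
  File.foreach(path) do |line|
    fields = line.split
    count += 1 if fields[0] == tag
  end
  count
end

# Slurping version: File.readlines first materialises every line in an
# array, which is then walked a second time.
def count_tagged_slurping(path, tag)
  File.readlines(path).count { |line| line.split[0] == tag }
end
```

Both methods return the same count; the streaming one simply avoids allocating and indexing the intermediate array.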

It would be nice if I could improve the Ruby script still further but
the profile now looks like:

     %  cumulative     self               self     total
  time     seconds  seconds    calls   ms/call   ms/call  name
 59.94        8.44     8.44        1      8440     14080  Object#statz
 10.44        9.91     1.47    58318      0.03      0.03  Array#[]
   8.1       11.05     1.14    26630      0.04      0.04  String#split
  4.05       11.62     0.57       70      8.14    616.14  Range#each
  4.05       12.19     0.57    17737      0.03      0.03  String#==
  3.05       12.62     0.43     8872      0.05      0.05  IO#gets
  2.98       13.04     0.42     8866      0.05      0.05  Fixnum#-
  2.34       13.37     0.33     9737      0.03      0.03  Fixnum#+
  2.06       13.66     0.29     8916      0.03      0.03  String#to_i
  1.92       13.93     0.27     9955      0.03      0.03  Array#[]=

The change to a single Ruby script simplified the previous setup quite
markedly but I’m not sure that much more can be done with the relatively
small script now. I logged into “ruby Talk” on
http://codeforpeople.com/ but I couldn’t see anywhere to post the code.
I can mail the files to anyone who is interested in having a look.

Thanks,

Phil.


Chuck,

Chuck R. wrote:

Phil,

use one of the pastie sites to paste up your code for us to see.

http://pastie.org
http://rafb.net

Done - http://pastie.org/222306

I am also rerunning with v1.9 and now that the code is all in one
program it seems to be faster (but still about 3x slower than the
equivalent shell scripts and C/C++ program).

Thanks,

Phil.


On Jun 25, 2008, at 5:13 PM, Philip R. wrote:

The change to a single Ruby script simplified the previous setup
quite markedly but I’m not sure that much more can be done with the
relatively small script now. I logged into “ruby Talk” on http://codeforpeople.com/
but I couldn’t see anywhere to post the code. I can mail the files
to anyone who is interested in having a look.

Phil,

use one of the pastie sites to paste up your code for us to see.

http://pastie.org
http://rafb.net

cr

On Jun 25, 2008, at 8:44 PM, Philip R. wrote:


OK, even knowing that it started as shell+C, it needs help ;-)

http://pastie.org/222375

Except for noticing that the Dir.chdir is still commented out, I think
that this is equivalent to your original. Of course, all that really
means is that I think it produces the same output. I don’t have your
input files to check (and I’m not really sure I have a clue what it is
really doing either).

If you can directly try this new set, I’d be curious as to the
relative speed. A few important notes:

  • the $darr was never referenced, so why bother to initialize it?
  • the multiple loops in stats06 have been collapsed
  • the various seed* loops now use strings directly
  • the parameters to statz32000 are strings rather than doing somewhat
    pointless conversions (and a sample_sz of 999 would ‘overflow’ the
    %02d into “999” anyway). Some potential values won’t work as literals
    (try 08 as an example, or 012) since the leading 0 on a literal means
    octal.
  • the more common boolean values are true (rather than TRUE) and false,
    and parentheses are not generally needed around the conditional
    expression of an ‘if’
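
The literal and formatting points in the notes above can be checked with a few lines. This is a small illustrative sketch, not code from the pasted script:

```ruby
# A leading 0 on an integer literal means octal.
puts 012                  # prints 10

# 08 is not a valid literal at all (8 is not an octal digit), so values
# with leading zeros are safer kept as strings or parsed explicitly.
puts '08'.to_i            # prints 8 (String#to_i defaults to base 10)
puts Integer('08', 10)    # prints 8 (explicit base)

# %02d pads to at least two digits; wider values are not truncated.
puts format('%02d', 8)    # prints 08
puts format('%02d', 999)  # prints 999
```

Note that Integer('08') without a base raises ArgumentError, because the leading zero is again read as an octal prefix.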

-Rob


Rob,

Rob B. wrote:


OK, even knowing that it started as shell+C, it needs help ;-)

That’s why I’m here!

http://pastie.org/222375

Except for noticing that the Dir.chdir is still commented out, I
think that this is equivalent to your original. Of course, all that
really means is that I think it produces the same output. I don’t
have your input files to check (and I’m not really sure I have a clue
what it is really doing either).

The output seems good.

If you can directly try this new set, I’d be curious as to the
relative speed. A few important notes:

  • the $darr was never referenced so why bother to initialize it

That is for future reference (other stats).

  • the multiple loops in stats06 have been collapsed

Thanks - nice!

  • the various seed* loops now use strings directly

Much nicer!

  • the parameters to statz32000 are strings rather than doing somewhat
    pointless conversions (and a sample_sz of 999 would ‘overflow’ the
    %02d into “999” anyway).

In the originals I needed the leading zeros as strings for input and
output file name consistency.

Some potential values won’t work as literals (try 08 as an example,
or 012) since the leading 0 on a literal means octal.

That could be a problem . .

  • the more common boolean values are true (rather than TRUE) and false

OK.

and parentheses are not generally needed around the conditional
expression of an ‘if’

I guess I use them so that they are there for future, more complex
expressions.

The performance improvement has been fairly dramatic! The previous
version using Ruby v1.9 took 3.44 times as long as the original sh/C
stuff; this version only takes 1.85 times as long, an improvement of
46%! However, I am not sure what the reason for this improvement is.
The changes certainly improve the readability and “Rubyness” of the
code, but most of them seem cosmetic to me . .
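
For reference, the 46% figure follows from the two ratios quoted above. A quick sketch of the arithmetic (the method name is just for illustration):

```ruby
# Fraction of run time saved when going from `before` to `after`,
# both expressed as multiples of the sh/C run time.
def relative_reduction(before, after)
  (before - after) / before
end

reduction = relative_reduction(3.44, 1.85)
puts format('%.0f%% less run time', reduction * 100)   # prints 46% less run time
puts format('%.2fx faster', 3.44 / 1.85)               # prints 1.86x faster
```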

Here is the profile (using v1.8) for pre and post your changes:

Ruby v1.8
v1.5.1
     %  cumulative     self               self     total
  time     seconds  seconds    calls   ms/call   ms/call  name
 59.94        8.44     8.44        1      8440     14080  Object#statz
 10.44        9.91     1.47    58318      0.03      0.03  Array#[]
   8.1       11.05     1.14    26630      0.04      0.04  String#split
  4.05       11.62     0.57       70      8.14    616.14  Range#each
  4.05       12.19     0.57    17737      0.03      0.03  String#==
  3.05       12.62     0.43     8872      0.05      0.05  IO#gets
  2.98       13.04     0.42     8866      0.05      0.05  Fixnum#-
  2.34       13.37     0.33     9737      0.03      0.03  Fixnum#+
  2.06       13.66     0.29     8916      0.03      0.03  String#to_i
  1.92       13.93     0.27     9955      0.03      0.03  Array#[]=
  0.36       13.98     0.05      880      0.06      0.06  Fixnum#>
  0.21       14.01     0.03       22      1.36     42.27  Object#stats06
  0.21       14.04     0.03      274      0.11      0.26  Array#initialize
  0.14       14.06     0.02      440      0.05      0.05  Fixnum#==
  0.07       14.07     0.01      274      0.04      0.29  Class#new
  0.07       14.08     0.01       22      0.45      0.45  IO#puts
Ruby v1.8
v1.5.2
     %  cumulative     self               self     total
  time     seconds  seconds    calls   ms/call   ms/call  name
 60.36        7.02     7.02        2      3510     11625  IO#open
  9.46        8.12      1.1    29128      0.04      0.04  Array#[]
  5.25        8.73     0.61    17737      0.03      0.03  String#==
  3.96        9.19     0.46     8871      0.05      0.05  String#split
  3.87        9.64     0.45     8872      0.05      0.05  IO#gets
  3.87       10.09     0.45       22     20.45     27.73  Integer#times
  3.35       10.48     0.39     8800      0.04      0.04  Fixnum#-
  3.18       10.85     0.37     9737      0.04      0.04  Fixnum#+
   3.1       11.21     0.36     9071      0.04      0.04  Array#[]=
  2.75       11.53     0.32     8915      0.04      0.04  String#to_i
  0.34       11.57     0.04      880      0.05      0.05  Fixnum#>
  0.17       11.59     0.02      115      0.17      0.35  Array#initialize
  0.09        11.6     0.01       22      0.45      0.45  Kernel.===
  0.09       11.61     0.01       44      0.23      0.23  Array#initialize_copy
  0.09       11.62     0.01      440      0.02      0.02  Fixnum#==
  0.09       11.63     0.01       44      0.23      0.45  Kernel.dup

Any comments on reasons for the improvement?

Many thanks,

Regards,

Phil.


On Jun 26, 2008, at 12:08 PM, Philip R. wrote:


Well, let’s rearrange that so the differences are more apparent:

Ruby v1.8
    v1.5.2                            v1.5.1
self calls  name                  self calls
         2  IO#open
     29128  Array#[]                   58318
     17737  String#==                  17737
      8871  String#split               26630
      8872  IO#gets                     8872
        22  Integer#times
      8800  Fixnum#-                    8866
      9737  Fixnum#+                    9737
      9071  Array#[]=                   9955
      8915  String#to_i                 8916
       880  Fixnum#>                     880
       115  Array#initialize             274
        22  Kernel.===
        44  Array#initialize_copy
       440  Fixnum#==                    440
        44  Kernel.dup
            Object#statz                   1
            Range#each                    70
            Object#stats06                22
            Class#new                    274
            IO#puts                       22

Now it’s easier to see that the big differences are the calls to
String#split (3x) and Array#[] (2x) in the older version. Array#[]=
has about 10% more, but we’ll ignore that for now. I’d guess that the
Kernel.dup and Array#initialize_copy are much more efficient at
creating the alleles arrays than the former element-at-a-time method.
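
To make those two differences concrete, here is a hedged sketch of the patterns involved. The method names and the three-field line format are assumptions for illustration; the actual code lives in the pasties linked in this thread.

```ruby
# Split each line once and reuse the fields, instead of calling
# line.split again for every field that is needed.
def parse_line(line)
  fields = line.split
  [fields[0], fields[1].to_i, fields[2].to_i]
end

# Element-at-a-time construction: one Array#[]= call per element.
def build_by_element(size)
  arr = []
  size.times { |i| arr[i] = 0 }
  arr
end

# Template construction: build the array once, then copy it with dup
# (a shallow copy that goes through Array#initialize_copy).
def build_by_dup(template)
  template.dup
end
```

Since dup is shallow, this only gives independent copies when the elements are immediate values such as integers; nested arrays would still be shared.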

-Rob


On 26 Jun 2008, at 04:24, Rob B. wrote:

A further modified version of statz which should change the way IO is
handled:

http://pastie.org/222765

Let me know if this makes any difference.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

raise ArgumentError unless @reality.responds_to? :reason

Eleanor,

Eleanor McHugh wrote:

A further modified version of statz which should change the way IO is
handled:

http://pastie.org/222765

Let me know if this makes any difference.

A few things:

  • you left a line in the loop:

    File.open( output_filename, 'w' ) do |fout|

which should be deleted

  • I originally used:

    stats = []
    lines = File.readlines(input_filename, 'r')

but found that reading the whole file (8871 lines) and then processing
the array was inefficient so I got rid of the array

  • using:

    stats << stats06

and the file writing output of:

    File.open(output_filename, "a") do |file|
      file.sync = false
      file.puts *stats
      file.fsync
    end

looks interesting - why should that be faster?
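
A side note on the File.readlines call quoted above: the second positional argument of File.readlines is the line separator, not an open mode, so passing 'r' splits the text at every letter "r" rather than at newlines. A small sketch showing the difference (the helper name is made up for the demonstration):

```ruby
require 'tempfile'

# Returns [lines split on "r", lines split on newline] for the same text.
def readlines_both_ways(text)
  result = nil
  Tempfile.create('readlines_demo') do |f|
    f.write(text)
    f.flush
    result = [File.readlines(f.path, 'r'), File.readlines(f.path)]
  end
  result
end

by_r, by_newline = readlines_both_ways("bar\nbaz\n")
p by_r        # ["bar", "\nbaz\n"]
p by_newline  # ["bar\n", "baz\n"]
```

Calling File.readlines(input_filename) with no second argument gives the usual one-element-per-line array.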

Thanks again,

Regards,

Phil.


On 26 Jun 2008, at 20:47, Philip R. wrote:

A few things:

  • you left a line in the loop:

    File.open( output_filename, 'w' ) do |fout|

which should be deleted

Paste in haste, repent at leisure ;-)
I’ve corrected it to read the way it appeared in my head when I was
looking at it: http://pastie.org/222765

    stats << stats06

If you buffer it as a single read and then work through the file in
memory it guarantees that you minimise the IO costs of reading. I am
of course assuming that even at 8871 lines your file is much smaller
than your available RAM :-)

and the file writing output of:

    File.open(output_filename, "a") do |file|
      file.sync = false
      file.puts *stats
      file.fsync
    end

looks interesting - why should that be faster?

Doing the file write this way offloads making it efficient to the Ruby
runtime.
The file.fsync call will cost you in terms of runtime performance, but
it ensures that the data is flushed to disk before moving on to the
next file which for a large data processing job is often desirable.
Personally I wouldn’t store the results in separate files but combine
them into a single file (possibly even a database), however I don’t
know how that would fit with your use case.

As to the file.puts *stats, there’s no guarantee this approach will be
efficient but compared to doing something like:

    File.open(output_filename, "a") do |file|
      stats.each { |stat| file.puts stat }
    end

it feels more natural to the problem domain.

Another alternative would be:

    File.open(output_filename, "a") do |file|
      file.puts stats.join("\n")
    end

but that’s likely to use more memory as first an in-memory string will
be created, then this will be passed to Ruby’s IO code. For the size
of file you’re working with that’s not likely to be a problem.
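
The three write variants discussed above can be put side by side. This is a sketch with hypothetical method names; for a non-empty stats array all three append the same bytes, so the choice is about style and memory rather than output.

```ruby
# Variant 1: splat the array into a single puts call, then force the
# data to disk with fsync before returning.
def write_splat(path, stats)
  File.open(path, 'a') do |file|
    file.sync = false   # leave Ruby's own write buffering enabled
    file.puts(*stats)
    file.fsync          # flush buffers and ask the OS to write to disk
  end
end

# Variant 2: one puts call per element.
def write_each(path, stats)
  File.open(path, 'a') do |file|
    stats.each { |stat| file.puts stat }
  end
end

# Variant 3: build one string in memory first, then write it.
def write_join(path, stats)
  File.open(path, 'a') do |file|
    file.puts stats.join("\n")
  end
end
```

The fsync in the first variant trades some run time for durability; it can be dropped if losing the last file on a crash is acceptable.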

I’ve a suspicion that your overall algorithm can also be greatly
improved. In particular the fact that you’re forming a cubic array and
then manipulating it raises warning bells and suggests you’ll have
data sparsity issues which could be handled in a different way, but
that would require a deeper understanding of your data.

Ellie


Ellie,

Eleanor McHugh wrote:

  • using:

    stats << stats06

If you buffer it as a single read and then work through the file in
memory it guarantees that you minimise the IO costs of reading. I am
of course assuming that even at 8871 lines your file is much smaller
than your available RAM :-)

When I did the profile, the array processing was the biggest hit - when
I got rid of the array, I almost halved the time! Ruby arrays are
pretty cool but I think you pay for the convenience . .

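The file.fsync call will cost you in terms of runtime performance, but
it ensures that the data is flushed to disk before moving on to the
next file which for a large data processing job is often desirable.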

See my other note but it didn’t make much difference.

Personally I wouldn’t store the results in separate
files but combine them into a single file (possibly even a database),
however I don’t know how that would fit with your use case.

There is more post processing using R and for casual inspection it is
convenient to be able to find data according to its file name. It
might still be possible to have fewer, larger files - I might ask
another question about that (basically I have to paste the single
column output of this stuff into 32 column arrays). I have tried DBs for
storing output from the main simulation program when it was all in C/C++
and it was quite slow so I went back to text files . .

As to the file.puts *stats, there’s no guarantee this approach will
be efficient but compared to doing something like:

    File.open(output_filename, "a") do |file|
      stats.each { |stat| file.puts stat }
    end

it feels more natural to the problem domain.

Yes, it was good to find out about this alternative.

I’ve a suspicion that your overall algorithm can also be greatly
improved.

I’m sure you are right about that!

In particular the fact that you’re forming a cubic array and then
manipulating it raises warning bells and suggests you’ll have data
sparsity issues which could be handled in a different way, but that
would require a deeper understanding of your data.

The cubic array was just a direct translation of the C pointer setup I
had - basically it is a rectangular grid of sub-populations each with an
array of allele lengths.
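
As a rough illustration of the two layouts under discussion, assuming nothing about the real data beyond "a grid of sub-populations, each with a list of allele lengths": a dense nested array mirrors the C pointer setup, while a Hash keyed by [row, col] only stores the cells that are actually used. The method names are hypothetical.

```ruby
# Dense layout: grid[row][col] is an array of allele lengths, and every
# cell exists whether or not it is ever filled.
def dense_grid(rows, cols)
  Array.new(rows) { Array.new(cols) { [] } }
end

# Sparse layout: a cell is created on first access, keyed by [row, col].
def sparse_grid
  Hash.new { |hash, key| hash[key] = [] }
end

dense = dense_grid(2, 3)
dense[1][2] << 12

sparse = sparse_grid
sparse[[1, 2]] << 12
puts sparse.size   # prints 1, only the touched cell is stored
```

Whether the sparse form pays off depends on how many cells stay empty, which only the real data can answer.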

Thanks again,

Regards,

Phil.
