For performance, write it in C

On Sat, Jul 29, 2006 at 07:13:11AM +0900, Keith G. wrote:

On Sat, Jul 29, 2006 at 06:59:41AM +0900, Chad P. wrote:

That wasn’t a passive-aggressive remark, it was a joking comment about
the inequity of your comparison (intentional or otherwise).

It didn’t come across as joking.

I’d be inclined to apologize for the misunderstanding if you hadn’t
decided I was the devil incarnate for a mis-taken joke.

You’re welcome to your misconceptions and bad attitudes, though.

Ditto. I wasn’t the one who wrote “Y’know, screw it. Be an ass if you
like.” I hadn’t even considered flipping the bozo bit until I read that.

Reread what you said in the preceding posts and tell me if you wouldn’t
have the same reaction to someone flying off the damned handle at a
stupid joke. I tried to inject levity because I could see the
conversation heading in a bad direction, and you were so intent on
seeing me in a bad light that it didn’t occur to you to assume good
faith on my part. Congratulations.

Chad, do you mind if we take this off list? There’s no sense in either
of us cluttering up the list with an offtopic discussion.

K.

On Sat, Jul 29, 2006 at 08:47:44AM +0900, Keith G. wrote:

Chad, do you mind if we take this off list? There’s no sense in either
of us cluttering up the list with an offtopic discussion.

go for it

On Jul 28, 2006, at 2:34 AM, Pit C. wrote:

gave me no usable hint among the first 50 results besides using
cygwin. Do you have a link to the reports you mention?

Maybe I should have written that given that I’m using the One
Click Installer, don’t have the Windows compiler toolchain, and am
not willing to use cygwin, I can’t use Ruby Inline. Is this better?

It should work with the 1-click installer, but yeah… not without a
compiler. So there is no way to even expect it to work.

But, instead I might try to use Ruby Inline with MinGW, so thanks
for the question.

I’ve not used mingw, and have no idea if others have.

On 7/28/06, M. Edward (Ed) Borasky wrote:

So what is the source of the reluctance to use CygWin in the Ruby community?

It’s crap and doesn’t mesh well with Windows itself.

I have even recently (mostly) dumped it in favour of xming, because I
was only using cygwin for X services.

-austin

Austin Z. wrote:

On 7/28/06, M. Edward (Ed) Borasky wrote:

So what is the source of the reluctance to use CygWin in the Ruby
community?

It’s crap and doesn’t mesh well with Windows itself.

I have even recently (mostly) dumped it in favour of xming, because I
was only using cygwin for X services.

-austin

Ah … I only use the client-side stuff, not the server “emulations”. I
use the CygWin Perl extensively, plus the command line and the X server.
Once in a while, I need an open source piece of software that doesn’t
have a native Windows build … at that point my first attempt is to
compile and run it under CygWin. If that fails, I have a Gentoo Linux
VMware virtual machine I use.

I suppose I should try Xming … as long as it has a usable command
line, I can get to ActiveState Perl.

On Thu, Jul 27, 2006 at 09:26:45AM +0900, David P. wrote:

Is that heavily optimized Java vs. “normal” (untweaked) C?
sometimes billions) of Dollars, Euros, etc. through their spreadsheets
every day. A 5 or 10 second advantage in calculating a spreadsheet
could mean a significant profit for a trading firm.

So, I am comparing apples to apples. A Java program can be optimized
to perform as well as a C program for certain tasks.

I/O bound tasks? Certain artificial benchmarks? Well yes, ruby can
perform as well as C on those too, but that’s hardly the point
here. This thread’s topic is not whether a few Java programs can be made
as fast as C.

The point is, “write it in C” is valid general advice, but
“write it in Java” depends on a lot of factors that no one in the Java crowd
mentions beforehand but wants taken into account afterwards
when comparing performance. You can’t have it both ways.


Anyhow, I doubt the apples to apples and JIT argument. You can write
self modifying C code to optimize at run time too. Now it is a bit far
fetched to write a full JIT compiler in C just for your project. But,
if there is opportunity for dynamic optimization at run time which the
Java JIT compiler can take advantage of, there has to be an
opportunity for run time optimization in C too. I don’t say this is
easy to spot or implement, just possible.

After all, Java VMs are still written in C, aren’t they?

-Jürgen

Ok, I need to preface this by saying that I’m in no way either a C or
ruby guru, so I may have missed something here, but this is what I’m
seeing. The bottle-neck here is in the printing to screen.

I don’t see how the OP got the code to run in 5 seconds while using any
stdout writing function (e.g., printf). The program should print like 4
megs of text to the screen (though I couldn’t get the C that was posted
to compile–like I said, I’m no C superstar) – there’s no way that is
happening in 5 seconds.

Taking a very simple test case, which just prints a 10 byte char array
15000 times using puts:


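#include <stdio.h>
#include <string.h>   /* strlen */

#define SIZE 15000

int main(void) {
    char str[11] = "ABCDEFGHIJ";
    int i;

    /* puts() appends a newline to each 10-byte string. */
    for (i = 0; i < SIZE; ++i) {
        puts(str);
    }
    /* strlen() returns size_t, so print the product with %zu. */
    printf("Printed %zu bytes of data\n", SIZE * strlen(str));
    return 0;
}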


Compiled with:

gcc -o test test.c

This yields:

time ./test

ABCDEFGHIJ
ABCDEFGHIJ

ABCDEFGHIJ
Printed 150000 bytes of data

real 0m25.621s
user 0m0.010s
sys 0m0.029s

Now, let’s see how Ruby’s stdout writing stacks up:


#!/usr/bin/ruby -w
SIZE = 15000
str = "ABCDEFGHIJ"
# syswrite bypasses Ruby's output buffering: one write call per line.
1.upto(SIZE) {
  STDOUT.syswrite("#{str}\n")
}
STDOUT.syswrite("Printed #{SIZE*str.length} bytes of data\n")


This yields:

ABCDEFGHIJ
ABCDEFGHIJ

ABCDEFGHIJ
Printed 150000 bytes of data

real 0m26.796s
user 0m0.202s
sys 0m0.049s

Pretty comparable there.

Of course, the bottle-neck was supposedly the maths and array access
and such, which is where C would excel (I’m not denying that ruby is
sometimes pretty slow, or that C is much faster all around, just bear
with me here).

So with one ruby implementation of the program mentioned on here:


#!/usr/bin/ruby -w

Wd = (ARGV.shift || 5).to_i
$board = []

# Generate all possible valid rows: count in base Wd, drop any number
# with a repeated digit, and shift the digits from 0..Wd-1 to 1..Wd.
Rows = (0...Wd ** Wd).
  map { |n| n.to_s(Wd).rjust(Wd, '0') }.
  reject { |s| s =~ /(.).*\1/ }.
  map { |s| s.split(//).map { |n| n.to_i + 1 } }

# True if no column of the first n + 1 rows repeats a value.
def check(ary, n)
  ary[0, n + 1].transpose.all? { |x| x.size == x.uniq.size }
end

def add_a_row(row_num)
  if Wd == row_num
    STDOUT.syswrite($board.map { |row| row.join }.join(':'))
  else
    Rows.size.times { |i|
      $board[row_num] = Rows[i]
      if check($board, row_num)
        add_a_row(row_num + 1)
      end
    }
  end
end

add_a_row(0)


This took like 48 minutes! Ouch! But if that syswrite (or puts in the
original version) is replaced with a file write:


def add_a_row(row_num)
  if Wd == row_num
    $outfile << $board.map { |row| row.join }.join(':')
  else
    Rows.size.times { |i|
      $board[row_num] = Rows[i]
      if check($board, row_num)
        add_a_row(row_num + 1)
      end
    }
  end
end

$outfile = File.open('latins.dump', 'wb')
add_a_row(0)
$outfile.close


Now we’re down to 16 minutes. Much better! But still…
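As an aside on the Rows expression in the first program: counting in base Wd and rejecting numbers with a repeated digit appears to produce exactly the permutations of 1..Wd. The sketch below checks that on a small width. The method names are hypothetical, and Array#permutation is an assumption about the Ruby in use (it is not available in the 1.8.5 used for the timings here).

```ruby
# Row generator as in the thread's first program: count in base `width`,
# drop numbers with a repeated digit, then shift digits to 1..width.
def rows_by_counting(width)
  (0...width**width).
    map { |n| n.to_s(width).rjust(width, '0') }.
    reject { |s| s =~ /(.).*\1/ }.
    map { |s| s.chars.map { |c| c.to_i + 1 } }
end

# The same set of rows via the built-in Array#permutation
# (assumption: a Ruby version that provides it).
def rows_by_permutation(width)
  (1..width).to_a.permutation.to_a
end

width = 4
puts rows_by_counting(width).size
puts rows_by_counting(width).sort == rows_by_permutation(width).sort
```

For width 4 this should report 24 rows and agreement between the two generators. The counting approach examines Wd ** Wd candidates to keep Wd! of them, which is harmless at Wd = 5 but grows quickly.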

Ok, so what about the implementation using the permutation and set
classes?


#!/usr/bin/ruby -w

require('permutation')  # third-party library, not part of the standard library
require('set')

$size = (ARGV.shift || 5).to_i

$perms = Permutation.new($size).map { |p| p.value }
$out = $perms.map { |p| p.map { |v| v+1 }.join }
# For each permutation, the indices of all permutations that share at
# least one position with it and so cannot sit in the same square.
$filter = $perms.map { |p|
  s = SortedSet.new
  $perms.each_with_index { |o, i|
    o.each_with_index { |v, j| s.add(i) if p[j] == v }
  } && s.to_a
}

$latins = []
def search lines, possibs
  return $latins << lines if lines.size == $size
  possibs.each { |p|
    search lines + [p], (possibs -
      $filter[p]).subtract(lines.last.to_i...p)
  }
end

search [], SortedSet[*(0...$perms.size)]

$latins.each { |latin|
  $perms.each { |perm|
    perm.each { |p| STDOUT.syswrite($out[latin[p]] + "\n") }
    STDOUT.syswrite("\n")
  }
}


26 minutes…that’s still gonna leave a mark. But the file IO version
of the same program?


outfile = File.open('latins.dump', 'wb')
$latins.each { |latin|
  $perms.each { |perm|
    perm.each { |p| outfile << $out[latin[p]] << "\n" }
    outfile << "\n"
  }
}
outfile.close


time ruby test.rb

real 0m17.227s
user 0m13.969s
sys 0m1.399s

17 seconds! WOOHOO! Yes, indeedy. That’s more like it. Try it if you
don’t believe me.

So the moral of the story is twofold:

1.) Don’t assume the bottle-neck is in the interpreter and just run off
and start writing everything in C or Java or simplified Klingon or
whatever. Testing, coverage, and profiling are the key to discovering
the true cause of your woes. Here the bottle-neck was in writing 4+
megs to stdout – and going to C won’t help with that, contra the claims
of the OP.

2.) Don’t write a crapload of text to stdout. It won’t be as fast as
you’d like it to be no matter what language you use.
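One way to act on the first point is to time the compute phase and the output phase separately before deciding what to rewrite. The following is only a sketch: build_rows is a hypothetical stand-in for the real workload, profile_phases is an invented helper name, and the output goes to File::NULL so that no terminal is involved.

```ruby
require 'benchmark'

# Hypothetical stand-in for the real workload: every permutation of 1..n
# as a string, the same kind of row the Latin-square programs print.
def build_rows(n)
  (1..n).to_a.permutation.map(&:join)
end

# Time the compute phase and the output phase separately, so it is clear
# which of the two dominates before anything is rewritten elsewhere.
def profile_phases(n, io)
  rows = nil
  compute = Benchmark.realtime { rows = build_rows(n) }
  output  = Benchmark.realtime { rows.each { |row| io.puts(row) } }
  { rows: rows.size, compute: compute, output: output }
end

File.open(File::NULL, 'w') do |sink|
  result = profile_phases(7, sink)
  puts format('rows=%d compute=%.4fs output=%.4fs',
              result[:rows], result[:compute], result[:output])
end
```

Passing STDOUT instead of the null device as the second argument shows how much of the total is the terminal rather than the program.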

========

NB: All testing was done on:

Linux 2.6.17-gentoo-r5 #2 PREEMPT Thu Aug 10 14:21:37 CDT 2006 i686 AMD
Athlon™ XP 1600+ AuthenticAMD GNU/Linux

gcc version 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.9)

GNU C Library development release version 2.4

ruby 1.8.5 (2006-08-18) [i686-linux]

========

Regards,
Jordan

On Mon, Aug 21, 2006 at 03:55:10PM +0900, [email protected] wrote:

Ok, I need to preface this by saying that I’m in no way either a C or
ruby guru, so I may have missed something here, but this is what I’m
seeing. The bottle-neck here is in the printing to screen.

I don’t see how the OP got the code to run in 5 seconds while using any
stdout writing function (e.g., printf). The program should print like 4
megs of text to the screen (though I couldn’t get the C that was posted
to compile–like I said, I’m no C superstar) – there’s no way that is
happening in 5 seconds.

First, stdout doesn’t necessarily write to your screen, and /dev/null
is very very fast. Even if writing to some kind of terminal emulator,
some of them are much faster with bulk output than others.


Pretty comparable there.

Of course, the bottle-neck was supposedly the maths and array access
and such, which is where C would excel (I’m not denying that ruby is
sometimes pretty slow, or that C is much faster all around, just bear
with me here).

Not so. Let’s look at the user numbers which represent CPU time used
by the programs themselves. The difference between 0.010 and 0.202s is
pretty large (2000%), but maybe the values are still too small to
disregard startup time etc.

99.999% of the real execution time your program was not even running,
but waiting, presumably for a slow output system to write your strings.

All you benchmarked was that your terminal emulator (or wherever
your stdout goes) is slow.

I suggest increasing SIZE and redirecting stdout to the bit bucket (or
a file, if you lack a decent terminal) when you want to compare only the C
and ruby versions again. The difference will be much more visible even in
real execution times then.
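The suggestion can be sketched from inside Ruby as well, without relying on the shell's time. The measure helper below is hypothetical, the larger SIZE is an arbitrary choice, and the output is sent to File::NULL rather than a terminal. It reports wall-clock time next to the CPU time the process itself consumed.

```ruby
# Run the block and report wall-clock time next to the CPU time (user + sys)
# this process consumed, similar to what the shell's `time` prints.
def measure
  cpu_before  = Process.times
  wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_before
  cpu_after = Process.times
  { real: wall,
    user: cpu_after.utime - cpu_before.utime,
    sys:  cpu_after.stime - cpu_before.stime }
end

# The string mirrors the test program quoted in the thread; SIZE is raised
# (an arbitrary value) so that the CPU cost is large enough to see.
SIZE = 1_500_000
STR  = 'ABCDEFGHIJ'

timing = File.open(File::NULL, 'w') do |sink|
  measure { SIZE.times { sink.puts(STR) } }
end
puts format('real %.3fs  user %.3fs  sys %.3fs',
            *timing.values_at(:real, :user, :sys))
```

With the terminal out of the way, real time should sit close to user plus sys, so the number reflects the interpreter rather than the output device.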

Jürgen

Well put. I totally agree with you.

I’d like to mention that you can use inline for this purpose.
Have a look at

In the article, Ruby and inlined C versions of prime number checking are
compared. As expected, the inlined C runs much faster than the Ruby-only
version.

I also have experience porting to C++. My application,
which relied heavily on matrix and log functions, took more than
2 days to finish. After I ported it to C++, it took about 6 hours.
Yes, it was an amazing performance boost.

Sincerely,
Minkoo S.

Jürgen Strobel wrote:

On Mon, Aug 21, 2006 at 03:55:10PM +0900, [email protected] wrote:

Ok, I need to preface this by saying that I’m in no way either a C or
ruby guru, so I may have missed something here, but this is what I’m
seeing. The bottle-neck here is in the printing to screen.

I don’t see how the OP got the code to run in 5 seconds while using any
stdout writing function (e.g., printf). The program should print like 4
megs of text to the screen (though I couldn’t get the C that was posted
to compile–like I said, I’m no C superstar) – there’s no way that is
happening in 5 seconds.

First, stdout doesn’t necessarily write to your screen, and /dev/null
is very very fast. Even if writing to some kind of terminal emulator,
some of them are much faster with bulk output than others.

Ok, fair enough. I made the assumption that the testing was done on a
terminal emulator since the perl execution time was given as 12 minutes
and the faster version of the ruby algorithm writing to a file rather
than stdout took only 17 seconds, whereas it took 26 minutes writing to
stdout–I assumed that the perl version would be comparable with file
writing rather than writing to standard out. I used xterm.

All you benchmarked was that your terminal emulator (or wherever
your stdout goes) is slow.

Yes! That’s exactly what I was trying to do! I was trying to show that
writing to stdout (assuming a comparable context, like an xterm) is the
bottle-neck of the ruby version here. Writing to a file was about 90
times faster. Like I said, I wasn’t denying that C is faster; my point
was this: the C version may only take 5 seconds to write to a file,
but ruby only takes 17 seconds; and for me, as I’m sure for others, 12
seconds is not worth the effort of writing it in C. So the 12 minutes to
5 seconds comparison was flawed – 17 seconds to 5 seconds is more
accurate.
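The ratios behind that comparison can be checked with a few lines of arithmetic. This is a sketch using only the wall-clock figures reported in this thread; the constant and method names are made up for illustration.

```ruby
# Wall-clock figures reported in the thread, converted to seconds.
STDOUT_SECONDS = 26 * 60   # permutation version writing to an xterm
FILE_SECONDS   = 17.227    # same program writing to a file
C_SECONDS      = 5         # the original poster's figure for the C version

# How many times faster `after` is than `before`.
def speedup(before, after)
  before / after.to_f
end

puts format('file vs. xterm: %.1fx faster', speedup(STDOUT_SECONDS, FILE_SECONDS))
puts format('C vs. ruby-to-file: %.1fx faster', speedup(FILE_SECONDS, C_SECONDS))
```

This gives about 90.6x for the file-versus-xterm change and about 3.4x for C versus ruby writing to a file, which is the gap the 12 seconds refers to.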

Regards,
Jordan

Jürgen Strobel wrote:

On Sat, Aug 26, 2006 at 11:23:02PM +0900, Jordan Callicoat wrote:

Jürgen Strobel wrote:

Yes! That’s exactly what I was trying to do! I was trying to show that
writing to stdout (assuming a comparable context, like an xterm) is the
bottle-neck of the ruby version here. Writing to a file was about 90
times faster. Like I said, I wasn’t denying that C is faster; my point
was this: the C version may only take 5 seconds to write to a file,
but ruby only takes 17 seconds; and for me, as I’m sure for others, 12
seconds is not worth the effort of writing it in C. So the 12 minutes to
5 seconds comparison was flawed – 17 seconds to 5 seconds is more
accurate.

Usually you don’t benchmark with external arbitrary bottlenecks
enabled. The point is, your xterm may be of very different speed
than mine. Generic terminal emulators are not a comparable context at
all, not without benchmarking them themselves, which is not the point
of this exercise.

I should have emphasized your terminal emulator. We are comparing
ruby vs. C, so it makes sense to assume a fast and uniform output
mechanism which doesn’t get in our way. It is standard practice to
write to stdout, but redirect it to /dev/null or files if writing
larger quantities in benchmarks.

To get real numbers, your C version took 0.010s CPU time, and your
ruby one took 0.202s. A difference of roughly 2000% may well be
important to several applications. Sometimes it may be completely
irrelevant, for instance if CPU time is dwarfed by I/O.

-Jürgen

I understand your point, but I think we may be talking past each other
here. My purpose was to show that the ruby (and presumably perl)
bottle-neck was due to factors other than the interpreter. Granted, ruby is
slower than C – even a lot slower (2000%!) – but the end result of
writing to a file is 17 seconds in ruby. That is light years removed
from 26 (or even 12) minutes (i.e., writing to stdout with no
redirection). So while 12 minutes versus 5 seconds may make you want to
write it in C, 17 seconds versus 5 seconds may make you think twice
about it. My question is: when is C a necessary evil, rather than just
an evil? ;)

Regards,
Jordan
