For performance, write it in C

Tim B. wrote:

So, first gripe: C is faster than Ruby in certain problem domains.
In others, it’s not.

The post was about people wanting better performance for their code.
Quite clearly, if the code you have written in Ruby (or whatever) runs
fast enough for you, then performance is a non-issue. If, however, the
performance of your code is an issue, then in truth there is only so much
improvement you can squeeze out of Ruby; if that is enough to resolve
your performance problems, then fine. If you want still more
performance then you want to write it in C (or perhaps buy some new
hardware :slight_smile: )

Second gripe. The notion of doing a wholesale rewrite in C is almost
certainly wrong.
An earlier project of mine used GD from Ruby to calculate some colour
metrics from images and write them into a database. I rewrote the whole
thing in C, using the same GD and SQLite2 libraries as the Ruby version,
and the improvement was massive, despite the fact that the Ruby code was
not actually doing very much; most of the time was spent in the GD
library. So I am not all that convinced that rewriting only part of a
project in C will achieve quite the same improvement. And if you are
going to convert a significant chunk of code to C then you may as well
go the whole hog.
In fact, the notion of doing any kind of serious hacking, without
doing some measuring first, is almost always wrong. The right way
to build software that performs well is to write a natural, idiomatic
implementation, trying to avoid stupid design errors but not worrying
too much about performance. If it’s fast enough, you’re done.
No problem here.
If it’s not fast enough, don’t write another line of code till you’ve
used a profiler and understand what the problem is. If in fact
this is the kind of a problem where C is going to do better, chances
are you only have to replace 10% of your code to get 90% of the
available speedup.
Not been my experience to date but then perhaps I am not working on
problems that can be solved in that way.
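
For what it's worth, the measuring step Tim describes is cheap to try in
Ruby: the bundled profile library (or the ruby-prof gem) will tell you
where the time actually goes before you commit to any C. A rough sketch,
with a made-up hot spot standing in for real code:

require 'profile'   # standard-library profiler; prints a report at exit

# Stand-ins for whatever your program really does; the point is only to
# see which methods dominate the report.
def build_row(i)
  { :id => i, :name => "row #{i}" }
end

def hot_loop
  total = 0
  50_000.times { |i| total += build_row(i)[:name].length }
  total
end

hot_loop
# When the script exits, the profiler prints a table of methods sorted by
# time; optimize (or drop to C) only what shows up at the top of it.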

Well, you need the compiler toolchain if you want to compile; that is what
Inline does.

On Windows, you have three options:

  • MS - you can get by with their free compiler (VS Express or something)
  • cygwin
  • mingw

I have the full VS, and Inline worked for me when I started the program
with a proper environment (i.e. the proper paths set), although I’ve tried
only the examples. And it seems VC6 and VC7 (VS2003) are better to use,
due to the manifest stuff that VC8 (VS2005) creates.

I haven’t tried cygwin or mingw.
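
For anyone curious, this is roughly the shape an Inline program takes once
a compiler is on the PATH (a minimal sketch along the lines of the
RubyInline samples; the factorial function is only an illustration):

require 'rubygems'
require 'inline'   # the RubyInline gem; it compiles the C below on first use

class Maths
  inline do |builder|
    builder.c <<-'C'
      long factorial(int n) {
        long result = 1;
        while (n >= 2) { result *= n--; }
        return result;
      }
    C
  end
end

puts Maths.new.factorial(10)   # => 3628800

The compiled extension gets cached (under ~/.ruby_inline), so the toolchain
is only exercised the first time or when the C source changes.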

J.

On 2006-07-26, Kristof B. [email protected] wrote:

-- extend takes a list of columns, and extends each column with a
where x free

latin_square n = latin_square_ n
  where latin_square_ 0 = replicate n []  -- initialize columns to nil
        latin_square_ m | m > 0 = extend (latin_square_ (m-1)) n

square2str s = unlines $ map format_col s
  where format_col col = unwords $ map show col

main = mapIO_ (putStrLn . square2str) (findall (\s -> s =:= latin_square 5))
------------------------- end latin.curry -----------------------------

It’s really nice and compact!
AFAIK Curry is Haskell boosted with logic programming.

I (who, at the moment, just watch these languages from a distance and
can’t tell by looking at the code) wonder whether you have used here
something specific to Curry, which would be harder/uglier to express in
Haskell?

And what does the Curry compiler look like? Is it just a hacked GHC? How
does Curry’s performance relate to that of Haskell?

Regards,
Csaba

On 7/27/06, Isaac G. [email protected] wrote:

JIT is the key to a lot of that. Performance depends greatly on
the compiler, the JVM, the algorithm, etc.

I won a bet once from a friend. We wrote comparable programs in
Java and C++ (some arbitrary math in a loop running a bazillion
times).

With defaults on both compiles, the Java was actually faster
than the C++. Even I didn’t expect that. But as I said, this
sort of thing is highly dependent on many different factors.

Yes, I believe that in theory, under some circumstances, a JIT will
outclass a static compiler, simply because it is able to optimize using
runtime information.
I would be surprised, however, to see consistently better performance from
JIT-compiled code than from precompiled code in the near future (ha, “near
future” can be defined to suit my needs, so I am never wrong :wink: ;
let’s say 0.01 ky).
Would you mind sharing these examples? Maybe off-list, I suggest, as we
are losing the red gem here.

Thx a lot in advance

Robert


Two things are infinite: the universe and human stupidity; and as far as
the universe is concerned, I have not acquired absolute certainty of it.

  • Albert Einstein

Kristof B. wrote:

-- extend takes a list of columns, and extends each column with a
where x free

latin_square n = latin_square_ n
  where latin_square_ 0 = replicate n []  -- initialize columns to nil
        latin_square_ m | m > 0 = extend (latin_square_ (m-1)) n

square2str s = unlines $ map format_col s
  where format_col col = unwords $ map show col

main = mapIO_ (putStrLn . square2str) (findall (\s -> s =:= latin_square 5))
------------------------- end latin.curry -----------------------------

I don’t see where elems_diff is used after it is defined.

Pit C. wrote:

Maybe I should have written that, given that I’m using the One Click
Installer, don’t have the Windows compiler toolchain, and am not
willing to use Cygwin, I can’t use Ruby Inline. Is this better?
Speaking of CygWin, a couple of people here have expressed what seems
like disdain for it. I am constrained to use a Windows desktop at my day
job, and CygWin has been an important factor in my retaining my sanity
about the fact. I don’t use the server pieces of CygWin. My preference
(in open source tools) is first native Windows, second CygWin and third
(Gentoo) Linux. I dual-booted with Gentoo for a while until the
VMware Server beta started. That became a viable option, so that’s how I
exercise the third option now.

So what is the source of the reluctance to use CygWin in the Ruby
community?

On Fri, 28 Jul 2006 03:39:09 -0700, William J. wrote:

-- same position
| x =:= upto n &

main = mapIO_ (putStrLn . square2str) (findall (\s -> s =:= latin_square 5))
------------------------- end latin.curry -----------------------------

I don’t see where elems_diff is used after it is defined.

Heh, you are right! I defined it, and then when I didn’t need it I
forgot to remove it.

Thanks for noting,
Kristof

On Fri, 28 Jul 2006 09:45:01 +0000, Csaba H. wrote:

      (x `elem` col) =:= False = (x:col) : addnum cs (x:prev)

------------------------- end latin.curry -----------------------------

It’s really nice and compact!
AFAIK Curry is Haskell boosted with logic programming.

Yes, exactly!

I (who, at the moment, just watch these languages from a distance and
can’t tell by looking at the code) wonder whether you have used here
something specific to Curry, which would be harder/uglier to express in
Haskell?

Yes, the =:= operator unifies terms as in logic languages, and Curry
makes it possible to write nondeterministic functions. For example, the
upto function I defined above can evaluate to any number from 1 up to n,
while in Haskell it could have only one result. In the code that I wrote
above:
upto n | n > 1 = n ? upto (n-1)

is the same as
upto n | n > 1 = n
upto n | n > 1 = upto (n-1)

Then there are search functions that make it possible to extract all
outcomes of a nondeterministic function in a lazy way (e.g. findall).

In Haskell the above would probably be written in a monad that expresses
nondeterminism, but I doubt it would be as clear as the Curry code.

And what does the Curry compiler look like? Is it just a hacked GHC? How
does Curry’s performance relate to that of Haskell?

As far as I know, the Curry compiler I used (Munster CC) is written from
scratch, in Haskell. I doubt it is as fast and optimized as the Haskell
compiler, since Haskell has a much larger userbase.

Regards,
Csaba

Regards,
Kristof

On Thursday 27 July 2006 10:13, Ashley M. wrote:

Good luck. I recently got a Ruby program that aggregates several LDAP
directory-pulls with about a million entries down from a few hours to a
few seconds, without having to drop into C. It can be done, and it’s
kind of fun too.

Next time I get a morning free I might apply some of the tweaks that have
been suggested. It might be interesting to see how much I can improve the
performance.

I looked at the source of the script today, and I made these changes:

  • use FasterCSV instead of CSV
  • don’t buffer every row in the datasets, send them straight to
    Zlib::GzipWriter as they are processed
  • don’t do hash lookups in the middle of a 15-million-row loop, do them
    once in advance! (see the sketch below)
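
The third change is the kind of thing that really adds up over 15 million
iterations. A self-contained sketch of what I mean (toy data and made-up
names like config and site_code, not the real script), which you can time
for yourself:

require 'benchmark'

config    = { "regions" => { "LDN" => "Europe" } }
site_code = "LDN"
rows      = Array.new(500_000) { { :name => "x" } }

Benchmark.bm(10) do |b|
  # The lookup repeated inside the loop, once per row.
  b.report("in loop") do
    rows.each { |row| row[:region] = config["regions"][site_code] }
  end

  # The same lookup hoisted out of the loop and done once in advance.
  b.report("hoisted") do
    region = config["regions"][site_code]
    rows.each { |row| row[:region] = region }
  end
end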

Unfortunately I’m still stuck with a nasty
“rows.each { |row| row.each { |col| col.strip! } }” type section, to fix
the poor quality of the data, which would take a lot of time going
through all the fields to thin out.

Despite this, I’ve got the run time down from over 2.5 hours to about 50
minutes. The smaller files are individually about 6x faster, but I’m happy
with 3x faster overall. It means we can realistically run it in the day
if there are issues.

One curious thing is that while the real time was about 50 mins, the user
time was only about 30 mins (negligible sys time if I remember). Not sure
where the other 20 mins has gone?

Ashley

On Thu, Jul 27, 2006 at 02:42:30PM +0900, Chad P. wrote:

performance characteristics that can be divorced from Excel (because
they’re Windows’ own performance characteristics, not Excel’s). Argue
those points, and you’re arguing about the wrong software.

Design decisions that involve interfacing with interface software that
sucks are related to the software under discussion; and not all of the
interface is entirely delegated to Windows, either. No software can be
evaluated for its performance characteristics separately from its
environment, except insofar as it runs without that environment.

Here’s all I’m saying: the environment is important, but it’s a variable
that must be cancelled when talking about some piece of software that’s
running on top of it. You can only make judgements about the speed of
something like Excel by comparing it to another spreadsheet with a
similar set of features running on Windows. Otherwise, you’re only
making guesses as to where the sluggishness and bloat lie.

But Wine is an emulator, and while it does a good job approaching the
speed of Windows, it doesn’t hit it, nor can it hit it. You’re not
comparing like with like. Now that’s far from sporting.

Actually, no, it’s not an emulator.

Yes, it is. It’s a set of libraries and executables that emulate a
Windows environment.

It’s a set of libraries (or a
single library – I’m a little sketchy on the details) that provides the
same API as Windows software finds in a Windows environment. An
emulator actually creates a faux/copy version of the environment it’s
emulating.

Which both Wine and Cygwin do. To quote the Wikipedia article on
emulators:

A software emulator allows computer programs to run on a platform
(computer architecture and/or operating system) other than the one for
which they were originally written.

Linux compatibility on FreeBSD is a software emulator that fools Linux
executables into thinking they’re running on Linux. Because of the
commonalities between FreeBSD and Linux, this emulation layer can be
thin.

It is to Linux compared with Unix as an actual emulator is
to Cygwin compared with Unix: one is a differing implementation and the
other is an emulator.

?

. . . and, in fact, there are things that run faster via Wine on Linux
than natively on Windows.

Not surprising, really.

[ snip ]

under FreeBSD. Bringing Wine in is a red herring. Software cannot be
blamed for the environment it’s executed in.

I didn’t bring it up. You did. I made a comment about Excel not
working in Linux as a bit of a joke, attempting to make the point that
saying Excel performance can be evaluated separately from its dependence
on Windows doesn’t strike me as useful.

See above.

Again, I haven’t seen your code or your data, but 50 minutes still seems
like a frightfully long time for such minimal processing on only 15
million rows. If I were you, I’d keep optimizing (but then I’m pretty
obsessive about optimizing). Exactly how much time does GzipWriter take,
for example?

Several people have correctly said that you stop optimizing when your
program is “fast enough” (which is a business decision, not a technical
one), but optimizing strictly inside of Ruby is a lot cheaper than
optimizing by dropping into C (which I do often enough), because you
don’t incur the portability and other costs everyone has been talking
about.
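
To answer the GzipWriter question you don’t even need a full profiler run;
a rough, self-contained sketch like this (synthetic rows, scratch output
files) will show whether compression dominates:

require 'benchmark'
require 'zlib'

# A synthetic stand-in for one of the CSV datasets.
rows = Array.new(100_000) { |i| "#{i},some,padded,row,data\n" }

Benchmark.bm(10) do |b|
  b.report("gzip") do
    Zlib::GzipWriter.open("scratch_rows.gz") do |gz|
      rows.each { |row| gz.write(row) }
    end
  end

  b.report("no gzip") do
    File.open("scratch_rows.txt", "w") do |f|
      rows.each { |row| f.write(row) }
    end
  end
end

If the "gzip" line dwarfs the rest of the run time, the compression (or
its level) is the next place to look.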

On Sat, Jul 29, 2006 at 05:26:26AM +0900, Keith G. wrote:

being “less than sporting”. But if you want to act like that, there’s
not much I can do to stop you.

That wasn’t a passive-aggressive remark, it was a joking comment about
the inequity of your comparison (intentional or otherwise). You’re
welcome to your misconceptions and bad attitudes, though.

On Fri, Jul 28, 2006 at 04:34:19PM +0900, Chad P. wrote:

Gee willickers, I’m sorry I didn’t use the exact phrasing you wanted me
to. Maybe next time, though, you won’t claim the bits I said that
didn’t actually have anything to do with your actual complaint were
wrong.

Y’know, screw it. Be an ass if you like. I’m done with this subthread.

I was being polite until you made a passive-aggressive remark about me
being “less than sporting”. But if you want to act like that, there’s
not much I can do to stop you.

On Sat, Jul 29, 2006 at 06:59:41AM +0900, Chad P. wrote:

I was being polite until you made a passive-aggressive remark about me
being “less than sporting”. But if you want to act like that, there’s
not much I can do to stop you.

That wasn’t a passive-aggressive remark, it was a joking comment about
the inequity of your comparison (intentional or otherwise).

It didn’t come across as joking.

You’re welcome to your misconceptions and bad attitudes, though.

Ditto. I wasn’t the one who wrote “Y’know, screw it. Be an ass if you
like.” I hadn’t even considered flipping the bozo bit until I read that.

Ashley M. wrote:

One curious thing is that while the real time was about 50 mins, the user
time was only about 30 mins (negligible sys time if I remember). Not sure
where the other 20 mins has gone?

Ashley

Disk I/O? Maybe?
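
One cheap way to check on the next run is to track wall-clock time against
the CPU time the process actually used (a rough sketch; the sleep just
stands in for the real work):

start_wall = Time.now
start_cpu  = Process.times   # user + system CPU consumed so far

sleep 2   # the real processing would go here

cpu = Process.times
puts "wall: %.1fs" % (Time.now - start_wall)
puts "cpu:  %.1fs" % ((cpu.utime + cpu.stime) - (start_cpu.utime + start_cpu.stime))

Wall time that never shows up as CPU time is spent waiting: disk I/O,
swapping, or other processes hogging the box.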

-Justin

On Sat, Jul 29, 2006 at 07:12:09AM +0900, Bill K. wrote:

GENTLEMEN! YOU CAN’T FIGHT IN HERE, THIS IS THE WAR ROOM !!!

And I was just about to put my Mexican wrestling mask on. Sigh. :wink:

K.

From: “Chad P.” [email protected]

I was being polite until you made a passive-aggressive remark about me
being “less than sporting”. But if you want to act like that, there’s
not much I can do to stop you.

That wasn’t a passive-aggressive remark, it was a joking comment about
the inequity of your comparison (intentional or otherwise). You’re
welcome to your misconceptions and bad attitudes, though.

GENTLEMEN! YOU CAN’T FIGHT IN HERE, THIS IS THE WAR ROOM !!!

On Sat, Jul 29, 2006 at 05:37:24AM +0900, Keith G. wrote:

running on top of it. You can only make judgements about the speed of
something like Excel by comparing it to another spreadsheet with a
similar set of features running on Windows. Otherwise, you’re only
making guesses as to where the sluggishness and bloat lie.

What’s important is how two pieces of software run in the same
environment, not whether the environment is the reason a given
application is slow at some things. That was my point: the GUI
performance of Excel is, indeed, relevant to a discussion of Excel
performance, despite the fact that significant chunks of Excel’s GUI are
implemented by way of the environment. Compare it with another
spreadsheet running in the same environment, and don’t cancel some of
its slowness by blaming it on Windows.

Actually, no, it’s not an emulator.

Yes, it is. It’s a set of libraries and executables that emulate a
Windows environment.

No, it’s not. Repeat after me: “WINE Is Not an Emulator”. That’s not
just an affectation. It is a statement of fact about WINE. That’s why
they call it WINE. An Windows emulator would be a “fake Windows”
running in Linux, like a VM: WINE is basically just an API that happens
to be as close to the Windows API (in all useful ways) as the WINE
developers can get it. It does not pretend to be a Windows machine. It
just provides compatibility for Windows programs on Linux.

Perhaps you aren’t aware that WINE stands for WINE Is Not an Emulator,
or that they aren’t lying when they say that.

It is to Linux compared with Unix as an actual emulator is
to Cygwin compared with Unix: one is a differing implementation and the
other is an emulator.

?

(where ~ means “roughly equivalent to”)

Differing implementations:
Wine ~ Windows
Linux ~ Unix

Emulators:
Emulator != Original
Cygwin != Linux

On Jul 28, 2006, at 3:33 PM, Ashley M. wrote:

I looked at the source of the script today, and I made these changes:

  • use FasterCSV instead of CSV

Fair warning, I’m coming into this conversation late and I haven’t
read all that came before this. However, if you are using FasterCSV…

Unfortunately I’m still stuck with a nasty
“rows.each { |row| row.each { |col| col.strip! } }” type section, to fix
the poor quality of the data, which would take a lot of time going
through all the fields to thin out.

FasterCSV can convert fields as they are read. I’m not sure if this
will be faster, but it may be worth a shot. See the :converters
argument to FasterCSV::new:

http://fastercsv.rubyforge.org/
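
Something along these lines, for example (a rough sketch, untested here;
"data.csv" is a placeholder for your input):

require 'rubygems'
require 'faster_csv'   # the fastercsv gem

# A custom converter is just a block run on each field as it is parsed;
# this one strips the whitespace that would otherwise need a second pass.
strip_fields = lambda { |field| field.is_a?(String) ? field.strip : field }

FasterCSV.foreach("data.csv", :converters => [strip_fields]) do |row|
  # Fields arrive already stripped, so the
  # rows.each { |row| row.each { |col| col.strip! } } pass goes away.
end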

James Edward G. II

Peter H. schrieb:

Whenever the question of performance comes up with scripting languages
such as Ruby, Perl or Python there will be people whose response can be
summarised as “Write it in C”. I am one such person. Some people take
offence at this and label us trolls or heretics of the true programming
language (take your pick).

I am assuming here that when people talk about performance they really
mean speed. Some will disagree but this is what I am talking about.

Write it in VHDL (or Verilog or SystemC), synthesize your piece of
hardware, plug it into PCI, and have fun with the most performant (in
terms of speed) and efficient (in terms of energy consumption) solution
:slight_smile:

[…]

So what am I recommending here, write all your programs in C? No. Write
all your programs in Perl? No. Write them in your favourite scripting
language to refine the code and then translate it into C if the
performance falls short of your requirements. Even if you intend to
write it in C all along, hacking the code in Perl first allows you to
play with the algorithm without having to worry about memory allocation
and other such C-style housekeeping. Good code is good code in any
language.

I like C, but in this context C can be replaced with any compiled
language.

If you really really want that performance boost then take the following
advice very seriously - “Write it in C”.

From my point of view, the only advantage C has is that it runs on nearly
every platform. As I said above, any compiled language would speed up
the code.

my 2 cents
Regards, Daniel