For performance, write it in C

One important thing about Ruby performance is that Ruby gets very
inefficient very quickly as the size of the working set grows. This
manifests as an observed performance problem, but the tip-off is that
performance of a particular Ruby program is quite acceptable for small
data-set sizes (such as in your test fixtures), but falls off a cliff with
production data-sets. Most of the times that I have found myself rewriting
slow Ruby code in C have been to get a program that runs in far less
memory, by manually laying out the data structures. This improves
performance not only for the obvious reasons (less thrashing and less
contention with other processes on a busy server), but also because you
can “help” the processor’s cache manager get more hits by creating
localities of reference and by stuffing more of your working set into a
smaller number of cache lines. This matters all the more on
multiprocessors, which are severely penalized by cache misses, to the
point that some naively written programs can run noticeably faster on a
uniprocessor.
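
Here is a minimal C sketch of the kind of layout difference I mean (the
names and sizes are illustrative only): the same working set stored as
individually allocated, pointer-linked boxes versus one contiguous array.

#include <stdio.h>
#include <stdlib.h>

#define N 1000000

/* Boxed, pointer-linked layout, roughly what a dynamic language's heap
   looks like: every element is its own allocation, so traversal chases
   pointers across memory and misses cache frequently. */
struct node { long value; struct node *next; };

long sum_list(const struct node *head) {
    long total = 0;
    for (; head != NULL; head = head->next)
        total += head->value;
    return total;
}

/* Manually laid-out alternative: one contiguous block. The same working
   set now occupies far fewer cache lines, and sequential access gives
   the prefetcher an easy job (locality of reference). */
long sum_array(const long *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

int main(void) {
    long *v = malloc(N * sizeof *v);
    struct node *head = NULL;
    if (v == NULL) return 1;
    for (size_t i = 0; i < N; i++) {
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL) return 1;
        v[i] = (long)i;
        nd->value = (long)i;
        nd->next = head;
        head = nd;
    }
    printf("array sum: %ld\n", sum_array(v, N));
    printf("list sum:  %ld\n", sum_list(head));
    /* Timing the two loops shows the contiguous version winning, more so
       as N outgrows the cache. (Cleanup omitted for brevity.) */
    return 0;
}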

On 7/26/06, Charles O Nutter [email protected] wrote:

there’s nothing about a language’s design that should necessitate it
being slower than any other language.

While I accept that you shouldn’t confuse a language with its
implementation, I find that a mildly surprising statement, especially
since you said in an earlier post:

for memory-intensive situations Java almost always wins because lazy memory
management allows work to get done first, faster

Garbage collection seems to me to be an integral part of Java’s design.

Off the top of my head, I can think of some other design aspects that
have an effect on performance: method lookup in OO languages, scoping,
continuations, closures, static vs dynamic typing, type inference,
double dispatch, consing + car + cdr in Lisp, direct vs indirect
threading in Forth, etc. These are not just matters of implementation.
Each is a language design decision with a semantic effect which incurs
or avoids a computational cost, regardless of how it’s actually
implemented. For example, Ruby has real closures; Python doesn’t. I
don’t see how you could ever reduce the cost of Ruby having closures to
zero: the memory alone is an implied overhead. Sure, you can optimize
till the cows come home, but different functionalities have different
costs and you can’t ever avoid that.

Regards,
Sean

On 7/26/06, Sean O’Halpin [email protected] wrote:

optimize till the cows come home but different functionalities have
different costs and you can’t ever avoid that.

In theory, if two programs in two different languages produce the exact
same results, the perfect compilers for each of the languages would end
up producing the same code. In theory, practice is the same as theory,
but in practice it isn’t.

Cheers,

Pedro.

On 7/26/06, [email protected] [email protected] wrote:

People talk about the 80:20 principle, but in my experience it’s much
more like 99:1 for applications: 99% of the code uses 1% of the run
time, and 1% of the code consumes 99% of the run time. That could just
be the signal-processing and graphics-heavy applications that I have
experienced, though.

This is the “value proposition” of the “HotSpot” technology in the
Java Virtual Machine. On the fly, it looks for bytecode sections that
get executed repeatedly and then compiles them to object code, thereby
doing runtime optimization. This allows many Java server processes to
run at near-native speed. When Ruby runs on a virtual machine, as
planned for version 2, Ruby will be able to do that too. The JRuby
project will effectively accomplish the same goal.

On Wed, 26 Jul 2006 17:47:13 +0900, Peter H. wrote:

Whenever the question of performance comes up with scripting languages
such as Ruby, Perl or Python there will be people whose response can be
summarised as “Write it in C”. I am one such person. Some people take
offence at this and label us trolls or heretics of the true programming
language (take your pick).

Hi,

When reading your C code, I saw that there is a lot of code that is
generated. I’d be interested to see how well the C program does if it
can work for any size of square. In this case I think the problem is
well suited to logic languages. I wrote a version in the functional
logic language Curry, which does reasonably well. It will probably not
be faster than the C version, but a lot faster than a program written in
Ruby/Perl/Python.

If you really really want that performance boost then take the following
advice very seriously - “Write it in C”.

It can be a good idea to rewrite parts in C, but I would first check
that the algorithms are good; it may then turn out that no C code is
needed at all. And perhaps there are libraries or tools that do the
trick efficiently. I would keep writing C code as the last option.

Regards,
Kristof

-------------------- start of latin.curry ----------------------------
-- upto is a nondeterministic function that evaluates to
-- a number from 1 upto n
upto 1 = 1
upto n | n > 1 = n ? upto (n-1)

-- check if the lists r s have no element with the same value at the
-- same position
elems_diff r s = and $ zipWith (/=) r s

-- extend takes a list of columns, and extends each column with a
-- number for the next row. It checks the number against the column and
-- against the previous numbers in the row.
extend :: [[Int]] -> Int -> [[Int]]
extend cols n = addnum cols [] where
  addnum [] _ = []
  addnum (col:cs) prev
    | x =:= upto n &
      (x `elem` prev) =:= False &
      (x `elem` col) =:= False = (x:col) : addnum cs (x:prev)
    where x free

latin_square n = latin_square_ n
  where latin_square_ 0 = replicate n []  -- initialize columns to nil
        latin_square_ m | m > 0 = extend (latin_square_ (m-1)) n

square2str s = unlines $ map format_col s
  where format_col col = unwords $ map show col

main = mapIO_ (putStrLn . square2str) (findall (\s -> s =:= latin_square 5))
------------------------- end latin.curry -----------------------------

On Wed, 26 Jul 2006 17:47:13 +0900, Peter H. wrote:

In this post I want to clear some things up and provide benchmarks as to
why you should take “Write it in C” seriously.

Interesting series of messages! Got to save and read them through at
leisure …

Just adding my 2c:

[I worked on a fairly complex performance tuning job once, involving HP-UNIX boxes (multiple), Informix ESQL/C batch programs, IBM MQ series (now called Websphere MQ), UNIX IPC, and got a chance to do tuning at several different levels - SQL queries, MQ logs/partitions, the C code, algorithms in it, etc. Was very educational … just sharing some insights gained from that, from reading on the subject, and from smaller hobby projects tuning my own code …]

[Not implying that previous posters on this thread haven’t done any of
the below].

Performance tuning in general is very complicated. Guesses or
assumptions like “this code tweak should make it run faster” often do
not work. The only way is a (semi-)scientific approach: measure/profile,
study the profile results, make hypotheses, change the code accordingly,
then re-measure to see if the change made a difference.
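
For example, a harness as simple as the C sketch below (entirely
illustrative; work() is a stand-in for the code under test) is enough to
check whether a tweak actually helped; a real profiler such as gprof
adds a per-function breakdown on top of this.

#include <stdio.h>
#include <time.h>

/* wall-clock seconds from a monotonic clock (POSIX) */
static double seconds_now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* stand-in for the code being tuned */
static long work(void) {
    long acc = 0;
    for (long i = 0; i < 50000000L; i++)
        acc += i % 7;
    return acc;
}

int main(void) {
    double t0 = seconds_now();
    long result = work();    /* measure, change the code, re-measure */
    double t1 = seconds_now();
    printf("result=%ld elapsed=%.3fs\n", result, t1 - t0);
    return 0;
}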

Tuning can be done at any of several different levels, ranging from:

  • hardware (even here, not just throwing more boxes or faster boxes at
    the problem, but things like hardware architecture - obviously only if
    the skills are available and the problem is worth the effort)

  • software architecture

  • algorithms and data structures optimization

  • plain code tuning (things like common subexpression elimination),
    e.g. using C syntax, changing:

for (i = 0; i < getLength(my_collection); i++) {
    /* do something with my_collection[i] */
}

to

collLength = getLength(my_collection);
for (i = 0; i < collLength; i++) {
    /* do something with my_collection[i] */
}

/* which removes the repeated/redundant call to the function getLength() */

Jon Bentley’s book “Writing Efficient Programs” is a very good one,
discussing rules, examples, and war stories of tuning at almost all of
these levels, including really excellent advice on code-level tuning,
which may sometimes be the easiest kind to apply to existing code.
Though the examples are in a pseudo-Pascal dialect (easily
understandable for those who know C), and though it may be out of print
now, for those who have a real need for tuning advice it’s worth trying
to get a used copy on eBay, from a friend, whatever.

It’s chock-full of code examples with the tuning results (verified by
measurement, as stated above), when (and when not) to apply them, and
the war stories are really interesting too …

Googling for “performance tuning” and variants thereof will help …

There’s another book (for Java, mostly server-side programming) by a
guy called Dov (I forget his last name and the book title; if I
remember them, I will post here) that’s really excellent too. He shows
(again, with actual measurements) how some of the “expected” results
were actually wrong/counter-intuitive. He worked with IBM on the web
software for one of the recent Olympics.

HTH
Vasudev

On 7/26/06, Sean O’Halpin [email protected] wrote:

in an earlier post:

for memory-intensive situations Java almost always wins because lazy
memory management allows work to get done first, faster

Garbage collection seems to me to be an integral part of Java’s design.

Garbage collection is a characteristic of the Java VM, not the language.
Nothing in the language specifies garbage collection or how it should be
performed. It could just as easily be implemented with reference
counting, and the language would still look exactly the same. What IS a
characteristic of the language is that you do not explicitly release
objects when you are finished using them. There are many ways to
interpret that and many ways to implement it. Garbage collection is just
one among many.
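
Here’s a toy C sketch of that reference-counting alternative (the names
are made up, not any real VM’s internals). The language-level rule, that
you never explicitly release objects, is honored either way; only the
reclamation mechanism underneath differs.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int refcount;    /* how many references currently exist */
    int payload;
} Object;

Object *obj_new(int payload) {
    Object *o = malloc(sizeof *o);
    o->refcount = 1;    /* the creator holds the first reference */
    o->payload = payload;
    return o;
}

void obj_retain(Object *o) { o->refcount++; }

void obj_release(Object *o) {
    if (--o->refcount == 0)   /* last reference dropped: reclaim now, */
        free(o);              /* no tracing collector involved        */
}

int main(void) {
    Object *o = obj_new(42);
    obj_retain(o);            /* a second variable now refers to it */
    obj_release(o);           /* the first reference goes away      */
    printf("%d\n", o->payload);
    obj_release(o);           /* the object is freed right here     */
    return 0;
}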

Off the top of my head, I can think of some other design aspects that

different costs and you can’t ever avoid that.
You’re mixing language semantics and implementation details here. The
mechanics of method lookup is not a language feature; it’s an
implementation detail. On the other hand, the logic of which method gets
invoked in a hierarchy of types is a language detail. Scoping is a
language feature, but the means by which scope is maintained is an
implementation detail. Continuations are a language feature, but the
means by which a continuation and its applicable scoping are memoized is
an implementation detail. Static and dynamic typing are language
features; the means by which type is propagated and used for calls,
checks, and memory allocation is an implementation detail. On and on and
on…language features always have an associated implementation, but there
are almost always multiple ways to implement a given feature, and those
different implementations will have their plusses and minuses.

As for the closure comment…sure, there’s overhead in creating closures,
but it’s explicit overhead. This is no different from creating the
equivalent of closures in languages that don’t support them. The concept
of a closure has a clear specification and certainly increases the
complexity of a language and an underlying implementation. But that
complexity may not in itself always represent a decrease in performance,
since other means of accomplishing the same task may be even less
performant. That’s how it is for any language feature…you take the good
with the bad, and if you don’t use all of the good you may be
less-than-optimal. If using a feature has ten good aspects and five bad,
and you only make use of five good aspects, then your code is
sub-optimal. If you use less than five, you’re even worse off and
perhaps should consider doing things differently. Nothing about the
feature itself explicitly implies that performance should degrade by
using it…it’s a matter of using those features wisely and making optimal
use of their good aspects, balanced with their bad aspects.
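
To make “the equivalent of closures in languages that don’t support
them” concrete, here’s a hand-rolled closure sketch in C (made-up names,
purely illustrative): a function pointer plus an explicitly
heap-allocated environment. The captured state costs memory either way;
a language with real closures simply allocates and manages it for you.

#include <stdio.h>
#include <stdlib.h>

/* a "closure": code pointer plus captured environment */
typedef struct {
    int (*call)(void *env, int arg);
    void *env;
} Closure;

/* the environment captured by a counter closure */
struct counter_env { int count; };

static int counter_call(void *env, int step) {
    struct counter_env *e = env;
    e->count += step;    /* mutates captured state, like a Ruby block */
    return e->count;
}

static Closure make_counter(int start) {
    struct counter_env *e = malloc(sizeof *e);
    e->count = start;    /* this allocation is the implied overhead */
    return (Closure){ counter_call, e };
}

int main(void) {
    Closure c = make_counter(10);
    printf("%d\n", c.call(c.env, 1));   /* 11 */
    printf("%d\n", c.call(c.env, 5));   /* 16 */
    free(c.env);    /* and unlike Ruby, we must free it ourselves */
    return 0;
}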

I will run your Ruby version and the Java version that I write and post
the results here. Give us a week or so as I have other things to be
doing.

Peter H. wrote:

If you really really want that performance boost then take the following
advice very seriously - “Write it in C”.

Assuming you have no choice but C/C++. That’s why I like using the JVM
or CLR with languages like JRuby, Groovy, or Boo. You don’t have to use
C; you can use Java or C# or Boo itself (since it is statically typed
with type inference): http://boo.codehaus.org/
You can use C/C++ as well, although it is 100x easier to interface with
a C library from the CLR than it is from the JVM with JNI.

Kristof B. wrote:

Hi,

When reading your C code, I saw that there is a lot of code that is
generated. I’d be interested to see how well the C program does if
it can work for any size of the squares. In this case I think the problem
is well suited for logic languages. I wrote a version in the functional
logic language Curry, which does reasonably well. It will probably not be

Interesting … I read somewhere that the OCaml language, while
higher-level than C (and a functional one too), runs some programs at
least as fast as or faster than C …
Not sure how true that is …

Vasudev

Writing code that runs as fast in Java as it does in C is real work,
but it’s possible.

Integer (http://athena.com) is a pure Java spreadsheet. I optimized
the numeric functions and array functions (e.g., SUM(A1:G99)) such
that Integer runs as fast as or faster than Excel and OpenOffice Calc
on identical hardware. However, it required a mind shift from “standard”
Java programming.

In addition, because Java has nice semantics for multithreading, I was
able to implement some very cleaver algorithms such that Integer’s
recalculation speed scales nearly linearly with additional CPUs up to
a certain point (the linearity goes away at around 16 processors on a
big Sun box.) But I digress.

First, I pre-allocated a lot of workspace so that there’s little
memory allocation going on during recalculation.

Second, I avoid Monitors (synchronized) as much as possible.

Third, I write “C” style Java (lots of switch statements, putting
parameters and results in buffers rather than passing them on the
stack, etc.)
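
The pre-allocation point translates directly to C as an arena: grab one
block up front, hand out pieces with pointer bumps, and reset it
wholesale between recalculations, so no allocator (or garbage collector)
runs inside the hot loop. A minimal sketch, with made-up names:

#include <stdio.h>
#include <stddef.h>

#define ARENA_SIZE (1 << 20)

static unsigned char arena[ARENA_SIZE];  /* workspace allocated once */
static size_t arena_used = 0;

/* hand out the next n bytes: a pointer bump, no malloc in the hot path */
void *arena_alloc(size_t n) {
    n = (n + 7) & ~(size_t)7;    /* keep allocations 8-byte aligned */
    if (arena_used + n > ARENA_SIZE) return NULL;
    void *p = &arena[arena_used];
    arena_used += n;
    return p;
}

/* "free" everything from the last pass at once */
void arena_reset(void) { arena_used = 0; }

int main(void) {
    for (int pass = 0; pass < 3; pass++) {  /* one recalculation each */
        double *scratch = arena_alloc(1000 * sizeof *scratch);
        if (scratch == NULL) return 1;
        scratch[0] = 42.0;    /* the pass's real work would go here */
        printf("pass %d used %zu bytes\n", pass, arena_used);
        arena_reset();
    }
    return 0;
}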

Memory usage in Java is higher than in C. If Java had structs a la
C#/Mono, it’d be possible to squeeze that much more performance out of
Java.

There are some applications that will never perform as well in Java
(e.g., stuff that’s heavily oriented to bit manipulation.) But for many
classes of applications (e.g., spreadsheets) Java can perform as well
as C.

When I care about computational performance, I go with Java or in a
rare case, C++ (C style C++, no STL or virtual methods). If I care
about developer performance, I’ve been going with Ruby more and more.

My 2 cents.

Interesting … I read somewhere that the OCaml language, while
higher-level than C (and a functional one too), runs some programs at
least as fast as or faster than C …
Not sure how true that is …

Vasudev
http://www.dancingbison.com

You read that correctly. The problem is that nearly every benchmark I’ve
seen for comparing the performance of various languages has been a
repeated mathematical operation like computing a Mandelbrot set or
running Fibonacci sequences, which all but guarantees the edge will
belong to functional languages like Haskell and OCaml or stripped-down
assembly-like languages like C
(http://shootout.alioth.debian.org/debian/ for samples), because they
are best suited for straight-up number crunching. Are there good
benchmarks for OO languages? Or dynamic languages? Are there good
benchmarks that could actually measure the types of uses I need, where
I’m building a web front end to a DB store? I don’t know about you, but
my job has never involved fractals.

I used to put faith in benchmarks like this, but now I think about
developer time and maintenance time as well. That seems to be a more
intelligent approach.

Jake

On 7/26/06, Pedro Côrte-Real [email protected] wrote:

In theory, an infinite number of computer scientists hacking for an
infinite amount of time on a keyboard will eventually almost surely
produce a perfect compiler.

In practice, I can’t wait that long ;-)

Cheers,
Sean

On 7/26/06, David P. [email protected] wrote:

In addition, because Java has nice semantics for multithreading, I was
able to implement some very cleaver algorithms such that Integer’s
recalculation speed scales nearly linearly with additional CPUs up to
a certain point (the linearity goes away at around 16 processors on a
big Sun box.) But I digress.

This must be evidence of true cutting edge development ;-)

Tomasz W. wrote:

right approach.

Sorry, I just couldn’t resist - but maybe you should code Java instead:
http://kano.net/javabench/ ;-)

“The results I got were that Java is significantly faster than
optimized C++ in many cases… I’ve been accused of biasing the results
by using the -O2 option for GCC…”

“…so I took the benchmark code for C++ and Java from the now outdated
Great Computer Language Shootout and ran the tests myself”

Not so outdated
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=java&lang2=gpp

On Thu, 27 Jul 2006 00:55:05 +0900, harrisj wrote:

http://www.dancingbison.com

You read that correctly. The problem is that nearly every benchmark I’ve
seen for comparing the performance of various languages has been a
repeated mathematical operation like computing a Mandelbrot set or
running Fibonacci sequences, which all but guarantees the edge will
belong to functional languages like Haskell and OCaml or stripped-down
assembly-like languages like C
(http://shootout.alioth.debian.org/debian/ for samples), because they
are best suited for straight-up number crunching.

In some cases the functional version is faster because the problem can
be more easily described in a functional way. But in general, code
produced by OCaml is about twice as slow as C, because the compiler
doesn’t do the same extensive optimizations as, for example, gcc does.
But that’s still pretty good.

Are there
good benchmarks for OO languages? Or dynamic languages? Are there good
benchmarks that could actually measure the types of uses I need, where I’m
building a web front end to a DB store? I don’t know about you, but my job
has never involved fractals.

True, benchmarks only measure execution speed; they don’t show whether a
given programmer will be productive in a language. I think that’s also
largely a personal choice. Some people may be more productive in a
functional language, some people more in Ruby. And others even in
Perl… :-)

Kristof

Greg,

In spreadsheets, it is cutting edge. Name one other commercial
spreadsheet that can use more than 1 CPU?

David

On 7/26/06, Gregory B. [email protected] wrote:

On 7/26/06, David P. [email protected] wrote:

In addition, because Java has nice semantics for multithreading, I was
able to implement some very cleaver algorithms such that Integer’s
recalculation speed scales nearly linearly with additional CPUs up to
a certain point (the linearity goes away at around 16 processors on a
big Sun box.) But I digress.

This must be evidence of true cutting edge development ;-)

No doubt, I wish I had that kind of problem when running my spreadsheet
application of choice. That’s one hell of a business desktop!

Sean O’Halpin wrote:

In theory, an infinite number of computer scientists hacking for an
infinite amount of time on a keyboard will eventually almost surely
produce a perfect compiler.

In practice, I can’t wait that long ;-)

You’d be waiting a long time indeed :o).

I believe our good friend Mr. Turing proved [1] that such
a compiler could never exist, some seven decades ago.

Oh well.

Martin

[1] OK. That wasn’t exactly what he proved.
But only this particular corollary is relevant.

On Jul 26, 2006, at 11:28 AM, David P. wrote:

Greg,

In spreadsheets, it is cutting edge. Name one other commercial
spreadsheet that can use more than 1 CPU?

I’m pretty sure Greg was funning around with the comical typo in your
post. Take a look back at how you spelled “clever.” ;-)

James Edward G. II