For performance, write it in C

On 7/27/06, Chad P. [email protected] wrote:

x, bar = foo()
x, bar = foo()
print x, bar() ==> 6 6

Is it just me, or are there no proper closures in that example code?

I’ve crossed my eyes twice and still can’t see it :wink:

On 7/26/06, Chad P. [email protected] wrote:

On Thu, Jul 27, 2006 at 12:23:23AM +0900, Charles O Nutter wrote:

You’re mixing language semantics and implementation details here.
[snip]
In some ways, you’re right: implementation details are being mixed up
with language definition in the preceding list of features.
[snip]

Nope - there’s no mix-up. My point is that any feature of a language
that requires extra work, whether at compile time or run time, incurs
a cost. Those
features are generally there to make life easier for us programmers,
not the machine. The only way to make sure you’re not paying that
price is to hand-code optimised machine code for a specific processor
and hardware context. No language translator can guarantee that it
will produce better code (regardless of the nonsense of the ‘perfect
compiler’).

Let me take two examples from my list. First, method lookup in OO
languages. There is no
way you can optimise this across the board to static jumps in a
language like Ruby or Smalltalk. There will always be the requirement
(imposed by the ~semantics~ of the language) to be able to find the
right method at runtime. This is part of the language design which
imposes constraints on the implementation that assembly languages (for
example) do not have to pay. There is a cost for abstraction (a cost
which I am willing to pay by the way). Of course, you can implement
virtual methods in assembly, but you don’t ~have~ to. In Ruby there is
no choice. Everything is an object. (You can optimise most of it away,
but not all).
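To make the method-lookup point concrete, here is a minimal Ruby sketch (hypothetical classes, not from the original post): the same call site invokes different code depending on the receiver’s runtime class, and the target can even be redefined after the call site exists, so no static jump can be compiled in.

```ruby
# Which method `shape.area` invokes depends on the runtime class of the
# receiver, so the call site cannot be lowered to a static jump.
class Circle
  def initialize(r); @r = r; end
  def area; 3.14159 * @r * @r; end
end

class Square
  def initialize(s); @s = s; end
  def area; @s * @s; end
end

shapes = [Circle.new(1), Square.new(2)]
p shapes.map { |shape| shape.area }  # => [3.14159, 4] -- one call site, two methods

# Worse for the optimiser: classes stay open, so any static guess can
# be invalidated at runtime.
class Square
  def area; 0; end
end
p Square.new(2).area  # => 0
```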

Second, closures:

Chad P. said:

A little extra memory usage does not translate directly to performance loss.

Coming from a background where I had to make everything happen in 48K,
I have to disagree. And it’s not always just ‘a little extra memory
usage’. Careless use of closures can cripple an application. See the
problems the Rails team encountered.

Charles - you say that closures are explicit - I beg to differ. By
definition, they are surely implicit. Doesn’t your argument that they
can be simulated by other means contradict your statement?
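A sketch of the implicit-capture point in Ruby (hypothetical names): a block or lambda closes over the entire enclosing scope whether or not it uses every local, which is exactly how a careless closure can pin a large object in memory.

```ruby
def make_callback
  big = "x" * 1_000_000  # large local that the block never touches
  small = 42
  # The lambda implicitly closes over the whole enclosing scope, so
  # `big` stays reachable for as long as the lambda itself does.
  lambda { small }
end

cb = make_callback
p cb.call  # => 42 -- and the megabyte string is still pinned
```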

As for the notion that a hardware YARV processor would make a
difference - how would that ameliorate the issues Ruby has with memory
usage? Performance isn’t just about time - space also matters.

I am surprised that you think I am confusing language features with
implementation details. From my point of view, it is you who are
ignoring the fact that abstractions incur a cost.

Best regards (I’m enjoying this discussion BTW :slight_smile:)
Sean

On Thu, Jul 27, 2006 at 08:35:47AM +0900, Sean O’Halpin wrote:

Chad P. said:

A little extra memory usage does not translate directly to performance
loss.

Coming from a background where I had to make everything happen in 48K,
I have to disagree. And it’s not always just ‘a little extra memory
usage’. Careless use of closures can cripple an application. See the
problems the Rails team encountered.

Careless use of ANYTHING can cripple an application. Using an extra
kilobyte of memory on a 1GB system for a closure instance or two is not
indicative of an inescapable performance cost for the mere fact of the
existence of closures. While your point about 48K of RAM is well-taken,
it’s pretty much inapplicable here: I wouldn’t be writing Ruby programs
to run in 48K. Hell, my operating system won’t run in 48K, nor even a
hundred times that (I’m using Debian GNU/Linux Etch/Testing, if that
matters). I’m sure as heck not going to expect all my applications to
run in that environment.

Careless use of pointers can cripple not only the application, but the
whole system. Careless use of loops can crash it. Careless use of the
power cord can destroy the hardware. In the grand scheme of things,
closures are not a good example of a language feature that hinders
performance when we’re talking about high-level languages such as Ruby.

On Thu, Jul 27, 2006 at 08:33:31AM +0900, Chad P. wrote:

x, bar = foo()
x, bar = foo()
print x, bar() ==> 6 6

Is it just me, or are there no proper closures in that example code?

No, Chad. There’s closures in there. What you’re not seeing is
anonymous functions, but closures are not the same as anonymous
functions.

On Thu, Jul 27, 2006 at 08:53:18AM +0900, Keith G. wrote:

On Thu, Jul 27, 2006 at 08:33:31AM +0900, Chad P. wrote:

Is it just me, or are there no proper closures in that example code?

No, Chad. There’s closures in there. What you’re not seeing is
anonymous functions, but closures are not the same as anonymous
functions.

Maybe I’m missing something critical about Python, but I don’t see the
persistent code construct being passed from the function when its
lexical scope (assuming it’s truly lexical, which it might not be in
this case) closes. It’s only a closure if there’s something persistent
that was “closed” by the scope closing.

On Thu, Jul 27, 2006 at 08:53:18AM +0900, Keith G. wrote:

On Thu, Jul 27, 2006 at 08:33:31AM +0900, Chad P. wrote:

Is it just me, or are there no proper closures in that example code?

No, Chad. There’s closures in there. What you’re not seeing is
anonymous functions, but closures are not the same as anonymous
functions.

Clarification: like Java, Python won’t let you assign to variables in
the outer scope, so you have to use an array or some other object to
hack around that if you need that functionality. I know, it sucks, but
the fact it doesn’t allow you to assign to an outer scope doesn’t stop
Python from having closures, just that it doesn’t trust the developer
not to screw things up.

Here’s a better example:

def foo():
    x = [0]
    def bar():
        x[0] += 1
        print x[0]
    return bar

baz = foo()
baz() -> 1
baz() -> 2
baz() -> 3

Of course, this is better implemented as a generator.
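For contrast, the Ruby equivalent needs no array workaround, because a block can assign directly to a local in the enclosing scope (a minimal sketch):

```ruby
def foo
  x = 0
  lambda { x += 1 }  # assigns straight to the enclosing local `x`
end

baz = foo
p baz.call  # => 1
p baz.call  # => 2
p baz.call  # => 3
```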

On Thu, Jul 27, 2006 at 08:42:11AM +0900, Sean O’Halpin wrote:

Is it just me, or are there no proper closures in that example code?

I’ve crossed my eyes twice and still can’t see it :wink:

Remember what your mother said: if you keep doing that, your eyes might
stick that way. Be careful.

On Thu, 27 Jul 2006, Chad P. defenestrated me:

I am not familiar with Perl’s compiler. Does it compile to processor-native
code or to an intermediate bytecode of some kind?

There is no intermediate bytecode step for Perl, as far as I’m aware.
It’s not a question I’ve directly asked one of the Perl internals
maintainers, but everything I know about the Perl compiler confirms my
belief that it simply does compilation to machine code.

I have not been on the Perl train for years, but I believe for Perl 5
at least this is not true. I remember Malcolm Beattie’s B module, which
basically exposed the intermediate bytecodes that perl normally
interprets. That was some time ago and things may have changed?

Here is some documentation on this (it could be old but it seems to
match my memory):

http://www.faqs.org/docs/perl5int/c163.html

So it looks like Perl is somewhat similar to Java (perhaps the other
way around, since Perl’s interpreter pre-dates Java). An analogy for the
difference would be that Perl is CISC and Java is RISC, since Perl
bytecode is higher level. Maybe they JIT pieces?

-Tom

On 7/26/06, Chad P. [email protected] wrote:

On Thu, Jul 27, 2006 at 12:42:46AM +0900, David P. wrote:

There are some applications that will never perform as well in Java
(e.g., stuff that’s heavily oriented to bit manipulation.) But for many
classes of applications (e.g., spreadsheets) Java can perform as well
as C.

Is that heavily optimized Java vs. “normal” (untweaked) C?

No. That’s heavily optimized Java vs. heavily optimized C. I spent a
fair amount of time chatting with the Excel team a while back. They
cared as much about performance as I did. They spent a lot more time
and money optimizing Excel than I did with Integer. They had far more
in terms of tools and access to compiler tools than I did (although
Sun was very helpful to me.)

What was at stake was not someone’s desktop spreadsheet, but was the
financial trader’s desk. Financial traders move millions (and
sometimes billions) of Dollars, Euros, etc. through their spreadsheets
every day. A 5 or 10 second advantage in calculating a spreadsheet
could mean a significant profit for a trading firm.

So, I am comparing apples to apples. A Java program can be optimized
to perform as well as a C program for certain tasks.

On Thu, Jul 27, 2006 at 09:26:45AM +0900, David P. wrote:

Is that heavily optimized Java vs. “normal” (untweaked) C?

No. That’s heavily optimized Java vs. heavily optimized C. I spent a
fair amount of time chatting with the Excel team a while back. They
cared as much about performance as I did. They spent a lot more time
and money optimizing Excel than I did with Integer. They had far more
in terms of tools and access to compiler tools than I did (although
Sun was very helpful to me.)

Excel isn’t a very good point of comparison for C. For one thing, it’s
not C – it’s C++. For another, it has a pretty bad case of featuritis.

On Thu, Jul 27, 2006 at 10:07:00AM +0900, Chad P. wrote:

not C – it’s C++. For another, it has a pretty bad case of featuritis.
Actually, the part that counts, the calculation engine, comes in two
varieties: a slow but provably correct version, and a fast, highly
optimised version, a significant portion of which is written in
assembly language. MS use a battery of regression tests to ensure that
the fast one always gives the same results as the slow one.

Just because the bits that aren’t performance sensitive are written in
C++ doesn’t mean that the rest of it is slow and bloated.

K.

On Thu, Jul 27, 2006 at 09:14:49AM +0900, Thomas E Enebo wrote:

That was some time ago and things may have changed?

Here is some documentation on this (it could be old but it seems to
match my memory):

http://www.faqs.org/docs/perl5int/c163.html

So it looks like Perl is somewhat similar to Java (perhaps the other
way around since Perl’s interpreter pre-dates Java). An analogy of the
difference would be that Perl is CISC and Java is RISC since Perl
bytecode is higher level. Maybe they JIT pieces?

I believe you are correct, with regard to an intermediate code step,
after all. I’ve done some research on it to refresh my memory. Whether
it continues to compile to a machine-readable executable or interprets
the intermediate code form is something I haven’t been able to nail down
yet. I’ll keep looking. Apparently, I was wrong somewhere along the
way.

On Thu, Jul 27, 2006 at 10:50:32AM +0900, Chad P. wrote:

On Thu, Jul 27, 2006 at 10:35:52AM +0900, Keith G. wrote:

Actually, the part that counts, the calculation engine, comes in two
varieties: a slow but provably correct version, and a fast, highly
optimised version, a significant portion of which is written in
assembly language. MS use a battery of regression tests to ensure that
the fast one always gives the same results as the slow one.

That might be the “part that counts” (nice pun) for calculation, but
it’s not the only part that counts.

As far as Excel goes, it is. It’s the single biggest time sink in the
application.

Interface rendering, interactive
operations, and so on are also fairly important performance-wise

…most of which is down to Windows itself, not Excel. Excel’s
contribution to that lag isn’t, I believe, all that great. So in this
regard, your complaint is more to do with GDI and so on than with Excel
itself.

– at least to the user. In fact, calculation waits can be easier to
overlook as a user than waits for the application to catch up when you
click on a button.

Two points:

  1. As far as I know, Excel runs its interface on one thread and the
    calculation engine on another. This helps give the appearance of
    Excel being snappier than it actually is: you’re able to work on the
    spreadsheet while it’s recalculating cells.

  2. On simple spreadsheets, the lag isn’t noticeable. But Excel is
    designed to be able to handle big spreadsheets well. That’s why so
    much work is put into the calculation engine rather than having an
    interface which is completely fat free: in time-critical
    applications, it’s the calculation engine that really matters.

I use Excel a lot, and have for a few years now. Grudgingly, mind you,
because I dislike having to deal with spreadsheets. But as far as MS
applications go, I think your accusations of slowness and bloat are a
little off the mark and better targeted towards its fellow MS Office
software.

Where Excel does fall down in terms of speed is disc I/O. There it can
be atrociously slow.

On the other hand, if we were specifically referring to things like
column calculation speed (of which I wasn’t strictly aware), then your
point is made.

Recalculating a spreadsheet is something more than just calculating
columns. Excel itself is a Turing-complete dataflow machine. Getting
something like that which is both correct and fast is hard.
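To illustrate why that is hard (a toy sketch, with made-up cells and nothing to do with Excel’s actual engine): recalculation has to follow the dependency graph between cells, not any fixed row or column order.

```ruby
# Toy dataflow evaluation: each cell is a formula over other cells, and
# evaluating one cell pulls in everything it depends on, in dependency
# order. (No caching or cycle detection -- a real engine needs both.)
cells = {
  a1: lambda { |get| 2 },
  a2: lambda { |get| 3 },
  b1: lambda { |get| get.call(:a1) + get.call(:a2) },  # depends on a1, a2
  c1: lambda { |get| get.call(:b1) * 10 }              # depends on b1
}

get = nil
get = lambda { |name| cells[name].call(get) }

p get.call(:c1)  # => 50
```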

K.

On Thu, Jul 27, 2006 at 10:35:52AM +0900, Keith G. wrote:

Actually, the part that counts, the calculation engine, comes in two
varieties: a slow but provably correct version, and a fast, highly
optimised version, a significant portion of which is written in
assembly language. MS use a battery of regression tests to ensure that
the fast one always gives the same results as the slow one.

That might be the “part that counts” (nice pun) for calculation, but
it’s not the only part that counts. Interface rendering, interactive
operations, and so on are also fairly important performance-wise – at
least to the user. In fact, calculation waits can be easier to overlook
as a user than waits for the application to catch up when you click on a
button.

On the other hand, if we were specifically referring to things like
column calculation speed (of which I wasn’t strictly aware), then your
point is made.

On Thu, Jul 27, 2006 at 09:09:43AM +0900, Keith G. wrote:

Clarification: like Java, Python won’t let you assign to variables in
[snip]

    def bar():
        x[0] += 1
        print x[0]
    return bar

baz = foo()
baz() -> 1
baz() -> 2
baz() -> 3

That’s a bit clearer – and it does look like a proper closure. It also
looks unnecessarily complex to implement. Thanks for the example.

On Thu, Jul 27, 2006 at 11:38:34AM +0900, Chad P. wrote:

…most of which is down to Windows itself, not Excel. Excel’s
contribution to that lag isn’t, I believe, all that great. So in this
regard, your complaint is more to do with GDI and so on than with Excel
itself.

Excel doesn’t run so well on Linux, so I’ll just leave that lying where
it is.

In fairness, if you’re judging its performance based on running it in
Wine, that’s not really a fair comparison. :slight_smile:

K.

On Thu, Jul 27, 2006 at 11:17:44AM +0900, Keith G. wrote:

On Thu, Jul 27, 2006 at 10:50:32AM +0900, Chad P. wrote:

That might be the “part that counts” (nice pun) for calculation, but
it’s not the only part that counts.

As far as Excel goes, it is. It’s the single biggest time sink in the
application.

. . .
I’ll put it this way: it’s not the only part that counts for me, and for
other spreadsheet users with whom I’ve discussed Excel in the past.

Interface rendering, interactive
operations, and so on are also fairly important performance-wise

…most of which is down to Windows itself, not Excel. Excel’s
contribution to that lag isn’t, I believe, all that great. So in this
regard, your complaint is more to do with GDI and so on than with Excel
itself.

Excel doesn’t run so well on Linux, so I’ll just leave that lying where
it is.

  2. On simple spreadsheets, the lag isn’t noticeable. But Excel is
    designed to be able to handle big spreadsheets well. That’s why so
    much work is put into the calculation engine rather than having an
    interface which is completely fat free: in time-critical applications,
    it’s the calculation engine that really matters.

. . . and yet, the interface being fat-free would be awfully nice.
Instead, it gets fatter with every new version.

I use Excel a lot, and have for a few years now. Grudgingly, mind you,
because I dislike having to deal with spreadsheets. But as far as MS
applications go, I think your accusations of slowness and bloat are a
little off the mark and better targeted towards its fellow MS Office
software.

It’s true that other MS Office applications are worse. That doesn’t
make Excel perfect.

Where Excel does fall down in turns of speed is disc I/O. There it can
be atrociously slow.

I certainly won’t disagree with that. Disk access seems to be another
problem – though it’s easier to overlook than some other issues, once a
spreadsheet is loaded and before it needs to be saved again.

On the other hand, if we were specifically referring to things like
column calculation speed (of which I wasn’t strictly aware), then your
point is made.

Recalculating a spreadsheet is something more than just calculating
columns. Excel itself is a Turing-complete dataflow machine. Getting
something like that which is both correct and fast is hard.

I don’t particularly see how that contradicts what I said. I may have
been more flippant in my reference to calculations than you’d like, but
I didn’t say anything inaccurate.

On Thu, Jul 27, 2006 at 11:28:20AM +0900, Chad P. wrote:

baz() -> 1
baz() -> 2
baz() -> 3

That’s a bit clearer – and it does look like a proper closure. It also
looks unnecessarily complex to implement. Thanks for the example.

In practice, it’s not really all that bad. Most of the places where I’d
end up using closures in, say, Ruby and JavaScript, I’d end up using
generators, list comprehensions, &c. in Python. Having to name the inner
function’s a bit of a pain, but generally I don’t end up assigning to
the variables in the outer scope anyway, so that’s not such a big deal
either.

Different strokes, and all that.

K.

Ashley M. wrote:

I think the total data size is about 1.5GB, but the individual files
are smaller, the largest being a few hundred MB. The most rows in a
file is ~15,000,000 I think. The server I run it on has 2GB RAM (an
Athlon 3500+ running FreeBSD/amd64, so the hardware is not really an
issue)… it can get all the way through without swapping (just!)

The problem isn’t the raw size of the dataset. It’s the number of
objects you create and the amount of garbage that has to be cleaned up.
If you’re clever about how you write, you can help Ruby by not creating
so much garbage. That can give a huge benefit.
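One way to create less garbage (a hypothetical sketch, not from the original post): build output by appending to a single buffer with String#<<, which mutates in place, rather than with `+`, which allocates a fresh string on every append.

```ruby
# String#<< appends in place and returns the receiver, so one buffer
# serves the whole run; `buffer + row` would instead allocate a new
# string (garbage) on every iteration.
buffer = String.new
rows = ["1,foo", "2,bar", "3,baz"]
rows.each do |row|
  buffer << "dataset_a," << row << "\n"  # e.g. prefixing a text column
end
puts buffer
```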

The processing is pretty trivial, and mainly involves incrementing
some ID columns so we can merge datasets together, adding a text
column to the start of every row, and eliminating a few duplicates.

Eliminating the dupes is the only scary thing I’ve seen here. What’s the
absolute smallest piece of data that you need to look at in order to
distinguish a dupe? (If it’s the whole line, then the answer is 16
bytes- the length of the MD5 hash ;-)) That’s the critical working set.
If you can’t get the Ruby version fast enough, it’s cheap and easy to
sort through 15,000,000 of them in C. Then one pass through the sorted
set finds your dupes. I’ve never found a consistently-fastest performer
among Ruby’s several different ways of storing sorted sets.
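The MD5 suggestion, sketched in Ruby (assuming the whole line is what decides a duplicate):

```ruby
require 'digest/md5'

# Track 16-byte binary digests instead of whole lines: the working set
# is bounded at 16 bytes per distinct line, however long the lines are.
seen = {}
unique = []
["a,1", "b,2", "a,1"].each do |line|
  key = Digest::MD5.digest(line)  # 16-byte binary digest
  next if seen[key]
  seen[key] = true
  unique << line
end
p unique  # => ["a,1", "b,2"]
```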

Make sure that your inner loop doesn’t allocate any new objects,
especially arrays - declare them outside your inner loop and re-use
them with Array#clear.
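The Array#clear advice, sketched (hypothetical data): allocate the scratch array once, outside the loop, and empty it each pass instead of binding a fresh array per iteration.

```ruby
scratch = []                 # allocated once, outside the inner loop
totals = []
rows = [[1, 2], [3, 4], [5, 6]]
rows.each do |row|
  scratch.clear              # reuse rather than `scratch = []`
  row.each { |n| scratch << n * 10 }
  totals << scratch.inject(0) { |s, n| s + n }
end
p totals  # => [30, 70, 110]
```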

…doesn’t improve things, there’s always the option of going dual-core
and forking to do independent files.

Obviously I haven’t seen your code or your data, but if the Ruby app is
memory-bus-bound, then this approach may make your problem worse, not
better.

Good luck. I recently got a Ruby program that aggregates several LDAP
directory-pulls with about a million entries down from a few hours to a
few seconds, without having to drop into C. It can be done, and it’s
kindof fun too.

On Thu, Jul 27, 2006 at 12:19:24PM +0900, Keith G. wrote:

In fairness, if you’re judging its performance based on running it in
Wine, that’s not really a fair comparison. :slight_smile:

I’m judging it based on running it on Windows. My point is that
divorcing it from the only environment in which it runs (natively) is
less than strictly sporting of you, when trying to discuss its
performance characteristics (or lack thereof).