Benchmark obsession?

aris · June 20, 2012, 4:43pm

Hi,

After having read this list for a while, I wonder why some of you put so
much weight on speed optimizations. I’m not talking about big things
that really make sense but small stuff like “Don’t use symbols, they
can’t be garbage collected”, “Don’t concatenate strings, use string
interpolation instead”, “Don’t use Enumerable#inject to build up
objects” etc. etc.

In my opinion, this is like trying to get a classic car faster. It just
makes no sense. Ruby isn’t about speed, it’s about elegance and clarity.
If you’re looking for speed, you’ve got the wrong language. Use C or
whatever.

I’d always prefer an elegant solution over a fast one. For example, I
love functional style programming with Enumerable#inject. I don’t care
if it’s some milliseconds slower than assigning the values to a
variable.

janeden · June 20, 2012, 5:23pm

Speed does sometimes matter, even in these kinds of micro-benchmarks.
Maybe you’re writing a JSON processor, maybe a parser, maybe a math
library. Of course, most times you don’t, and you shouldn’t care about
being a millisecond faster.

And now I can’t not comment on the three examples you brought up.

“Don’t use symbols, they can’t be garbage collected” – this sounds
like something that someone who doesn’t know what the hell they are
doing would say. If there is ever a case where you have created enough
symbols to create a visible memory footprint, you are doing something
extremely wrong. Using symbols where you would use magic constants or
enums in a language like C, or as “static” (as in, not dynamically
generated, or limited to a certain number – for example columns of a
table in a database) keys of a hash is perfectly okay and it is near
impossible for this to cause problems with GC (a symbol’s text is
internally only stored once, and symbols are special-cased to avoid
memory indirection in C code and passed around as fake-pointers; the
only other class treated like this is Fixnum). In fact, this can lead
to faster code when you replace constant strings with symbols.
(Disclaimer: I didn’t benchmark this.)
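
A minimal sketch of the "stored once" point, using only core Ruby. The variable names are made up for illustration, and the strings are built with String.new so the result does not depend on any frozen-string-literal setting:

```ruby
# Two occurrences of the same symbol literal refer to one object.
sym_a = :status
sym_b = :status
same_symbol = sym_a.equal?(sym_b)   # equal? tests identity, not just equality

# Two strings built at runtime with the same text are distinct objects.
str_a = String.new("status")
str_b = String.new("status")
same_string = str_a.equal?(str_b)
equal_text  = (str_a == str_b)

puts "symbols share one object: #{same_symbol}"
puts "strings share one object: #{same_string} (equal text: #{equal_text})"
```

This only demonstrates object identity; it says nothing about how much faster or slower either form is in a given program.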

“Don’t concatenate strings, use string interpolation instead” – I’d
say that in most cases string interpolation is clearer than
concatenating, especially when a lot of “.to_s” calls would have to be
used.

“Don’t use Enumerable#inject to build up objects” – I am morally
opposed to using #inject for anything other than actually folding an
array into a single value, which is what it was intended for.
Injecting with an object, as clever as it is, is not clear code.

– Matma R.

janeden · June 20, 2012, 6:02pm

On Wed, Jun 20, 2012 at 4:43 PM, Jan E. wrote:

After having read this list for a while, I wonder why some of you put so
much weight on speed optimizations.

I can’t say I have observed an obsession with things like these.
Maybe it’s just the fun or that it is so easy to Benchmark.

I’m not talking about big things
that really make sense but small stuff like “Don’t use symbols, they
can’t be garbage collected”,

That is certainly a stupid general rule because in some situations
this is exactly what you want: identifiers which do not need to be
created and which are not GC’ed.

“Don’t concatenate strings, use string
interpolation instead”, “Don’t use Enumerable#inject to build up
objects” etc. etc.

As I said, I have not made the same observation you apparently have
made. I do not read most of the traffic here but if it really was
obsessive I think I would have noticed. Strange how perceptions can
be so different.

Cheers

robert

janeden · June 20, 2012, 7:01pm

On Wed, Jun 20, 2012 at 7:43 AM, Jan E. wrote:

After having read this list for a while, I wonder why some of you put so
much weight on speed optimizations. I’m not talking about big things
that really make sense but small stuff like “Don’t use symbols, they
can’t be garbage collected”, “Don’t concatenate strings, use string
interpolation instead”, “Don’t use Enumerable#inject to build up
objects” etc. etc.

See http://twitter.com/roflscaletips

In my opinion, this is like trying to get a classic car faster. It just
makes no sense. Ruby isn’t about speed, it’s about elegance and clarity.
If you’re looking for speed, you’ve got the wrong language. Use C or
whatever.

There are some very egregious things that libraries can do which they
shouldn’t and will significantly affect the performance of all running
code, like altering the class hierarchy at runtime and thus invalidating
all method caches at all call sites. This is bad and people should call
that thing out (“DCI” people, I’m looking at you…)

However, when it comes to microoptimizing your Ruby code like that, you
should probably be using something like perftools to measure. Code has
different performance characteristics in different scenarios, so unless
you have some real-world code you’re trying to make faster, it’s kind of
a pointless exercise. If you do have said code, you should optimize it
in a data-driven way. The best speedups you get will probably be from
using code with better algorithmic properties and not from
microoptimizing minutiae.
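
As a rough illustration of "measure first", here is a sketch that times two variants with a monotonic clock from core Ruby. It is not perftools and not a profiler; `seconds_for` and the word list are hypothetical, and numbers from a synthetic loop like this say little about real-world code:

```ruby
# Hypothetical helper: wall-clock seconds taken by a block, via a monotonic clock.
def seconds_for
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

words = %w[alpha beta gamma delta] * 5_000   # made-up workload

concat_time = seconds_for { words.each { |w| "word: " + w + "\n" } }
interp_time = seconds_for { words.each { |w| "word: #{w}\n" } }

printf("concatenation: %.6fs\n", concat_time)
printf("interpolation: %.6fs\n", interp_time)
```

The timings vary between runs and machines, which is exactly why measuring the actual code in question matters more than any general rule.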

janeden · June 20, 2012, 6:43pm

I’d say more like a “common fallacy” than an obsession. Indeed, it’s a
common problem across all languages, not just Ruby. I recall advice in
Basic that said “don’t use new lines unless you need a label.” In thirty
years of coding and performance evaluation, I can’t recall a case where
micro-tuning was sufficient to solve a performance issue, yet there are
many times I’ve seen it used.

Sent from my iPhone

janeden · June 20, 2012, 7:04pm

The real question is: how many microtunes does it take for the advantage
to offset the cost of one additional bug?

A: a lot.

Clarity and simplicity is almost always the best approach, I feel.

janeden · June 20, 2012, 9:08pm

On Wed, Jun 20, 2012 at 11:04 AM, Dan C. wrote:

The real question is: how many microtunes does it take for the advantage
to offset the cost of one additional bug?

A: a lot.

or a micro optimization that makes something that happens many, many
times faster.

if performance is not acceptable, and profiling indicates a spot that
needs fixing, a micro-optimization could be the right thing to do.

Clarity and simplicity is almost always the best approach, I feel.

Clarity, simplicity, and profiling if/when you run into problems.

janeden · June 20, 2012, 11:57pm

On Jun 20, 2012, at 07:43 , Jan E. wrote:

I kinda feel like I’m being called out as I’m on record (many times) for
2/3rds of your examples so I’ll address them specifically:

“Don’t concatenate strings, use string interpolation instead”

Using a recent real example from this list where I suggested
interpolation:

m.ClassMethodString + " " + m.ClassMethodString + ": " + m.ClassMethodInteger.to_s

mmmmm java code.

slower – several more method calls
wasteful – creates much more garbage
longer/uglier – I’d argue that it is much less elegant

vs

"#{m.ClassMethodString} #{m.ClassMethodString}: #{m.ClassMethodInteger}"

clarity – it’s just a string.
elegance – it’s JUST a string AND you don’t need those stupid #to_s
calls.
efficient – takes less time, uses less memory, makes less garbage,
and even easier to read.
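
A small runnable version of the comparison, assuming a stand-in object: the `Item` struct and its fields are hypothetical and only replace the `m` from the post so the two forms can be run side by side:

```ruby
# Hypothetical stand-in for the `m` object in the post.
Item = Struct.new(:name, :label, :count)
m = Item.new("Widget", "stock", 3)

# Concatenation: every + builds an intermediate string, and the integer needs #to_s.
concatenated = m.name + " " + m.label + ": " + m.count.to_s

# Interpolation: one literal, and #to_s is called implicitly.
interpolated = "#{m.name} #{m.label}: #{m.count}"

puts concatenated
puts interpolated
```

Both lines print the same text; the difference is in how the string is assembled and how it reads.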

“Don’t use Enumerable#inject to build up objects” etc. etc.

Given #inject’s other alias, #reduce, it is obvious that you don’t use
#inject for building up other objects. Even in a functional style of
programming you’d never see it building up anything. You’d see it
REDUCING (folding) an object. If #inject is applied in a non-folding
manner, it isn’t functional, it is just dumb. Don’t pretend otherwise
(and if you do pretend otherwise, go read more books on lisp–start with
SICP). The second I see a semicolon (or return) in an inject, I
immediately suspect that someone is writing clevar/stupid code.

I don’t have any recent examples from the list, but I’m on record in
multiple mediums ranting against people who use #inject improperly. I’ll
make up one based on examples I’ve seen time and time again:

return im_a_lazy_bastard.inject(Hash.new 0) { |h, o|
  h[o.really_really_lazy] += 1; h }

vs

counter = Hash.new 0
thingies.each do |o|
  counter[o.key] += 1
end
return counter

1. I use #each because it adds CLARITY. I want to enumerate each
   element. I’m not folding anything.
2. Yes, it’s faster. I don’t actually care about that nearly as much as
   #1.
3. Yes, it is more lines:
   1. but only if you write the inject version that way.
   2. I use the Weirich Method 1 of choosing {} vs do/end.
      INCREASING clarity and intent.
   3. each line is a stand-alone concept that helps increase clarity.

Here is a perfect example of an actual folding application of #inject:

classname.split(/::/).inject(Object) { |k, n| k.const_get n }

vs:

k = Object
classname.split(/::/).each { |n| k = k.const_get n }
k

As you can see, the #inject version is incredibly clear and concise. The
second example takes longer to figure out. That is what the natural fit
of a well designed method is supposed to do.
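
To try the folding example without defining anything, here is a sketch that resolves a nested constant that ships with Ruby. The choice of "File::Stat" is only for illustration:

```ruby
classname = "File::Stat"   # a nested constant available in core Ruby

# Fold: start at Object and walk down one constant name at a time.
folded = classname.split(/::/).inject(Object) { |k, n| k.const_get(n) }

# The same walk written with #each and an explicit accumulator.
k = Object
classname.split(/::/).each { |n| k = k.const_get(n) }

puts folded
```

Each step takes the previous result and one name and yields the next result, which is what makes this a genuine fold.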

Come to think of it (!!!) I DO have a real world example of inject that
I used in my Ruby Sadism talk:

if MODELS.keys.inject(true) {|b, klass| b and
   klass.constantize.columns.map(&:name).include?
   association.options[:foreign_key]} then
  …
end

Have fun with that… It’s probably the most egregious use of inject
I’ve ever found. The original author actually argued that he wrote it
that way “for maintainability”.
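
For what it's worth, an inject(true) with `b and ...` is asking whether a condition holds for every element, which Enumerable#all? states directly. A sketch with made-up data, since the Rails models from the original are not available here; `columns_by_model` and `foreign_key` are hypothetical:

```ruby
# Made-up data standing in for the models and their column names.
columns_by_model = {
  "Order"   => %w[id customer_id total],
  "Invoice" => %w[id customer_id due_on],
}
foreign_key = "customer_id"

# The shape used in the quoted code: fold booleans together with `and`.
with_inject = columns_by_model.keys.inject(true) { |b, model|
  b and columns_by_model[model].include?(foreign_key)
}

# The same question asked directly.
with_all = columns_by_model.keys.all? { |model|
  columns_by_model[model].include?(foreign_key)
}

puts "inject: #{with_inject}, all?: #{with_all}"
```

Note that all? also stops at the first failing element, whereas the inject version visits every key.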

janeden · June 21, 2012, 1:44am

Jan E. wrote in post #1065412:

If you use string interpolation for clarity, that’s perfect. I fully
agree with you. The same goes for inject (with “building up an object” I
actually meant “building up the aggregate value” – so there’s no
disagreement on that).

If I have a choice between writing:

puts "a = #{a}\n" \
     "b = #{b}"

and

$stdout << "a = " << a << "\nb = " << b << "\n"

which is clearer? C++ fans might prefer the second, while I prefer the
first. In any case, I’m glad to hear the first happens to be faster, as
well :).

janeden · June 21, 2012, 12:51am

Ryan D. wrote in post #1065406:

I kinda feel like I’m being called out as I’m on record (many times) for
2/3rds of your examples so I’ll address them specifically:

Well, it seems those examples were a bit ambiguous. I’m not arguing
against string interpolation etc. I’m against using them for the sole
purpose of saving some bytes and CPU cycles.

If you use string interpolation for clarity, that’s perfect. I fully
agree with you. The same goes for inject (with “building up an object” I
actually meant “building up the aggregate value” – so there’s no
disagreement on that).

My point is that we should focus on readability, clarity, elegance etc.
rather than do everything to make our programs run a bit faster. That’s
just not what Ruby is for (at least to my understanding).

janeden · June 21, 2012, 3:08am

On 21/06/12 12:49, Avdi G. wrote:

end
return counter

I’m curious if you consider #each_with_object a reasonable choice for
this.

–
Avdi

Is that basically the same thing wrapped in another method so that
counter and o are yielded to a block?

def each_with_object(memo)
  return to_enum :each_with_object, memo unless block_given?
  each do |element|
    yield element, memo
  end
  memo
end

Sam

janeden · June 21, 2012, 2:50am

On Jun 20, 2012 5:57 PM, “Ryan D.” wrote:

counter = Hash.new 0
thingies.each do |o|
  counter[o.key] += 1
end
return counter

I’m curious if you consider #each_with_object a reasonable choice for
this.

janeden · June 21, 2012, 4:43am

I’ll add to the string interpolation issue with an anecdote: I’ve had
real world examples (in projects about which I’m forbidden to talk for
legal reasons) where a refactoring from:

foo = "a" + b + "c"

type string assembly to:

foo = ""; foo << "a" << b << "c"

caused an immense speedup (we’re talking tens of minutes here),
reduced the memory footprint dramatically, and generally made our
lives on the floor that little bit easier. Of course for something
like my abc example above I’d definitely use "a#{b}c" because it’s
more readable (as well as everything else); but with large document
generation sometimes interpolation is just not feasible.

Ryan mentioned Java; the concatenation optimisation is exactly
analogous to a previous time in the same company I achieved a very
similar improvement by converting Java Strings to StringBuilders.

It’s still not interpolation, but it can have a genuine, measurable
effect. Knowing that + creates all those new instances while <<
doesn’t can be useful and practical knowledge.
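
A minimal sketch of that difference, checking object identity rather than speed. The variable names are illustrative, and the buffer is created with String.new so it is mutable regardless of frozen-string-literal settings:

```ruby
buffer = String.new("a")
original_id = buffer.object_id

# << appends to the receiver in place, so the object stays the same.
buffer << "b" << "c"
appended_in_place = (buffer.object_id == original_id)

# + leaves the receiver alone and returns a brand-new string.
plus_result = buffer + "d"
plus_made_new_object = !plus_result.equal?(buffer)

puts "buffer after <<: #{buffer} (same object: #{appended_in_place})"
puts "result of +: #{plus_result} (new object: #{plus_made_new_object})"
```

In a long chain of + calls each intermediate result is such a new object, which is where the extra garbage comes from.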

Caveat: I’m pretty damned sure Ruby was not the right language to be
using on that project. One makes do with what one is given.

–
Matthew K., B.Sc (CompSci) (Hons)
http://matthew.kerwin.net.au/
ABN: 59-013-727-651

“You’ll never find a programming language that frees
you from the burden of clarifying your ideas.” - xkcd

janeden · June 21, 2012, 6:30am

On Wed, Jun 20, 2012 at 9:07 PM, Sam D. wrote:

Is that basically the same thing wrapped in another method so that counter
and o are yielded to a block?

Yes, and it’s part of Enumerable:

janeden · June 21, 2012, 5:23am

On Thu, Jun 21, 2012 at 8:49 AM, Avdi G. wrote:

I’m curious if you consider #each_with_object a reasonable choice for this.

indeed. i have 3 solns for this using 1) tap, 2) inject, 3)
each_with_object (or .each.with_object).

(Hash.new(0)).tap{|h| thingies.each{|i| h[i] += 1} }

thingies.inject(Hash.new(0)){|h,i| h[i] += 1; h}

thingies.each.with_object(Hash.new(0)){|i,h| h[i] += 1}

looking at inject… hmm, not sure. it does not seem so bad… unless i
be so dogmatic… nah, i’ve multiple religion… it’s more fun

best regards -botp
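
A quick check that the three variants build the same counts, assuming a made-up `thingies` array of strings and using each_with_object in place of .each.with_object:

```ruby
thingies = %w[a b a c b a]   # hypothetical sample data

# 1) tap: create the hash, fill it inside the block, get the hash back.
with_tap = Hash.new(0).tap { |h| thingies.each { |i| h[i] += 1 } }

# 2) inject: the block must return the hash explicitly (the trailing "; h").
with_inject = thingies.inject(Hash.new(0)) { |h, i| h[i] += 1; h }

# 3) each_with_object: the memo is passed along automatically; note the |i, h| order.
with_object = thingies.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }

p with_tap
```

The results are identical; the variants differ only in who is responsible for handing the hash to the next iteration.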

janeden · June 22, 2012, 1:31am

On 21/06/2012, at 9:50 AM, Ryan D. wrote:

Given #inject’s other alias, #reduce, it is obvious that you don’t use #inject
for building up other objects. Even in a functional style of programming you’d
never see it building up anything. You’d see it REDUCING (folding) an object. If
#inject is applied in a non-folding manner, it isn’t functional, it is just dumb.
Don’t pretend otherwise (and if you do pretend otherwise, go read more books on
lisp–start with SICP). The second I see a semicolon (or return) in an inject, I
immediately suspect that someone is writing clevar/stupid code.

I don’t have any recent examples from the list, but I’m on record in multiple
mediums ranting against people who use #inject improperly. I’ll make up one based
on examples I’ve seen time and time again:

I come across this quite often, especially in Rails apps.

a = {:list => [1,2,3,4]}
b = {:list => [9,8,7,6,5]}

c = [a,b]

c.inject([]) {|memo, run| memo + run[:list] }

I always cringe when I see it but I haven’t found an alternative that is
as clear and concise.
collect and flatten looks ugly. I’d love to be able to do…

c.collect {|run| *run[:list]}

Henry
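
One existing alternative that comes close to the wished-for syntax is Enumerable#flat_map (in Ruby since 1.9.2), which maps and concatenates the resulting arrays one level deep. A sketch on the same data:

```ruby
a = { :list => [1, 2, 3, 4] }
b = { :list => [9, 8, 7, 6, 5] }
c = [a, b]

# The inject version from the post: builds a new array on every step.
with_inject = c.inject([]) { |memo, run| memo + run[:list] }

# flat_map: collect each list and splice the lists together.
with_flat_map = c.flat_map { |run| run[:list] }

p with_flat_map
```

Unlike collect followed by a full flatten, flat_map only flattens one level, so nested arrays inside a list would be left intact.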

janeden · June 21, 2012, 8:21am

On Thu, Jun 21, 2012 at 6:28 AM, Avdi G. wrote:

On Wed, Jun 20, 2012 at 9:07 PM, Sam D. wrote:

Is that basically the same thing wrapped in another method so that counter
and o are yielded to a block?

Yes, and it’s part of Enumerable: Module: Enumerable (Ruby 1.9.3)

Enumerable also has

thingies.group_by(&:key)
thingies.group_by(&:key).map {|o,y| [o,y.length]}

Kind regards

robert
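
To see what the group_by variant produces, here is a sketch with a hypothetical `Thing` struct standing in for the objects that respond to #key:

```ruby
# Hypothetical objects with a #key method, as in the counting example.
Thing = Struct.new(:key)
thingies = [Thing.new(:x), Thing.new(:y), Thing.new(:x)]

# group_by returns a hash from each key to the array of matching elements.
groups = thingies.group_by(&:key)

# Mapping over the hash yields [key, count] pairs rather than a hash.
counts = groups.map { |key, members| [key, members.length] }

p counts
```

The result is an array of pairs, so it needs one more conversion step if a hash of counts is wanted, and it holds every element in memory while grouping.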

janeden · June 22, 2012, 2:32am

On 06/21/2012 04:30 PM, Henry M. wrote:

return) in an inject, I immediately suspect that someone is writing

Henry

You could move the array outside:

a = {:list => [1,2,3,4]}
b = {:list => [9,8,7,6,5]}

c = [a,b]

all = []

c.each { |h| all.concat h[:list] }

Saves a little memory, too?

-Justin

janeden · June 22, 2012, 9:37am

On Fri, Jun 22, 2012 at 7:30 AM, Henry M. wrote:

clear and concise.
collect and flatten looks ugly. I’d love to be able to do…

c.collect {|run| *run[:list]}

c.collect {|run| run[:list]} . flatten

or if there are only few elements,

a[:list] + b[:list]

kind regards -botp

janeden · June 22, 2012, 7:42am

Justin C. wrote in post #1065607:

Saves a little memory, too?

Quod erat demonstrandum.