WHILE loops are so much faster. Why?

Detlef_R · January 3, 2014, 7:18am

Testing old code in MRI 2.1.0 I noticed that using WHILE loops in place
of other looping structures reduced times in some algorithms from
14%-20% or greater, as the iterations increased.

However, running the same code in Rubinius 2.2.2 and JRuby 1.7.9 showed
no appreciable difference between the WHILE loop heavy code versus using
TIMES and EACH, etc.

I had noticed this performance difference in 2.0.0 too, but it seems in
2.1.0 it’s increased.

Since it seems WHILE loops are so much faster, why doesn’t Ruby parse
the other loop structures into WHILE loop semantics to get the same
increased performance?

Examples:

n.times do |i| … end becomes i=0; while i < n; …; i +=1 end

x.step(lmt,i) do |j| …end becomes j=x; while j < lmt; …; j += i end

and so on for EACH, et al.

jabari_z · January 3, 2014, 2:55pm

On Fri, Jan 3, 2014 at 7:18 AM, Jabari Z. [email protected] wrote:

Testing old code in MRI 2.1.0 I noticed that using WHILE loops in place
of other looping structures reduced times in some algorithms from
14%-20% or greater, as the iterations increased.

Please post your test code (as Gist or whatever).

However, running the same code in Rubinius 2.2.2 and JRuby 1.7.9 showed
no appreciable difference between the WHILE loop heavy code versus using
TIMES and EACH, etc.

I had noticed this performance difference in 2.0.0 too, but it seems in
2.1.0 it’s increased.

Since it seems WHILE loops are so much faster, why doesn’t Ruby parse
the other loop structures into WHILE loop semantics to get the same
increased performance?

It cannot, because the parser does not know what methods #times and
#step do.

Examples:

n.times do |i| … end becomes i=0; while i < n; …; i +=1 end

x.step(lmt,i) do |j| …end becomes j=x; while j < lmt; …; j += i end

and so on for EACH, et al.

Generally the iteration behavior is hidden inside the class for good
reasons. Your optimization would not work for code like this

def work(enum)
enum.each {|x| p x}
end

work [1,2,3]
work SomeClass.new

You cannot optimize the loop in method work since you do not know what
gets passed.

Kind regards

robert

jabari_z · January 3, 2014, 4:01pm

You have to back it up with proper benchmarks. A screenshot would even
be
better. Make sure you run it twice to enable proper caching and which
proper synthetic benchmarks generally do as the system may actually be
doing something else at the time. It’s not a bad idea to read about
finite
state machines and automata theory as well as the substitution model
before
you hit the philosophical “why doesn’t X language use Y semantics”
without
actual proof. While your benching don’t forget to add a proper recursion
example. It would be a shame not to hit every way we can transverse and
build new structures with our data.

jabari_z · January 7, 2014, 7:43pm

On Thu, Jan 2, 2014 at 10:18 PM, Jabari Z. [email protected] wrote:

Since it seems WHILE loops are so much faster, why doesn’t Ruby parse
the other loop structures into WHILE loop semantics to get the same
increased performance?

The other looping structures all utilize blocks, which make a new scope
each time. “while” reuses the same scope.

jabari_z · January 7, 2014, 7:27pm

Stu wrote in post #1132125:

You have to back it up with proper benchmarks. A screenshot would even
be
better. Make sure you run it twice to enable proper caching and which
proper synthetic benchmarks generally do as the system may actually be
doing something else at the time. It’s not a bad idea to read about
finite
state machines and automata theory as well as the substitution model
before
you hit the philosophical “why doesn’t X language use Y semantics”
without
actual proof. While your benching don’t forget to add a proper recursion
example. It would be a shame not to hit every way we can transverse and
build new structures with our data.

Here is sample code that illustrates different loop performance.

class Integer

def soe0
# enhanced classical brute-force Sieve of Eratosthenes (SoE)
lndx = (self-1)>>1; sieve = Array.new(lndx,true)
0.step((Math.sqrt(self).to_i-3)/2) do |i|
next unless sieve[i]
((i*(i+3)2)+3).step(lndx, (i2)+3) { |j| sieve[j] = nil }
end
return [2] if self < 3
primes=[2]; 0.step(lndx) { |k| primes << (k*2)+3 if sieve[k] }
primes
end

def soe1
# enhanced classical brute-force Sieve of Eratosthenes (SoE)
lndx = (self-1)>>1; sieve = Array.new(lndx,true)
0.step((Math.sqrt(self).to_i-3)/2) do |i|
next unless sieve[i]
psn = ((i*(i+3)2)+3); prmstep = (i2)+3
while psn < lndx; sieve[psn]=nil; psn += prmstep end
end
return [2] if self < 3
primes=[2]; 0.step(lndx) { |k| primes << (k*2)+3 if sieve[k] }
primes
end

def soe2
# enhanced classical brute-force Sieve of Eratosthenes (SoE)
lndx = (self-1)>>1; sieve = Array.new(lndx,true)
0.step((Math.sqrt(self).to_i-3)/2) do |i|
next unless sieve[i]
psn = ((i*(i+3)2)+3); prmstep = (i2)+3
while psn < lndx; sieve[psn]=nil; psn += prmstep end
end
return [2] if self < 3
primes=[2]; k=0
while k < lndx; primes << (k*2)+3 if sieve[k]; k+=1 end
primes
end

def soe3
# enhanced classical brute-force Sieve of Eratosthenes (SoE)
lndx = (self-1)>>1; sieve = Array.new(lndx,true); i=0
while i < (Math.sqrt(self).to_i-3)/2
if sieve[i]
psn = ((i*(i+3)2)+3); prmstep = (i2)+3
while psn < lndx; sieve[psn]=nil; psn += prmstep end
end
i +=1
end
return [2] if self < 3
primes=[2]; k=0
while k < lndx; primes << (k*2)+3 if sieve[k]; k+=1 end
primes
end
end

require ‘benchmark’
limit= 50_000_001
Benchmark.bm(30) do |t|
puts “comparison of soe using Step and While loops”
t.report(“soe with 3 Step & 0 While loops”) do ret0 = limit.soe0 end
t.report(“soe with 2 Step & 1 While loops”) do ret1 = limit.soe1 end
t.report(“soe with 1 Step & 2 While loops”) do ret2 = limit.soe2 end
t.report(“soe with 0 Step & 3 While loops”) do ret3 = limit.soe3 end
end

System:
Lenovo V570 laptop, I5 2.3 GHz, 6 GB, PCLOS Linux kernel 3.4.72

Here are results from a typical run on Ruby-2.1.0p0.
Sometimes the last example is a little bit less/greater than the 3rd.
So converting the main loop to a While inside a While has least affect.
Rubinius 2.2.2 and JRuby 1.7.9 give different results

2.1.0p0 :007 > load ‘Primes/soe.rb’
user system total
real
comparison of soe using Step and While loops
soe with 3 Step & 0 While loops 6.740000 0.100000 6.840000 (6.936593)
soe with 2 Step & 1 While loops 4.770000 0.070000 4.840000 (4.915530)
soe with 1 Step & 2 While loops 4.100000 0.030000 4.130000 (4.190563)
soe with 0 Step & 3 While loops 4.190000 0.080000 4.270000 (4.365184)

jabari_z · January 7, 2014, 7:55pm

I find it amazing how different the benchmarks were using the various
versions of Ruby. JRuby was the slowest and MacRuby the fastest with rbx
not too far behind. The stock MRI Ruby fell in the middle. Very
interesting.

Wayne

jabari_z · January 7, 2014, 9:33pm

Wayne B. wrote in post #1132443:

I find it amazing how different the benchmarks were using the various
versions of Ruby. JRuby was the slowest and MacRuby the fastest with rbx
not too far behind. The stock MRI Ruby fell in the middle. Very
interesting.

Wayne

On my system, with rvm installed Rubies, MRI ruby-2.1.0p0 is by far the
fastest, then Rubinius rbx-2.2.2, and last JRuby-1.7.9. In fact jruby
hung on the 3rd test (I assume memory issue) and I aborted it.

While I saw from ruby-1.9 that WHILE loops were relatively faster, they
seem to be relatively even faster under 2.1.0.

I shared this information because I don’t believe I have ever seen a
comprehensive discussion, tests, benchmarks which document that WHILE
loops have performance benefits under certain conditions and that if
speed is an issue then people should at least modify their code and test
it to see if using them is faster. For most code the more idiomatic
structures (times, step, each, etc) will be sufficient (and more
elegant) but people should at least be made aware of the differences and
benefits.

And it -is- interesting that running this same code on different Ruby
VMs will create very different (not consistent) results.

jabari_z · January 7, 2014, 11:11pm

On Tue, Jan 7, 2014 at 12:33 PM, Jabari Z. [email protected] wrote:

I shared this information because I don’t believe I have ever seen a
comprehensive discussion, tests, benchmarks which document that WHILE
loops have performance benefits under certain conditions and that if
speed is an issue then people should at least modify their code and test
it to see if using them is faster.

That’s because it’s a microoptimization that is typically cancelled out
by
the time spent in the body of the loop. If you’re doing anything
nontrivial
within the loop body, the performance benefits of switching to while are
negligible.

jabari_z · January 7, 2014, 8:07pm

What Tony said. The other looping structures are not equivalent because
a
WHILE loop runs the iterations within a single scope while the others
(times, step, each, etc…), because they utilize blocks, run each
iteration in a new scope. It is that additional overhead which is being
seen in the time differentials in your benchmarks.

Kirk H.

jabari_z · January 8, 2014, 3:35am

Abinoam Jr. wrote in post #1132463:

I completly agree with Tony A…
The thing is even less important If the code inside the block is
I/O-bound.
There will be no difference at all.

But there’s a more important implication in my humble opinion.
Desing tests for while loops are harder.
You cannot (easily) mock/stub anything around it.
With methods and blocks I can easily make tests to guarantee that an
specific method is yielding an expected object.
Also, I can ensure that the block has some side effect.
Or I can test if the block is returning some kind of value.
So I’ve been seeing code all over that break the logic more and more
in separate small chunks of codes on methods so that testing could be
easier.
Some people prefer this even if it gives a small performance penalty.
But, this allegations of a performance penalty generally fall to the
ground when real code is benchmarked.

Abinoam Jr.

Well, my point is I think it should be documented, for the benefits of
all users, that WHILE loops can potentially make some code faster. After
all, Matz included them natively in the language, so he assumedly
envisioned a use for them, and I have shown they -can- make code faster.

From my point of view, I do scientific and math related algorithms, and
I’m looking for speed (in Ruby), so a reduction from 7 to 4 seconds (43%
reduction) is significant, especially when I don’t have to change the
algorithm but merely need to implement it with different instructions.

Please appreciate that this sort of information is REALLY important to
me. And this kind of information can make Ruby more attractive to people
doing scientific, math, and numerical/algorithmic programming.

As I said, for most cases WHILE loops wouldn’t be the first choice and
the other more idiomatic structures SHOULD first be used, just to get
code to work. However, this implementation tweek should be stated in the
books and documentation, because it provides a tangible difference that
can make WHILE loops the best implementation choice.

jabari_z · January 8, 2014, 1:13am

I completly agree with Tony A…
The thing is even less important If the code inside the block is
I/O-bound.
There will be no difference at all.

But there’s a more important implication in my humble opinion.
Desing tests for while loops are harder.
You cannot (easily) mock/stub anything around it.
With methods and blocks I can easily make tests to guarantee that an
specific method is yielding an expected object.
Also, I can ensure that the block has some side effect.
Or I can test if the block is returning some kind of value.
So I’ve been seeing code all over that break the logic more and more
in separate small chunks of codes on methods so that testing could be
easier.
Some people prefer this even if it gives a small performance penalty.
But, this allegations of a performance penalty generally fall to the
ground when real code is benchmarked.

Abinoam Jr.

jabari_z · January 8, 2014, 3:41am

Abinoam Jr. wrote in post #1132463:

I completly agree with Tony A…
The thing is even less important If the code inside the block is
I/O-bound.
There will be no difference at all.

But there’s a more important implication in my humble opinion.
Desing tests for while loops are harder.
You cannot (easily) mock/stub anything around it.
With methods and blocks I can easily make tests to guarantee that an
specific method is yielding an expected object.
Also, I can ensure that the block has some side effect.
Or I can test if the block is returning some kind of value.
So I’ve been seeing code all over that break the logic more and more
in separate small chunks of codes on methods so that testing could be
easier.
Some people prefer this even if it gives a small performance penalty.
But, this allegations of a performance penalty generally fall to the
ground when real code is benchmarked.

Abinoam Jr.

Well, my point is I think it should be documented, for the benefits of
all users, that WHILE loops can potentially make some code faster. After
all, Matz included them natively in the language, so he assumedly
envisioned a use for them, and I have shown they -can- make code faster.

From my point of view, I do scientific and math related algorithms, and
I’m looking for speed (in Ruby), so a reduction from 7 to 4 seconds (43%
reduction) is significant, especially when I don’t have to change the
algorithm but merely need to implement it with different instructions.

Please appreciate that this sort of information is REALLY important to
me. And this kind of information can make Ruby more attractive to people
doing scientific, math, and numerical/algorithmic programming.

As I said, for most cases WHILE loops wouldn’t be the first choice and
the other more idiomatic structures SHOULD first be used (as I did),
just to get code to work. However, this implementation tweek should be
stated in the books and documentation, because it provides a tangible
difference that can make WHILE loops the best implementation choice.

jabari_z · January 8, 2014, 3:45am

From my point of view, I do scientific and math related algorithms, and
I’m looking for speed (in Ruby), so a reduction from 7 to 4 seconds (43%
reduction) is significant, especially when I don’t have to change the
algorithm but merely need to implement it with different instructions.

Thanks for the benchmarks and demonstration. Numerical codes are compute
intensive,
and therefore I think your view point is valid and this demo very
useful.

Thanks again,
saji

Saji N Hameed,
ARC-ENV, Center for Advanced Information Science and Technology,
University of Aizu, Tsuruga, Ikki-machi,
Aizuwakamatsu-shi, Fukushima 965-8580,
Japan

Tel: +81242 37-2736
Fax:+81242 37-2760
email: [email protected]
url: http://enformtk.u-aizu.ac.jp
bib: Web of Science
code: sajinh (Saji Hameed) · GitHub