Ruby internals & other questions

ralphshnelvar · November 25, 2009, 5:01pm

Is there a document or website that describes how Ruby works?

For instance …

y=0
1_000_000.times {|x| y+=x}

(1) Does the block get compiled a million times?

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

(3) Is there a difference in speed between IRB.exe and ruby.exe in
executing the above code?

(4) In IRB, whats the best way to time the code, above?

ralphshnelvar · November 25, 2009, 5:24pm

Ralph S. wrote:

Is there a document or website that describes how Ruby works?

Well, every interpreter could be implemented a bit differently. There
is a spec, although I don’t know a URL offhand.

For instance …

y=0
1_000_000.times {|x| y+=x}

(1) Does the block get compiled a million times?

Certainly not! The block is an object, and is passed as an argument to
the times function.
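As a rough illustration of that point, here is a minimal sketch. The helper name call_repeatedly is hypothetical (not from the thread), and it assumes the block is captured explicitly with &block, which is what turns it into a Proc object. The same block is parsed once and then called on every iteration:

```ruby
# Hypothetical helper: captures the caller's block as a Proc,
# calls that one object `count` times, then returns it.
def call_repeatedly(count, &block)
  count.times { |i| block.call(i) }
  block
end

y = 0
blk = call_repeatedly(5) { |x| y += x }

puts blk.class  # Proc
puts y          # 10 (0 + 1 + 2 + 3 + 4)
```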

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

Gauss’s method:
def triangular(n)
  (n + 1) * n / 2
end
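A quick sanity check of the formula, as a sketch: iterative_sum is a hypothetical helper added here only to confirm that the closed form agrees with a plain loop.

```ruby
# Closed-form sum of 1..n, as quoted above.
def triangular(n)
  (n + 1) * n / 2
end

# Hypothetical brute-force version, used only for comparison.
def iterative_sum(n)
  total = 0
  1.upto(n) { |i| total += i }
  total
end

puts triangular(1_000_000)                   # 500000500000
puts triangular(100) == iterative_sum(100)   # true
```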

(3) Is there a difference in speed between IRB.exe and ruby.exe in
executing the above code?

.exe? What are you, one of those weird Windows users?

I don’t see why there would be a difference – it’s the same
interpreter. But why not test it?

(4) In IRB, whats the best way to time the code, above?

I think there’s a benchmark library or something like that.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

ralphshnelvar · November 25, 2009, 5:26pm

On Nov 25, 2009, at 11:01 AM, Ralph S. wrote:

Is there a document or website that describes how Ruby works?

Yes, plenty. You could start with ruby-lang.org or ask Google for
JRuby, Rubinius, MacRuby, Maglev, or IronRuby and seek out the source
code yourself.

(1) Does the block get compiled a million times?
Well, “compiled” makes the answer tricky, but the answer is basically
“no.” It gets executed a million times, but the compilation or AST
production will only happen once (ignoring what JRuby’s JIT might do).

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000
Use a formula and don’t actually do a “sum”. (Which is true in any
language)

(3) Is there a difference in speed between IRB.exe and ruby.exe in
executing the above code?
Probably not significant.

(4) In IRB, whats the best way to time the code, above?
Benchmark it. (part of Ruby’s standard library)

-Rob

Rob B. http://agileconsultingllc.com
[email protected]

ralphshnelvar · November 25, 2009, 5:29pm

2009/11/25 Ralph S. [email protected]:

(1) Does the block get compiled a million times?

Of course not.

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

Same as in other languages:

(3) Is there a difference in speed between IRB.exe and ruby.exe in
executing the above code?

Try it out. Benchmark is your friend.

(4) In IRB, whats the best way to time the code, above?

irb(main):004:0> require 'benchmark'
=> true
irb(main):005:0> Benchmark.measure { sleep 2 }
=> #<Benchmark::Tms:0x100d2068 @label="", @real=2.0, @cstime=0.0,
@cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>
irb(main):006:0> Benchmark.measure { sleep 2 }.to_s
=> " 0.000000 0.000000 0.000000 ( 2.000000)\n"
irb(main):007:0>
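Applied to the loop from the original question, the same library can be used roughly like this. This is a sketch using Benchmark.realtime, which returns the elapsed wall-clock time in seconds as a Float; the variable names are illustrative.

```ruby
require 'benchmark'

y = 0
# Benchmark.realtime runs the block once and returns elapsed seconds.
elapsed = Benchmark.realtime do
  1_000_000.times { |x| y += x }
end

puts "sum = #{y}"
puts "elapsed = #{elapsed} seconds"
```

The elapsed value will vary from run to run and from machine to machine.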

Cheers

robert

ralphshnelvar · November 25, 2009, 5:29pm

Ralph S. wrote:

y=0
1_000_000.times {|x| y+=x}

y = (1..1_000_000).inject { |a, b| a + b }

More idiomatic, though maybe yours is easier to read at first.
If you’re not sure what it does, read the rdoc, then ask us questions
(rdocs aren’t -always- crystal clear).
To compare which one is faster, a google search with the keywords :
“ruby, benchmark, bmbm” will help.

To time the code, the simplest method (though not the most elegant) is
to do this:
# beginning of code
t0 = Time.now
# all of code
puts "This took #{Time.now - t0} seconds."
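A variation on the same idea, as a sketch: Time.now follows the system clock, which can be adjusted while the code runs, so this version reads a monotonic clock instead. It assumes Ruby 2.1 or later (where Process.clock_gettime is available), and time_it is a hypothetical helper name, not something from the thread.

```ruby
# Hypothetical helper: runs the block and returns [result, elapsed_seconds].
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  [result, finish - start]
end

sum, seconds = time_it do
  y = 0
  1.upto(1_000_000) { |x| y += x }
  y
end

puts "sum = #{sum}, took #{seconds} seconds"
```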

ralphshnelvar · November 25, 2009, 5:40pm

On Wed, Nov 25, 2009 at 9:01 AM, Ralph S. [email protected]
wrote:

Is there a document or website that describes how Ruby works?

For instance …

You have the source. That’s often the best way to really understand how
a given implementation works.

y=0
1_000_000.times {|x| y+=x}

(1) Does the block get compiled a million times?

No. The block’s source is only evaluated once. Different implementations
will represent it differently, but the end result is that the block
becomes an evaluated thing that will then be executed a million times.

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

Math is your friend. The sum of a string of integers from 1 to n is:
(n**2 + n)/2

def sum(n)
  (n**2 + n) / 2
end

However, in that code you wrote, you are actually summing the numbers
from zero to 999999. Fixnum#times starts at 0. You probably actually
want something like 1.upto(1000000). Math is still the best way to do
it, though.
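The off-by-one is easy to see by running both loops side by side. A small sketch (the variable names are illustrative):

```ruby
# times yields 0, 1, ..., 999_999
from_zero = 0
1_000_000.times { |x| from_zero += x }

# upto yields 1, 2, ..., 1_000_000
from_one = 0
1.upto(1_000_000) { |x| from_one += x }

puts from_zero             # 499999500000
puts from_one              # 500000500000
puts from_one - from_zero  # 1000000
```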

(3) Is there a difference in speed between IRB.exe and ruby.exe in
executing the above code?

Not really. irb is just a bunch of code to facilitate a sort of Ruby
shell. The code is all still executed by Ruby, though.

(4) In IRB, whats the best way to time the code, above?

Read up on the ‘benchmark’ library. There are several ways to use it.

One way:

require 'benchmark'

Benchmark.bm {|bm| bm.report {y = 0; 1000000.times {|n| y += n}}}

Kirk H.

ralphshnelvar · November 25, 2009, 5:43pm

On Nov 25, 2009, at 11:29 AM, Aldric G. wrote:

To compare which one is faster, a google search with the keywords :
“ruby, benchmark, bmbm” will help.

…but not equivalent:
y=0
1_000_000.times {|x| y+=x }

y = (0..999_999).inject {|a,b| a+b}

Although I wouldn’t necessarily call this use of inject idiomatic.

-Rob

Rob B. http://agileconsultingllc.com
[email protected]

ralphshnelvar · November 25, 2009, 5:49pm

On Wed, Nov 25, 2009 at 9:29 AM, Aldric G.
[email protected] wrote:

Ralph S. wrote:

y=0
1_000_000.times {|x| y+=x}

y = (1..1_000_000).inject { |a, b| a + b }

More idiomatic, though maybe yours is easier to read at first.

Ugh. His, while wrong for what he is trying to do (sum from 1 to
1000000), is vastly superior to using inject like that. It’s not
idiomatic. It’s obtuse.

If one really wants to figure it out iteratively:

y = 0; 1.upto(1000000) {|x| y += x}

or

y = 0; (1..1000000).each {|x| y += x}

Both are easier to read at first, at second, and at 1000000 viewings
than using inject is. Additionally, inject has no advantage with regard
to either execution speed or object creation (less object creation is
generally better). There is no point in using it in a case like this.
Inject is whiz-bang cool, and sometimes seems like an elegant solution,
but it usually makes code slower and harder to read when people use it.

Kirk H.

ralphshnelvar · November 25, 2009, 5:51pm

On Wednesday 25 November 2009 10:01:02 am Ralph S. wrote:

Is there a document or website that describes how Ruby works?

You’re going to have to get a lot more specific.

y=0
1_000_000.times {|x| y+=x}

(1) Does the block get compiled a million times?

Implementation-specific, but I doubt it.

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

What do you mean by “best”? Your way probably won’t work, by the way –
it will count from 0 to 999_999, not from 1 to 1_000_000.

This is probably the most idiomatic way:

(1..1_000_000).inject(&:+)

But if you mean the fastest way, I would guess it would be something
like this, in pure ruby:

y = 0
i = 0
while i < 1_000_000
  i += 1
  y += i
end

Realistically, though, Ruby probably isn’t the best language. Inline C
might be better:

require 'inline'
class Foo
  inline do |builder|
    builder.c <<-END
      long sum(long max) {
        long result = 0;
        long i;
        for(i=1; i<=max; i++) {
          result += i;
        }
        return result;
      }
    END
  end
end
puts Foo.new.sum(1_000_000)

To be fair, this takes longer on my system, but I think that’s because
the C compiler is being run each time. I’m sure there’s a way to avoid
that, but I haven’t looked. This is also going to be much more difficult
for you on Windows than, well, any other platform.

But you should keep some things in mind – this is a really arbitrary
benchmark, of the sort that you’d never actually use in real code.
Try this instead:

n = 1_000_000
n*(n+1)/2

The way to be faster in any language is to improve your algorithm – and
your algorithm is much more likely to be a bottleneck than the language
in question. That’s why I use Ruby in the first place.

(3) Is there a difference in speed between IRB.exe and ruby.exe in
executing the above code?

Maybe. If both are from the same version of Ruby, there shouldn’t be
anything significant. You could test it, though.

(4) In IRB, whats the best way to time the code, above?

The simplest way is:

require 'benchmark'
require 'foo' # if you do the RubyInline example I gave
Benchmark.bm do |x|
  x.report { Foo.new.sum(1_000_000) }
  x.report { y = 0; 1_000_001.times {|i| y += i} }
  x.report { (1..1_000_000).inject(&:+) }
  x.report { n = 1_000_000; n*(n+1)/2 }
end

That’ll work anywhere, though it’s going to be a bit cumbersome in irb.
Someone else may have a “best” way.

I haven’t run this test, though. I have no plans to, unless someone
really wants to claim that any of the loops are faster than those three
integer operations.

ralphshnelvar · November 25, 2009, 6:07pm

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

MLK> Gauss’s method:
MLK> def triangular(n)
MLK> (n + 1) * n / 2
MLK> end

Ugh … you know that is not what I meant!

ralphshnelvar · November 25, 2009, 6:04pm

On Wednesday 25 November 2009 10:49:31 am Kirk H. wrote:

is vastly superior to using inject like that. It’s not idiomatic. It’s
obtuse.

I find this is actually significantly easier to read, though it’s
probably because I’ve been doing it for a while:

(1..1_000_000).inject(&:+)

I’d much rather have a #sum method on enumerable, but that’s almost as
concise, though it takes a bit to explain why it works.

But even spelling it out, it’s pretty clear if you use descriptive
variables:

(1..1_000_000).inject{|sum, i| sum + i}

I don’t think that’s less readable, except for the fact that you have to
understand how inject works.

inject has no advantage with regard to
either execution speed or object creation

But it does have the theoretical advantage of fitting exactly the
map/reduce pattern – inject is reduce, by definition and by alias. It’s
overkill here, and I’m probably over my head, but my understanding of
why map/reduce is efficient:

In theory, map lets you spread your dataset to up to n machines, where n
is the number of items in your dataset, and let each machine perform
whatever calculation was called for in the map. Once you’ve already done
that, reduce makes sense – have each machine perform the reduce (inject)
function, passing the result to the next machine, rather than having to
aggregate the result of the map into a single location.

In reality, none of this really applies to Ruby, at least not to the
standard map/inject methods. But it’s worth thinking about, and probably
good practice for the manycore monstrosities of the future.
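A toy, single-process sketch of that idea: nothing here is actually distributed, and the slice size of 250 is an arbitrary assumption. Each slice stands in for one machine's share of the data, each share is reduced to a partial sum, and the partial sums are then reduced again.

```ruby
numbers = (1..1_000)

# inject and reduce are two names for the same Enumerable method.
same = numbers.inject(:+) == numbers.reduce(:+)

# "Map" step: each slice of 250 numbers is reduced to a partial sum.
partial_sums = numbers.each_slice(250).map do |chunk|
  chunk.inject(0) { |sum, i| sum + i }
end

# "Reduce" step: combine the partial sums into the final total.
total = partial_sums.inject(0) { |sum, part| sum + part }

puts same                 # true
puts partial_sums.length  # 4
puts total                # 500500
```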

ralphshnelvar · November 25, 2009, 6:11pm

Ralph S. wrote:

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

MLK> Gauss’s method:
MLK> def triangular(n)
MLK> (n + 1) * n / 2
MLK> end

Ugh … you know that is not what I meant!

Ralph, you do have to be careful. I got yelled at for answering the
questions you meant to ask because they weren’t the answers to the
questions you did ask

ralphshnelvar · November 25, 2009, 6:51pm

On 25.11.2009 17:29, Aldric G. wrote:

Ralph S. wrote:

y=0
1_000_000.times {|x| y+=x}

y = (1..1_000_000).inject { |a, b| a + b }

There is an issue with this approach. The proper implementation for the
general case of summing any number of values would look like this:

y = (1..1_000_000).inject(0) { |a, b| a + b }

Kind regards

robert

PS: Yes, I deliberately left out the explanation what the issue is.
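One plausible reading of the issue (an assumption, since the post deliberately does not spell it out) is the empty case: without an initial value, inject on an empty collection returns nil rather than 0. A minimal sketch:

```ruby
empty = []

# No initial value: nothing to start from, so the result is nil.
without_seed = empty.inject { |a, b| a + b }

# Initial value 0: the sum of nothing is 0, as one would expect.
with_seed = empty.inject(0) { |a, b| a + b }

puts without_seed.inspect  # nil
puts with_seed.inspect     # 0
```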

ralphshnelvar · November 25, 2009, 6:50pm

Ralph S. wrote:

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

MLK> Gauss’s method:
MLK> def triangular(n)
MLK> (n + 1) * n / 2
MLK> end

Ugh … you know that is not what I meant!

Why not? You’d rather do that iteratively? What kind of masochist are
you?

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

ralphshnelvar · November 25, 2009, 7:17pm

Ralph S. wrote:

Next time I’ll be more of a masochist and lay out a more arbitrary set
of criteria so that the focus of what I am trying to understand moves
elsewhere.

Actually, and with all friendly intent, please read this:
http://catb.org/~esr/faqs/smart-questions.html
It’s a pretty interesting read. Since reading it, the following has
happened to me several times:

1. I have a problem/question to which I can’t find the answer.
2. I go to a forum / newsgroup and start to write up the problem.
3. In the course of laying out the problem, I think of ways to explain
   exactly the problem (how it occurs, how to repeat it, etc.).
4. I find the solution, close the tab, and continue working.

If you want the right answer, ask the right question. It can be a bit
tricky at first, but it’s quite a useful skill.

ralphshnelvar · November 25, 2009, 7:05pm

MLK> Ralph S. wrote:

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000

MLK> Gauss’s method:
MLK> def triangular(n)
MLK> (n + 1) * n / 2
MLK> end

Ugh … you know that is not what I meant!

MLK> Why not? You’d rather do that iteratively? What kind of masochist
MLK> are you?

Next time I’ll be more of a masochist and lay out a more arbitrary set
of criteria so that the focus of what I am trying to understand moves
elsewhere.

ralphshnelvar · November 28, 2009, 10:22pm

Ah?

realtime { p (1..1_000_000).inject {|s,e| s + e} }
=> 0.15593600273132324

realtime { p (1..1_000_000).inject(&:+) }
=> 0.1163489818572998

realtime { p (1..1_000_000).inject(:+) }
=> 0.08730292320251465

realtime { y=0; (1..1_000_000).each {|x| y+=x}; p y }
=> 0.12649822235107422

realtime { y=0; 1.upto(1_000_000) {|x| y+=x}; p y }
=> 0.12580513954162598

realtime { y=0;i=0; while(i<1_000_000); i+=1;y+=i; end; p y }
=> 0.05157589912414551

realtime { y=0; for i in (1..1_000_000); y+=i; end; p y }
=> 0.13959503173828125

It seems inject(:+), which is a bit less idiomatic, is clearly the
better way to do this.
And inject is clearly fast, and I think good for memory too (no outside
local variables needed; that’s its power).

The while loop is the fastest, while looking completely awful.

(1) Does the block get compiled a million times?
No

(2) What’s the best Ruby way to do a sum from 1 to 1_000_000
=> The most “Ruby way” is inject to sum values in an Array

(3) Is there a difference in speed between IRB.exe and ruby.exe in
executing the above code?
Let’s see:
IRB > realtime { p (1..1_000_000).inject(:+) }
=> 0.08730292320251465
RUBY > ruby test.rb
0.08849906921386719

It’s the same. IRB even looks slightly better here.

(4) In IRB, whats the best way to time the code, above?
require "benchmark"
include Benchmark
p realtime { p (1..1_000_000).inject(:+) }

If you want more details, look at the Benchmark module.
require "benchmark"
include Benchmark
bm { |b|
  b.report("mytest") { (1..1_000_000).inject(:+) }
}
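For a side-by-side comparison, Benchmark.bmbm (mentioned earlier in the thread) runs every report twice, a rehearsal pass and then the measured pass, which can reduce the effect of warm-up and garbage collection. A sketch, with labels chosen here purely for illustration:

```ruby
require 'benchmark'

n = 1_000_000

# bmbm prints a rehearsal table and a final table, and returns
# an array with one Benchmark::Tms object per report.
results = Benchmark.bmbm do |b|
  b.report('inject(:+)') { (1..n).inject(:+) }
  b.report('upto')       { y = 0; 1.upto(n) { |x| y += x }; y }
  b.report('while') do
    y = 0
    i = 0
    while i < n
      i += 1
      y += i
    end
    y
  end
  b.report('formula')    { n * (n + 1) / 2 }
end

puts results.length  # 4
```

The timings depend on the Ruby version and the machine, so they are not reproduced here.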
