Why doesn't Ruby "compile" strings?

Hi, the following code:


#!/usr/bin/ruby

require "benchmark"

HELLO_WORLD = "hello world"

1.upto(4) do
  print "Benchmark using HELLO_WORLD: "
  puts Benchmark.realtime { 1.upto(500000) {|i| HELLO_WORLD.upcase } }

  print "Benchmark using \"hello_world\": "
  puts Benchmark.realtime { 1.upto(500000) {|i| "hello_world".upcase } }

  print "Benchmark using 'hello_world': "
  puts Benchmark.realtime { 1.upto(500000) {|i| 'hello_world'.upcase } }
end

gives these results:


Benchmark using HELLO_WORLD: 1.1907217502594
Benchmark using “hello_world”: 1.53604388237
Benchmark using ‘hello_world’: 0.816991806030273
Benchmark using HELLO_WORLD: 0.599252462387085
Benchmark using “hello_world”: 0.814466714859009
Benchmark using ‘hello_world’: 0.812573194503784
Benchmark using HELLO_WORLD: 0.595503330230713
Benchmark using “hello_world”: 0.813859701156616
Benchmark using ‘hello_world’: 0.813681602478027
Benchmark using HELLO_WORLD: 0.594272136688232
Benchmark using “hello_world”: 0.815742254257202
Benchmark using ‘hello_world’: 0.811828136444092

Let’s take the last round of results, when Ruby is “entirely loaded”:


Benchmark using HELLO_WORLD: 0.594272136688232
Benchmark using “hello_world”: 0.815742254257202
Benchmark using ‘hello_world’: 0.811828136444092

This clearly shows that using a constant string is faster than using a string
written into the script. So I wonder: why doesn’t Ruby internally “precompile”
the strings appearing in the script?

That is, when the Ruby interpreter parses the script and finds “hello_world”,
couldn’t it create the string just once and keep it in memory forever, so that
the next time the same string is accessed Ruby doesn’t need to instantiate it?
Is it impossible due to the design of Ruby?

PS: I don’t know whether other languages (Python, PHP, Perl…) do this or not.

Iñaki Baz C. wrote:
[…]
This clearly shows that using a constant string is faster than using a string
written into the script. So I wonder: why doesn’t Ruby internally “precompile”
the strings appearing in the script?

Well, it would make garbage collection difficult.

That is, when the Ruby interpreter parses the script and finds “hello_world”,
couldn’t it create the string just once and keep it in memory forever, so that
the next time the same string is accessed Ruby doesn’t need to instantiate it?
Is it impossible due to the design of Ruby?

It’s not impossible at all: just use symbols instead of strings.
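A quick way to see what “use symbols” buys you (a sketch, not from the thread; it assumes MRI and a file without a `# frozen_string_literal: true` magic comment, and `syms`/`strs` are illustrative names): a symbol literal evaluates to the same object every time, a string literal evaluates to a new String each time, and Symbol has no in-place mutators such as `upcase!` at all.

```ruby
# Evaluate a symbol literal and a string literal three times each,
# keeping references so no object can be collected mid-check.
syms = 3.times.map { :hello_world }
strs = 3.times.map { "hello_world" }

syms.all? { |s| s.equal?(syms.first) }  # => true: one shared Symbol object
strs.map(&:object_id).uniq.size         # => 3: three distinct String objects

# Symbols are immutable, so there is no in-place upcase! to worry about.
Symbol.method_defined?(:upcase!)        # => false
String.method_defined?(:upcase!)        # => true
```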

PS: I don’t know whether other languages (Python, PHP, Perl…) do this or not.

I don’t think PHP interns strings.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org

On Sunday, 6 December 2009, Kirk H. wrote:


This is because when you are using the constant, you are referring to the
same object every time.

When you are using the string literals, the interpreter doesn’t know what
you are going to do with that string literal, so it’s not really safe for
it to assume that it can use a single ruby object to represent all
instances of it.

Why not? It’s obviously a string written in the script, with no variables
interpolated into it, so…

On Sat, Dec 5, 2009 at 6:48 PM, Iñaki Baz C. wrote:

This is because when you are using the constant, you are referring to the
same object every time.

When you are using the string literals, the interpreter doesn’t know what
you are going to do with that string literal, so it’s not really safe for it
to assume that it can use a single ruby object to represent all instances of
it. Consequently, it creates a new object each time.

So 500000.times { 'foo' } creates 500000 objects. That’s obviously going to
take more time than FOO = 'foo'; 500000.times { FOO }, as that code just
looks up a constant 500000 times.
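Kirk’s point can be checked by object identity rather than by timing. A minimal sketch, assuming MRI and a file without a `# frozen_string_literal: true` magic comment (`literals` and `constants` are illustrative names; `FOO` is the constant from the post above):

```ruby
FOO = "foo"

# Keep references to every result so distinct live objects keep distinct ids.
literals  = 3.times.map { "foo" }  # evaluates the literal three times
constants = 3.times.map { FOO }    # looks up the same constant three times

literals.map(&:object_id).uniq.size   # => 3: a new String per evaluation
constants.all? { |s| s.equal?(FOO) }  # => true: the same object every time
literals.all? { |s| s == FOO }        # => true: equal contents, though
```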

Kirk H.

Iñaki Baz C. wrote:

On Sunday, 6 December 2009, Marnen Laibow-Koser wrote:

Is it impossible due to the design of Ruby?

It’s not impossible at all: just use symbols instead of strings.

What do you mean by symbols? Do you mean using the following?

puts Benchmark.realtime { 1.upto(500000) {|i| :"hello_world".to_s.upcase } }

I’d expect the same results, since when converting the symbol to a string
(to_s) Ruby would generate a new string on each iteration. Am I wrong?

Only one per iteration – ‘HELLO WORLD’. Your original implementation
would generate two new String objects for each iteration.

Thanks.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org

Iñaki Baz C. wrote:

On Sunday, 6 December 2009, Kirk H. wrote:


This is because when you are using the constant, you are referring to the
same object every time.

When you are using the string literals, the interpreter doesn’t know what
you are going to do with that string literal, so it’s not really safe for
it to assume that it can use a single ruby object to represent all
instances of it.

Why not? It’s obviously a string written in the script, with no variables
interpolated into it, so…

Irrelevant. Ruby strings are mutable, remember?

In other words, if I do
a = 'hello'
b = 'hello'
a.upcase!

then a is ‘HELLO’ while b is ‘hello’. This would not work as expected
if both ‘hello’ strings were the same object. It works with symbols
largely because symbols are immutable.
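Marnen’s `a`/`b` example, made runnable, with a simulated shared object for contrast. This is a sketch only: it assumes no `# frozen_string_literal: true` magic comment in the file, and `c`/`d` are hypothetical names standing in for how “interned” literals would behave.

```ruby
# Two separate literals: two independent String objects.
a = "hello"
b = "hello"
a.upcase!
# a is now "HELLO"; b is still "hello".

# Simulate what interning would do: two names bound to ONE object.
c = String.new("hello")
d = c
c.upcase!
# d is now "HELLO" too, because c and d are the same object.
```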

Best,
--
Marnen Laibow-Koser
http://www.marnen.org

On Sunday, 6 December 2009, Marnen Laibow-Koser wrote:

Is it impossible due to the design of Ruby?

It’s not impossible at all: just use symbols instead of strings.

What do you mean by symbols? Do you mean using the following?

puts Benchmark.realtime { 1.upto(500000) {|i| :"hello_world".to_s.upcase } }

I’d expect the same results, since when converting the symbol to a string
(to_s) Ruby would generate a new string on each iteration. Am I wrong?

Thanks.

On Dec 5, 9:29 pm, Marnen Laibow-Koser wrote:

I expect the same results as when converting the symbol to string (to_s)

No, two Strings get created here: one with .to_s and the other with .upcase
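pharrington’s correction can be checked by identity (a sketch assuming MRI; `results` and `s` are illustrative names). It deliberately doesn’t try to count allocations, since exact counts vary between Ruby versions:

```ruby
# Each evaluation runs Symbol#to_s (building a String from the Symbol)
# and then String#upcase (building another, upcased String).
results = 2.times.map { :hello_world.to_s.upcase }

results.all? { |s| s == "HELLO_WORLD" }  # => true
results[0].equal?(results[1])            # => false: fresh objects each time

# upcase never returns its receiver; it always builds a new String.
s = :hello_world.to_s
s.upcase.equal?(s)                       # => false
```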

On 06.12.2009 11:17, Brian C. wrote:

x.replace "rubbish"

a = "".dup
a << "stuff"

or perhaps

a = String.new
a << "stuff"

Another thing you could not do with the auto interning (as with Java String constants):

…each do |whatever|
  s = "intro " << whatever << " outro"
  store_away(s)
end

Ruby leaves it up to you to decide where you want to optimize, while still
keeping things nice for other use cases. The code above would have to look
like this if "" and '' did not construct new objects:

…each do |whatever|
  s = "intro ".dup << whatever << " outro"
  store_away(s)
end

Now, that looks worse IMHO.
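Robert’s two loops as a runnable sketch. `store_away` is not defined in the thread, so plain arrays stand in for it (hypothetical), and `SHARED_INTRO` simulates what auto-interning would do by reusing a single String. Assumes no `# frozen_string_literal: true` magic comment in the file:

```ruby
# Hypothetical stand-in for store_away: collect results in an array.
stored = []
%w[a b c].each do |whatever|
  s = "intro " << whatever << " outro"  # "intro " is a fresh String each time
  stored << s
end
# stored == ["intro a outro", "intro b outro", "intro c outro"]

# Simulate auto-interning: one shared String object reused on every pass.
SHARED_INTRO = String.new("intro ")
corrupted = []
%w[a b c].each do |whatever|
  s = SHARED_INTRO << whatever << " outro"  # appends to the same object
  corrupted << s
end
# Every element is that one object, now "intro a outrob outroc outro".

# Robert's .dup version works even if the literal is shared and frozen.
FROZEN_INTRO = "intro ".freeze
fixed = %w[a b c].map { |whatever| FROZEN_INTRO.dup << whatever << " outro" }
# fixed == stored, and FROZEN_INTRO is still "intro ".
```

The `.dup` variant is the trade-off Robert describes: safe under sharing, but noisier to write.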

Kind regards

robert

PS: Note also that all strings created via "" and '' share the internal
character buffer until one of them is modified (copy on write), so things
could be far less efficient than they actually are. :-)

Iñaki Baz C. wrote:

When you are using the string literals, the interpreter doesn’t know what
you are going to do with that string literal, so it’s not really safe for
it to assume that it can use a single ruby object to represent all
instances of it.

Why not? It’s obviously a string written in the script, with no variables
interpolated into it, so…

Suppose you do:

10.times do
  puts "hello"
end

The interpreter has no way of knowing that puts does not mutate the
argument passed to it. This is a silly example, but you might have done:

alias :old_puts :puts
def puts(x)
  old_puts x
  x.replace "rubbish"
end

So it is forced to create a new string object each time round the loop.
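The same reasoning with a less invasive, hypothetical `tamper` method in place of a redefined `puts` (a sketch; assumes no `# frozen_string_literal: true` magic comment, and `seen`/`shared` are illustrative names). Because the literal is re-evaluated on every pass, one call mutating its argument cannot affect the next pass, whereas a single shared object is spoiled after the first call:

```ruby
# Hypothetical method that mutates its argument, like Brian's redefined puts.
def tamper(x)
  x.replace("rubbish")
end

# A literal inside the loop: every pass gets a brand-new "hello".
seen = []
3.times do
  s = "hello"
  seen << s.dup   # record the value before tamper changes it
  tamper(s)
end
# seen == ["hello", "hello", "hello"]

# One object shared across passes: the first call spoils it for the rest.
shared = String.new("hello")
seen_shared = []
3.times do
  seen_shared << shared.dup
  tamper(shared)
end
# seen_shared == ["hello", "rubbish", "rubbish"]
```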

It’s a shame that Ruby doesn’t have immutable strings. Symbols are the
closest, but they have different semantics to strings.

If the literal syntax “xxx” gave a frozen String, then it would be
safe to re-use it. But then if you wanted to append to a string, you’d
have to write:

a = "".dup
a << "stuff"

or perhaps

a = String.new
a << "stuff"
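For what it’s worth, Ruby versions released after this 2009 thread added opt-ins close to Brian’s hypothetical: since Ruby 2.1, MRI can return one shared frozen String when `.freeze` is called directly on a literal, and Ruby 2.3 added the `# frozen_string_literal: true` magic comment, which makes a file’s plain literals frozen. A minimal sketch of the first, assuming MRI and no magic comment in the file (`plain`, `frozen`, `b` are illustrative names):

```ruby
# Plain literal (no magic comment): a new, mutable String per evaluation.
plain = 2.times.map { "xxx" }

# .freeze applied directly to a literal: MRI returns one shared, frozen object.
frozen = 2.times.map { "xxx".freeze }

# Appending then needs an explicitly mutable string, as Brian describes.
a = "".dup
a << "stuff"

b = String.new
b << "stuff"
```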

Regards,

Brian.

On Sunday, 6 December 2009, Robert K. wrote:


PS: Note also that all strings created via "" and '' share the internal
character buffer until one of them is modified (copy on write), so things
could be far less efficient than they actually are. :-)

OK, thanks a lot to all of you for such good explanations. 100% understood now :-)

pharrington wrote:

On Dec 5, 9:29 pm, Marnen Laibow-Koser wrote:

I expect the same results as when converting the symbol to string (to_s)

No, two Strings get created here: one with .to_s and the other with .upcase

Quite right. What the hell was I thinking? :-)

Best,
--
Marnen Laibow-Koser
http://www.marnen.org