Why does Ruby do 'str_replace' when storing values in an hash?

Dear list,

some days ago I wrote a script that stored values in an Hashtable with
different buckets. It was some string-parsing grunt work.

I noticed that Ruby became pretty slow when the hashtable grew to ~10
Mio
entries. To make sure it is actually Ruby that is to blame, I wrote
comparison
script snippets in Perl and Python, and yes, Ruby was just plain slower.

By reducing the amount of code step-by-step I was able to narrow the
problem
down. Currently, I can reproduce the behaviour with the following simple
scriptlet:

—%<—
tbl = { }

10_000_000.times do |i|
tbl[‘last’] = iii
end
—>%—

Execution times are as follows:

  • ruby loop0r.rb
    real 0m10.741s
    user 0m10.596s
    sys 0m0.107s

  • perl loop0r.pl
    real 0m4.503s
    user 0m4.420s
    sys 0m0.048s

  • python loop0r.py
    real 0m6.704s
    user 0m6.367s
    sys 0m0.274s

Replacing the String I use as Hash key with a symbol, i.e., :last,
lowers the
execution time of Ruby dramatically so that it becomes fast than Python
(and
still slower than Perl, but that’s ok).

I was baffled and ran callgrind against the interpreter. It came up with
this:

—%<—
183,248,191 /usr/local/rvm/src/ruby-2.0.0-p247/vm.inc:vm_exec_core’2
155,033,405
/usr/local/rvm/src/ruby-2.0.0-p247/siphash.c:ruby_sip_hash24
[/home/eveith/.rvm/rubies/ruby-2.0.0-p247/lib/libruby.so.2.0.0]
112,110,714 /usr/local/rvm/src/ruby-2.0.0-p247/insns.def:vm_exec_core’2
93,170,146 /usr/local/rvm/src/ruby-2.0.0-p247/string.c:str_replace
[/home/eveith/.rvm/rubies/ruby-2.0.0-p247/lib/libruby.so.2.0.0]
78,086,663 /usr/local/rvm/src/ruby-2.0.0-p247/st.c:st_update
[/home/eveith/.rvm/rubies/ruby-2.0.0-p247/lib/libruby.so.2.0.0]
74,991,856 /usr/local/rvm/src/ruby-2.0.0-p247/gc.c:slot_sweep
[/home/eveith/.rvm/rubies/ruby-2.0.0-p247/lib/libruby.so.2.0.0]
73,137,809 /usr/local/rvm/src/ruby-2.0.0-
p247/vm_insnhelper.c:vm_call_cfunc_with_frame’2
[/home/eveith/.rvm/rubies/ruby-2.0.0-p247/lib/libruby.so.2.0.0]
63,043,562
/usr/local/rvm/src/ruby-2.0.0-p247/vm_insnhelper.c:vm_push_frame
[/home/eveith/.rvm/rubies/ruby-2.0.0-p247/lib/libruby.so.2.0.0]
61,029,008 /usr/local/rvm/src/ruby-2.0.0-p247/vm.c:rb_yield
—>%—

Now: Why does Ruby call string.c:str_replace? There is obviously no
operation
that modifies a string here. In fact, I even use the same String as hash
key
over and over again.

Of course, I can also freeze the String, which amounts to the same as
using a
Symbol (performance-wise). Still, I don’t understand why Ruby is calling
str_replace.

What is happening here? And how can I circumvent it?

Thanks alot for any replies!

  --- Eric

str_replace is (probably, I’m not an expert here) the function that
replaces a string object’s value, not to be confused with PHP’s
str_replace
(which in ruby is called gsub).

Objects created in ruby blocks (including literals) are created each
time
the block is run. Have you tried this?

—%<—
tbl = { }
key = ‘last’

10_000_000.times do |i|
tbl[key] = iii
end
—>%—

I imagine that will have the same effect as either the symbol (which is
a
singleton), or the frozen string (which is effectively a singleton). No
new
string literal = no new string object = no replacing the existing one.

On Wednesday 09 April 2014, 00:41:58, Matthew K. wrote:

str_replace is (probably, I’m not an expert here) the function that
replaces a string object’s value, not to be confused with PHP’s str_replace
(which in ruby is called gsub).

Oh yes, you are right of course. No wonder I was mistaken with my own
analysis. Everything makes perfect sense now. :slight_smile:

Objects created in ruby blocks (including literals) are created each time
the block is run. Have you tried this? […]

Yep, that helps and puts ruby down to 3s.

However, I’m wondering how this will help me with the actual
string-parsing
grunt work. I don’t know the hash keys beforehand, so its still strings
that
are used. Is there a more efficient way to do this instead of using the
Hash
class?

Thanks alot!

  --- Eric

On Apr 9, 2014 4:40 AM, “Eric MSP Veith” [email protected]
wrote:

However, I’m wondering how this will help me with the actual
string-parsing
grunt work. I don’t know the hash keys beforehand, so its still strings
that
are used. Is there a more efficient way to do this instead of using the
Hash
class?

I think it’s not the hash, exactly; it’s keying the hash on a lot of
string
objects.

OTOH you might be able to use a second hash, something like:

keys = Hash.new{|h,k|h[k]=h.length}
data = {}

#…big loop
data[ keys[my_str] ] = my_val

Caveat: written on my phone, not verified at all.

The goal is to convert the string to a Fixnum in the simplest way
possible.
(Fixnums also being singleton.) Since keys only updates to add a new
string, it shouldn’t suffer the same slowdown.

Eric MSP Veith [email protected] wrote:

Yep, that helps and puts ruby down to 3s.

Btw, if you do know the strings beforehand, ruby-trunk will prefreeze
and deduplicate string literals when doing hash assignments.

However, I’m wondering how this will help me with the actual string-parsing
grunt work. I don’t know the hash keys beforehand, so its still strings that
are used. Is there a more efficient way to do this instead of using the Hash
class?

I’ll try to make dedupe+prefreeze for dynamic strings possible again in
2.2, but no guarantees. We tried it right before 2.1, but there was a
regression in some cases so we reverted it.

On Tue, Apr 8, 2014 at 8:39 PM, Eric MSP Veith
[email protected] wrote:

Yep, that helps and puts ruby down to 3s.

However, I’m wondering how this will help me with the actual string-parsing
grunt work. I don’t know the hash keys beforehand, so its still strings that
are used. Is there a more efficient way to do this instead of using the Hash
class?

Ruby will copy a String instance when used as a Hash key to avoid
aliasing effects. There is an easy trick to avoid this copying: freeze
them before inserting

irb(main):001:0> s = “foo”
=> “foo”
irb(main):002:0> h = {}
=> {}
irb(main):003:0> h[s]=1
=> 1
irb(main):004:0> h.keys.map {|k| s.equal? k}
=> [false]

irb(main):005:0> s.freeze
=> “foo”
irb(main):006:0> h = {}
=> {}
irb(main):007:0> h[s]=2
=> 2
irb(main):008:0> h.keys.map {|k| s.equal? k}
=> [true]

Note: #equal? tests for identity - not equivalence.

Kind regards

robert