What is the difference between :foo and "foo"?

Surgeon · December 28, 2005, 11:15pm

James G. wrote:

As one of the people guilty of saying what that article says we
shouldn’t, I’d better try to get back in Jim’s good graces by answering
this one…

James, you have never been out of my good graces (despite how you
describe symbols)

–
– Jim W.

Surgeon · December 28, 2005, 11:12pm

ara wrote:

i see this claim all the time but never any data supporting it; all my
test programs have shown the opposite to be true.

My benchmark shows symbols to have a slight edge. I’ve attached my
benchmark at the end for review. (Benchmarks are tricky … sometimes
you don’t measure what you think you are measuring).

```
$ ruby string_symbol_time.rb
Strings Filling
 14.080000   0.090000  14.170000 ( 14.406058)
Strings Fetching
  4.320000   0.030000   4.350000 (  4.355025)

Symbols Filling
 12.300000   0.030000  12.330000 ( 12.561648)
Symbols Fetching
  3.370000   0.030000   3.400000 (  3.461109)
```

also, don’t forget that symbols are never freed.

True, but when used properly, this is rarely a concern. If they are
used as programmer names for things, then the number of symbols is
finite and not likely to grow and consume memory as the program runs.

If, however, you are dynamically creating symbols by interning strings, I
would suggest you review your use of symbols and consider using strings
instead.

–
– Jim W.

```ruby
#!/usr/bin/env ruby

require 'benchmark'

SIZE = 100
N = 10000

def make_str_keys
  (1...SIZE).collect { |i| "key#{i}" }
end

def make_sym_keys(strs)
  strs.collect { |s| s.intern }
end

def populate(keys)
  result = {}
  keys.each_with_index do |k, i|
    result[k] = i
  end
  result
end

def fetch(keys, hash)
  keys.each do |key| hash[key] end
end

strs = make_str_keys
syms = make_sym_keys(strs)

str_hash = populate(strs)
sym_hash = populate(syms)

puts "Strings Filling"
puts Benchmark.measure {
  N.times do
    populate(strs)
  end
}

puts "Strings Fetching"
puts Benchmark.measure {
  N.times do
    fetch(strs, str_hash)
  end
}

puts
puts "Symbols Filling"
puts Benchmark.measure {
  N.times do
    populate(syms)
  end
}

puts "Symbols Fetching"
puts Benchmark.measure {
  N.times do
    fetch(syms, sym_hash)
  end
}
```

Surgeon · December 28, 2005, 11:21pm

How about devoting the next Ruby Q. to coming up with the
best-of-class examples, self-paced tutorial and documentation to settle
the :symbol vs “string” issue? At some point you have to ask yourself
whether the explanations given so far to inquiring users are adequate.
The fact that this question keeps coming up must be seen as evidence
that there is something lacking in the explanations previously given.
At a minimum, all the explanations given so far should be edited into a
FAQ entry that the experts can agree upon. Just my two cents.

6.1 What does :var mean?

A colon followed by a name generates an integer (Fixnum) called a symbol
which corresponds one to one with the identifier. "var".intern gives the
same integer as :var, but the ":" form will create a local symbol if it
doesn’t already exist.

The routines “catch”, “throw”, “autoload”, and so on, require a string
or a symbol as an argument. “method_missing”, “method_added” and
“singleton_method_added” (and others) require a symbol.

The fact that a symbol springs into existence the first time it is
referenced is sometimes used to assign unique values to constants:

```ruby
NORTH = :NORTH
SOUTH = :SOUTH
EAST = :EAST
WEST = :WEST
```

http://www.rubycentral.com/faq/rubyfaqall.html#s6
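A small sketch of the FAQ’s constant idiom in use. `turn_right` is a hypothetical helper, not part of the FAQ; it only illustrates that symbol-valued constants work directly in a `case`, print as their own names, and that interning the name at run time gives back the same symbol.

```ruby
# Symbol-valued constants, in the style of the FAQ entry quoted above.
NORTH = :NORTH
SOUTH = :SOUTH
EAST  = :EAST
WEST  = :WEST

# Hypothetical helper: the direction 90 degrees clockwise from +dir+.
def turn_right(dir)
  case dir
  when NORTH then EAST
  when EAST  then SOUTH
  when SOUTH then WEST
  when WEST  then NORTH
  else raise ArgumentError, "unknown direction: #{dir.inspect}"
  end
end

# Interning the constant's name at run time yields the very same object.
same_object = "NORTH".intern.equal?(NORTH)

puts turn_right(NORTH).inspect   # prints :EAST
```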

Surgeon · December 28, 2005, 11:27pm

On Thu, 29 Dec 2005, Dan D. wrote:

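```ruby
NORTH = :NORTH
SOUTH = :SOUTH
EAST = :EAST
WEST = :WEST
```

```ruby
# Top-level constants need Object.const_set; to_sym keeps the values symbols.
%w( NORTH SOUTH EAST WEST ).each{|c| Object.const_set c, c.to_sym}
```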

-a

Surgeon · December 28, 2005, 11:21pm

James B. writes:

Question: Would using a constant be equally suitable for expressing
intention, and (possibly) less error-prone?

I would say that if your language does not provide a means for you to
define your own identifier, then it would be acceptable.

But this seems pointless in ruby.

```ruby
# Assume ConstUtils.next_value
# ensures unique values
HOST = ConstUtils.next_value
PORT = ConstUtils.next_value
```

For simplicity’s sake, I’ll just assume next_value returns integers.

```ruby
foo1 = {
  HOST => 'localhost',
  PORT => 80
}

p foo1 # => {32=>"localhost", 238=>80}
```

That makes debugging difficult. Since the values of HOST/PORT are
volatile (they could change depending on how next_value generates the
integers, and also if you, say, insert ‘DOMAIN =
ConstUtils.next_value’ between the HOST and PORT assignments),
understanding past debugging output (e.g., in a log file) would be
harder as well.
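The debugging point can be seen directly. `ConstUtils.next_value` is only assumed in the post, so the module below is a hypothetical stand-in that hands out sequential integers; the constants carry an `_ID` suffix to keep them apart from the symbol keys.

```ruby
# Hypothetical stand-in for the assumed ConstUtils.next_value:
# each call returns the next integer in sequence.
module ConstUtils
  @counter = 0

  def self.next_value
    @counter += 1
  end
end

HOST_ID = ConstUtils.next_value
PORT_ID = ConstUtils.next_value

by_integer = { HOST_ID => 'localhost', PORT_ID => 80 }
by_symbol  = { :host => 'localhost', :port => 80 }

p by_integer  # keys show up as bare numbers: nothing says which one is the host
p by_symbol   # keys show up as their names
```

Adding a third `ConstUtils.next_value` call between the two assignments would shift `PORT_ID` to a new number, which is the volatility described above; the symbol keys are unaffected.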

A downside to using symbols as constants is that this will not raise
any exceptions:

```ruby
foo1 = {
  :hots => 'localhost',
  :prt => 80
}
```

True, typos are such a hard error to prevent. Fortunately, there are
ways around that.

```ruby
module ConfigIdentifiers
  [:host, :port].each{|x| const_set(x.to_s.upcase, x)}
end

include ConfigIdentifiers
foo1 = { HOST => 'localhost', PORT => 80}
```

is a solution.
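A sketch of the trade-off being debated: with the `ConfigIdentifiers` approach, a misspelled constant fails at once with a `NameError`, whereas a misspelled symbol literal is silently accepted as a new key. The misspellings `HSOT`, `:hots` and `:prt` are deliberate.

```ruby
# Symbol-valued constants defined from a list of names, as suggested above.
module ConfigIdentifiers
  [:host, :port].each { |x| const_set(x.to_s.upcase, x) }
end

include ConfigIdentifiers

config = { HOST => 'localhost', PORT => 80 }

# A misspelled symbol is a perfectly valid (different) key, so nothing complains;
# the mistake only surfaces later as a missing value.
typo_config = { :hots => 'localhost', :prt => 80 }
silently_missing = typo_config[:host].nil?

# A misspelled constant is rejected as soon as the line runs.
typo_caught =
  begin
    HSOT
    false
  rescue NameError
    true
  end
```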

But my experience, as someone who has made many typos such as that (and
is still making them), is that my test cases usually catch them, and
those that manage to elude the tests are easily identified (and
corrected) manually.

YS.

Surgeon · December 29, 2005, 12:33am

Johannes F. wrote:
…

My hack was a quickie to flesh out the example. After sending it I
thought about assigning symbols. My main point was that if one mistypes
a symbol name, Ruby doesn’t care. Unit tests should catch this, but
using constants might just help things along because of the immediate
error. And it might more clearly express intent.

James

–

http://www.ruby-doc.org - Ruby Help & Documentation
Ruby Code & Style: Writers wanted
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools

Surgeon · December 28, 2005, 11:30pm

Jim W. wrote:

also, don’t forget that symbols are never freed.

True, but when used properly, this is rarely a concern. If they are
used as programmer names for things, then the number of symbols is
finite and not likely to grow and consume memory as the program runs.

Here I am replying to my own posting, but I think this point could use
some elaboration.

Why are symbols not garbage collected? Because a symbol represents a
mapping from a string name to a unique object. Anytime in the execution
of the program, if that name is used for a symbol, the original symbol
object must be returned. If the symbol is garbage collected, then a
later reference to the symbol name will return a different object.
That’s generally frowned upon (although I don’t really see the harm. If
the original symbol was GC’ed, nobody cared what the original object was
anyways. But that’s the way it works).
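The guarantee described here (one name, one object, for the life of the program) is easy to observe. In this sketch the strings are built with `String.new` so that each is certainly a separate object regardless of any frozen-string-literal setting.

```ruby
# Equal strings are normally distinct objects...
a = String.new("foo")
b = String.new("foo")
same_contents = (a == b)        # true: same characters
same_string   = a.equal?(b)     # false: two objects

# ...but a symbol name always maps back to the one object for that name,
# whether it is written as a literal or interned from a string at run time.
same_symbol = :foo.equal?(a.to_sym) && :foo.equal?(b.intern)
same_id     = (:foo.object_id == "fo#{'o'}".to_sym.object_id)
```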

This might be one area where the “Symbol isa immutable string” meme
might be doing some real harm. If we think of symbols as strings, then
we tend to build symbols dynamically like we do strings. This is when
the “memory leak” problem of symbols becomes an actual problem.

Here’s a rule of thumb … if a programmer never sees the symbol name in
the code base, then you probably should be using a string rather than a
symbol.
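A minimal illustration of that rule of thumb, assuming `raw` stands in for text arriving from outside the program (a config file, a form post, etc.): names that only ever appear at run time stay strings, while keys the programmer actually types remain symbols. The input format and key names are hypothetical.

```ruby
# Hypothetical external input; in practice this would come from a file or request.
raw = "color=red\nsize=large\nsomething_nobody_typed=1\n"

# Names that first appear at run time: keep them as String keys,
# so arbitrary input never adds entries to the symbol table.
settings = {}
raw.each_line do |line|
  key, value = line.chomp.split("=", 2)
  settings[key] = value
end

# Names the programmer writes in the source: symbols are the natural fit.
DEFAULTS = { :color => "blue", :verbose => false }

color   = settings.fetch("color", DEFAULTS[:color])
verbose = settings.fetch("verbose", DEFAULTS[:verbose])
```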

I’m not sure if this helped, or just muddied the water more.

–
– Jim W.

Surgeon · December 29, 2005, 12:42am

On Dec 28, 2005, at 3:22 PM, Ross B. wrote:

On Wed, 28 Dec 2005 19:47:16 -0000, Steve L. wrote:

The preceding URL tells me unequivocally that symbols aren’t strings,
but really doesn’t tell me too much about what they are, other than
what, names???

An object with a name seems a good way to put it - maybe ‘an object
that is a name, by which it can be referenced anywhere’.

At the new_haven.rb December meeting, I gave a short presentation on
Symbols versus Strings. I described Symbols as:

Named numbers, you pick the name, Ruby picks the number

The number that Ruby picks is effectively an index into an internal
table with some bit shifting and masking to encode the index into a
32-bit Ruby reference value.

The internal table gets new entries when a symbol literal is parsed
or when String#to_sym is called.
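The table can be poked at from Ruby itself with `Symbol.all_symbols`, which lists the symbols currently in it. The name below is assembled at run time so that no literal for it exists in the source. This is a sketch against a current Ruby: in the 1.8 series discussed here such entries are never removed, while since Ruby 2.2 a symbol created this way can be garbage collected once nothing refers to it.

```ruby
# Build a name at run time; no symbol literal for it appears anywhere in the code.
name = "dynamic_#{Process.pid}_#{rand(1_000_000)}"

in_table_before = Symbol.all_symbols.any? { |s| s.to_s == name }

sym = name.to_sym   # String#to_sym creates the table entry

in_table_after = Symbol.all_symbols.any? { |s| s.to_s == name }

# The entry maps both ways: back to the name, and to the same object on every lookup.
round_trip = (sym.to_s == name)
same_entry = sym.equal?(name.to_sym)
```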

Gary W.

Surgeon · December 29, 2005, 1:25am

On Thu, Dec 29, 2005 at 07:30:44AM +0900, Jim W. wrote:

Jim W. wrote:
Why are symbols not garbage collected? Because a symbol represents a
mapping from a string name to a unique object. Anytime in the execution
of the program, if that name is used for a symbol, the original symbol
object must be returned. If the symbol is garbage collected, then a
later reference to the symbol name will return a different object.
That’s generally frowned upon (although I don’t really see the harm. If
the original symbol was GC’ed, nobody cared what the original object was
anyways. But that’s the way it works).

Keep in mind that symbols are immediate values backed by
the global_symbols table (actually global_symbols.tbl and
global_symbols.rev, for the name => id and id => name associations
respectively). Since the lower bits encode information like ID_LOCAL,
ID_INSTANCE, etc., symbol values cannot point to the appropriate entry
in global_symbols the same way VALUEs for normal objects point to RVALUE
slots in the object heaps. [1]

During the mark phase, the stack must be scanned for references to live
objects. It’s easy to verify if an unsigned (long) long seems to point
to an object, by seeing if the address falls into the area occupied by
some heap and actually corresponds to a slot. In order to mark symbol
entries in global_symbols, a lookup in global_symbols.rev would be
needed for each word in the stack. I conjecture that this would be
relatively expensive, but there are probably better reasons for the
current behavior (it’s too late to read the sources in detail though :)…

[1] Even if those bits were not used, another level of indirection would
be needed due to the way the hash table works.

Surgeon · December 29, 2005, 2:13am

On Thu, Dec 29, 2005 at 06:51:29AM +0900, [email protected] wrote:

On Thu, 29 Dec 2005, Johannes F. wrote:

That’s where the ‘can be’ part comes in
The point is that symbols support quicker lookup by their nature.
Whether they are quicker in practice will depend on the
implementation. From the timings you give, it looks like symbol lookup
is implemented by converting the symbol to a string and doing string
lookup. Which is obviously not quicker

i never considered that as an impl - i bet you’re right though… time for
me to read the source.

I think :sym.hash uses the method definition from Kernel (inherited
through Object) and :sym.eql?(:foo) would also reuse the existing
definition.

[see C code at the bottom]

So there’s seemingly no reason for symbol hashing/comparison not to be
faster than for strings. The benchmarks you linked to, as well as Jim’s,
use relatively short strings, but one can exaggerate the effect:

```
Strings Filling
  5.800000   0.000000   5.800000 (  6.302649)
Strings Fetching
  3.120000   0.010000   3.130000 (  3.404679)

Symbols Filling
  2.120000   0.000000   2.120000 (  2.326393)
Symbols Fetching
  0.640000   0.000000   0.640000 (  0.700178)
```

```ruby
#!/usr/bin/env ruby

require 'benchmark'

SIZE = 100
N = 10000

RUBY_VERSION # => "1.8.4"

def make_str_keys
  (1...SIZE).collect { |i| "long key" * i }
end

def make_sym_keys(strs)
  strs.collect { |s| s.intern }
end

def populate(keys)
  result = {}
  keys.each_with_index do |k, i|
    result[k] = i
  end
  result
end

def fetch(keys, hash)
  keys.each do |key| hash[key] end
end

strs = make_str_keys
syms = make_sym_keys(strs)

str_hash = populate(strs)
sym_hash = populate(syms)

puts "Strings Filling"
puts Benchmark.measure {
  N.times do
    populate(strs)
  end
}

puts "Strings Fetching"
puts Benchmark.measure {
  N.times do
    fetch(strs, str_hash)
  end
}

puts
puts "Symbols Filling"
puts Benchmark.measure {
  N.times do
    populate(syms)
  end
}

puts "Symbols Fetching"
puts Benchmark.measure {
  N.times do
    fetch(syms, sym_hash)
  end
}
```

```c
rb_define_method(rb_mKernel, "hash", rb_obj_id, 0);

/* … */

VALUE
rb_obj_id(VALUE obj)
{
    if (SPECIAL_CONST_P(obj)) {
        return LONG2NUM((long)obj);
    }
    return (VALUE)((long)obj|FIXNUM_FLAG);
}

rb_define_method(rb_mKernel, "eql?", rb_obj_equal, 1);

/* … */

static VALUE
rb_obj_equal(VALUE obj1, VALUE obj2)
{
    if (obj1 == obj2) return Qtrue;
    return Qfalse;
}
```
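One further factor, not raised in the thread, likely contributes to the “Filling” gap in these benchmarks: `Hash#[]=` stores a frozen copy of a String key that is not already frozen (documented Hash behavior that protects the hash from later mutation of the key), while a Symbol key is stored as-is. The `populate` method above passes unfrozen strings, so every string insertion does that extra work. A small demonstration:

```ruby
h = {}

# An unfrozen String key is not stored itself: the hash keeps a frozen copy.
key = String.new("long key" * 10)
h[key] = 1
stored = h.keys.first
copied = !stored.equal?(key)
frozen = stored.frozen?

# Mutating the caller's string afterwards therefore cannot corrupt the hash.
key << "!"
still_found = h.key?("long key" * 10)

# A Symbol key is stored as the very object passed in; there is nothing to copy.
sym_hash = {}
sym_hash[:long_key] = 1
sym_stored_as_is = sym_hash.keys.first.equal?(:long_key)
```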

Surgeon · December 29, 2005, 2:44am

Yohanes S. wrote:

Alex K. writes:

2005/12/28, Surgeon:

Hi,

I am a Ruby newbie. I wish I didn’t post such a simple question here
but I had to.
What is the difference between :foo (a keyword) and “foo” (a string)?
Can they be used interchangeably? Are they fundamentally same and is
the only difference performance?

http://onestepback.org/index.cgi/Tech/Ruby/SymbolsAreNotImmutableStrings.red

What a coincidence. Seems like Jim and I finally had enough of people
conflating symbols and immutable strings on the same day.

http://microjet.ath.cx/WebWiki/2005.12.27_UsingSymbolsForTheWrongReason.html

While, technically speaking, both of you are absolutely and
undeniably correct (with one correction: Symbols are not Strings),
such a conflation is the best way to get someone over the
initial confusion. As this thread quite well demonstrates,
a definition for Symbols is quite difficult to come up with.

To paraphrase fifteen thousand forty-three mediocre
comedians over the last three centuries:

“A Symbol is like a word, a sentence, a phrase, a
description or, perhaps, a name. Except sometimes.”

YS.

E

Surgeon · December 29, 2005, 3:28am

On 12/28/05, [email protected] [email protected] wrote:

Along with Jim and Mauricio, my tests indicate that symbols are
consistently quicker, even on short strings.

Here’s my benchmark

```ruby
def bmark_string_symb
  require 'benchmark'
  strings, symbols = [], []
  n, m = 100, 1000
  hash = {}
  n.times { |x| strings << x.to_s + "key" }
  strings.each { |s| symbols << s.to_sym }
  # initialize hash
  strings.each { |s| hash[s] = 1 }
  symbols.each { |s| hash[s] = 1 }
  Benchmark.bm(10) do |b|
    b.report("string set") { m.times { |x| strings.each { |s| hash[s] = x } } }
    b.report("symbol set") { m.times { |x| symbols.each { |s| hash[s] = x } } }
    b.report("string get") { m.times { |x| strings.each { |s| hash[s] } } }
    b.report("symbol get") { m.times { |x| symbols.each { |s| hash[s] } } }
  end
end
```

and here are some results:

```
irb(main):080:0> bmark_string_symb
                user     system      total        real
string set  0.219000   0.016000   0.235000 (  0.235000)
symbol set  0.141000   0.000000   0.141000 (  0.141000)
string get  0.078000   0.000000   0.078000 (  0.078000)
symbol get  0.047000   0.000000   0.047000 (  0.047000)
=> true
irb(main):083:0> bmark_string_symb
                user     system      total        real
string set  0.234000   0.000000   0.234000 (  0.235000)
symbol set  0.063000   0.000000   0.063000 (  0.062000)
string get  0.078000   0.000000   0.078000 (  0.078000)
symbol get  0.047000   0.000000   0.047000 (  0.047000)
=> true
```

There’s a fair amount of variation, but symbols appear to behave as
expected (quicker on average), meaning that my guess that symbol
lookup in hashes was done on the basis of their string value was
wrong. I guess I should learn to refrain from speculating until I’ve
checked more closely.

jf

Surgeon · December 29, 2005, 3:53am

At 11:31 AM +0900 12/29/05, Devin M. wrote:

That means, with strings, if you say ‘foo’ 5 times in the code, you’re
creating 5 string objects, whereas with symbols, you’re only creating one.

I can understand this as a possible implementation, but I
really wonder about the follow-on statement that I’ve seen,
saying that this can cause memory leaks. If I define a
string and then discard it, shouldn’t it be GCed?

-r

Surgeon · December 29, 2005, 3:32am

Jim W. wrote:

I’m not sure if this helped, or just muddied the water more.

When I first was trying to learn about symbols, attempts to explain
their “intentions” (as names of things, for example), rather than to
explain what they are, just muddied the water for me. Sure, give me some
advice on when and when not to use them, but also, tell me what they
are, so I can decide for myself:

Like a string, but:

- Object-level equality, so :foo.equal?(:foo) and not
  'foo'.equal?('foo'). That means, with strings, if you say 'foo' 5
  times in the code, you’re creating 5 string objects, whereas with
  symbols, you’re only creating one.
- Share many properties with Fixnum (both being “immediate”):
  immutable, no singleton classes, object equality, not picked up by
  ObjectSpace.each_object…
- Not garbage collected.
- Looks cooler: by using a different syntax, you can give some visual
  distinction between your keys and your values, for instance.
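A few of the properties in that list can be checked directly. This is a sketch for a current Ruby (`singleton_class`, for one, did not exist in 1.8), reusing the 'foo' example from the first item.

```ruby
# Five evaluations of a string-building expression give five separate objects...
strings = Array.new(5) { String.new("foo") }
string_objects = strings.map(&:object_id).uniq.size   # 5

# ...while five references to :foo are all the same object.
symbols = Array.new(5) { :foo }
symbol_objects = symbols.map(&:object_id).uniq.size   # 1

# Like Fixnums (Integers today), symbols cannot be given singleton classes.
no_singleton =
  begin
    :foo.singleton_class
    false
  rescue TypeError
    true
  end

# And they do not carry String's editing methods such as #gsub.
has_gsub = :foo.respond_to?(:gsub)
```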

(Also, Johannes F.'s explanation was pretty good, IMO.)

Devin

Surgeon · December 29, 2005, 3:56am

BTW: Ruby version 1.8.2, Win XP Pro, Pentium M 2.0 GHz

jf

Surgeon · December 29, 2005, 4:11am

On Thu, 29 Dec 2005, Johannes F. wrote:

programs have shown the opposite to be true.

Along with Jim and Mauricio, my tests indicate that symbols are
consistently quicker, even on short strings.

that may well be true now. however, if you look at my test it goes to
some lengths to make the test a little more ‘real-world’:

- creates a large (2 ** 13) hash
- populates it using a semi-random distribution
- selects keys for lookup in a semi-random distribution
- forks for each test to isolate tests somewhat
- disables GC for each test
- runs each test 4 times

in any case, all i’m driving at is that a pretty heavy duty test (not
saying mine is that test) is required to eliminate the differences that
data distribution, gc, and system load have on the results. in
particular i can see how a linear distribution might have a huge
effect, seeing as symbols are essentially numbers and hash more
predictably, whereas making the jump from ‘9999’ to ‘10000’ is likely to
land in quite a different bucket.
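ara’s own script isn’t included in the thread, so the following is only a hypothetical harness loosely following his checklist: a 2**13-entry hash, shuffled insertion and lookup order, GC disabled while timing, and four runs per test. Forking per test is left out to keep it portable, and reporting the fastest run is an arbitrary choice.

```ruby
require 'benchmark'

SIZE = 2 ** 13   # hash size from the checklist above
RUNS = 4         # runs per test

# Times RUNS passes of lookups over +keys+, with GC off during each pass.
def time_lookups(keys)
  hash = {}
  keys.shuffle.each_with_index { |k, i| hash[k] = i }  # shuffled insertion order
  lookup_order = keys.shuffle                          # shuffled lookup order

  Array.new(RUNS) do
    GC.disable
    begin
      Benchmark.realtime { lookup_order.each { |k| hash[k] } }
    ensure
      GC.enable
    end
  end
end

string_keys = Array.new(SIZE) { |i| "key#{i}" }
symbol_keys = string_keys.map(&:to_sym)

string_times = time_lookups(string_keys)
symbol_times = time_lookups(symbol_keys)

puts "strings: best of #{RUNS} = #{string_times.min}s"
puts "symbols: best of #{RUNS} = #{symbol_times.min}s"
```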

it’s nonetheless very interesting to see some tests though.

i use both btw.

-a

Surgeon · December 29, 2005, 4:44am

On Dec 28, 2005, at 4:18 PM, Dan D. wrote:

How about devoting the next Ruby Q. to coming up with the best-of-
class examples, self paced-tutorial and documentation to settle
the :symbol vs “string” issue?

Hmm, smells like work and documentation combined. Two evils in one
quiz.

I suspect that would make for an unpopular topic.

James Edward G. II

Surgeon · December 29, 2005, 5:14am

Devin M. wrote:

- Share many properties with Fixnum (both being “immediate”):
  immutable, no singleton classes, object equality, not picked up by
  ObjectSpace.each_object…
- Not garbage collected.
- Looks cooler: by using a different syntax, you can give some visual
  distinction between your keys and your values, for instance.

Lacking all those cool String methods like #gsub and #[]

so… nothing like a string.

Yeah, I dunno, the “some named thing” was just a little iffy. The
PickAxe was especially annoying in this respect by trying to imply that
a symbol was the name of a method, variable, or class, specifically.
Maybe I’m just ranting about that.

Sorry for the mess,
Devin

Surgeon · December 29, 2005, 6:22am

ara wrote:

but this slightly modified version shows strings being a tiny bit
faster:

The difference is that your version measures more than just hash access
speed. It also includes string and symbol creation times. In
particular, to create a symbol, you must first create a string, so you
have twice as many object creations when using symbols.
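Following that point, a variant of ara’s benchmark that builds all keys before the timer starts, so only hash insertion is measured. It is a sketch, not a run anyone in the thread reported; key generation mirrors ara’s `rand.to_s`.

```ruby
require 'benchmark'

n = 2 ** 16

# Create every key up front. Interning needs a String first, so the symbol
# keys cost an extra object each, but that cost now stays outside the timing.
string_keys = Array.new(n) { rand.to_s.freeze }
symbol_keys = string_keys.map { |s| s.intern }

string_hash, symbol_hash = {}, {}

Benchmark.bm(10) do |b|
  b.report("string set") { string_keys.each { |k| string_hash[k] = rand } }
  b.report("symbol set") { symbol_keys.each { |k| symbol_hash[k] = rand } }
end
```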

–
– Jim W.

Surgeon · December 29, 2005, 4:35am

On Thu, 29 Dec 2005, Johannes F. wrote:

BTW: Ruby version 1.8.2, Win XP Pro, Pentium M 2.0 GHz

your test did show symbols being faster on my (linux - 2g cpu, 2g ram)
machine
too btw…

but this slightly modified version shows strings being a tiny bit
faster:

harp:~ > cat a.rb

```ruby
require 'benchmark'

n = 2 ** 16
string_hash, symbol_hash = {}, {}

Benchmark.bm(10) do |b|
  b.report("string set"){ n.times{|x| string_hash[rand.to_s.freeze] = rand}}
end
Benchmark.bm(10) do |b|
  b.report("symbol set"){ n.times{|x| symbol_hash[rand.to_s.intern] = rand}}
end

string_keys = string_hash.keys.sort_by{ rand }
symbol_keys = symbol_hash.keys.sort_by{ rand }

Benchmark.bm(10) do |b|
  b.report("string get"){ string_keys.each{|k| string_hash[k]}}
end
Benchmark.bm(10) do |b|
  b.report("symbol get"){ symbol_keys.each{|k| symbol_hash[k]}}
end
```

```
harp:~ > ./build/ruby-1.8.4/ruby ./a.rb
                user     system      total        real
string set  0.470000   0.000000   0.470000 (  0.471459)
                user     system      total        real
symbol set  0.640000   0.020000   0.660000 (  0.661556)
                user     system      total        real
string get  0.100000   0.000000   0.100000 (  0.101498)
                user     system      total        real
symbol get  0.080000   0.000000   0.080000 (  0.077205)
```

i think all we are showing here is that there aren’t good reasons for
one over the other. but that’s good too i suppose, since people
certainly seem to have preferences.

cheers.

-a