Kev J. wrote:
You must realize that it’s just the symbol :ruby that shares the same
object_id, not the string value (which must of course be different)
With symbols, the interpreter creates one memory address …
Actually I’ve always had trouble finding a clear answer to the Symbol
question too. It’s never made clear enough, not even in that admittedly
fairly clear explanation linked to above.
I’m still not sure I know the true answer to “what is a symbol and how
is it different from a string”, but I’m getting closer. And I think part
of the problem is that symbols are used for different things.
- Symbols as strings:
Firstly, as described in the rest of this thread, they’re used where a
constant string might be used, in the interests of saving memory and time.
This is often equated with “interning” strings in Java, whereby the
String class maintains a static pool of string objects that can be
reused when a given string value is commonly used.
The advantage - in both Ruby symbols and Java interned strings - is that
if you create 1000 of these things holding the same value, you only
create one object.
So in Ruby,
a = "foo"
b = "foo"
creates 2 string objects. Somewhere in memory are 2 different copies of
“f”, “o” and “o”.
And a and b are really pointers to these two different memory locations.
a = :foo
b = :foo
allocates a string - well actually a Symbol - object only once.
There’s only one location with the “f” followed by the two “o”s, and a
and b both point to that same location.
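You can check this for yourself with object_id (a minimal sketch; the actual id values vary between runs, so only the equality checks matter):

```ruby
# Two string literals with the same value are still two distinct objects.
a = "foo"
b = "foo"
puts a.object_id == b.object_id  # false - two separate String objects

# The same symbol literal always refers to the very same object.
a = :foo
b = :foo
puts a.object_id == b.object_id  # true - one shared Symbol object
```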
(As a side note, it happens that in Java the a = "foo"; b = "foo" version
would actually share the same object, like the symbol version in Ruby.
That’s because in this contrived example we’re using string literals, and
Java always interns literal Strings.)
Now, if you’re really using strings, that is, if you are truly
manipulating strings - say, scraping web pages for information, adding
first and last names together, inserting punctuation, encoding html -
then real strings are what you want. You want a and b to be different
because you want to fiddle with a without messing with b.
But maybe you’re just using strings as a sort of constant, as kind of
“marker” value. Say you have a method that takes an argument indicating
whether an mpeg movie should stop, play or pause.
Now you’re going to read that argument “stop” in your controlMovie
definition and stop playing. But you weren’t really using “stop” as a
string. You’re not going to modify it; you’re not going to find the
index of “to” in it; you don’t care how long it is. You’re just using it
as a marker. In some languages (including Ruby) you might use a constant;
in others you might use an enum.
There’s nothing really wrong with using a string - it just wastes
resources. Every time you call controlMovie("play") a new 4-character
string is allocated in memory. And every time you compare it in the
definition of controlMovie with, say
if (s == "pause")
then you are wasting time comparing the strings character by character.
In this case, we tend to use a symbol.
if (s == :stop) # compares the pointers - i.e. compares two integers - fast
For three reasons I would think (and I could be wrong here - I’m
somewhat of a ‘rewbie’ - a ruby newbie) we use symbols:
a) It saves memory by not allocating the same string twice - as
described above
b) It saves time by not comparing character by character
c) It’s become a Ruby idiom. That is, when you see the symbol, you know
right away that this is a sort of constant/enum/marker value.
I think c) is important - often symbols are used when the cost of using
a string instead would be negligible. But the symbol is just clearer.
Idiom is important in Ruby.
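Putting the movie example together, a sketch might look like this (the name controlMovie comes from the example above; the method body is made up purely for illustration):

```ruby
# Using symbols as marker values, as described above. Each call to
# controlMovie(:play) reuses the same :play object, and each comparison
# in the case statement is effectively an integer compare.
def controlMovie(action)
  case action
  when :stop  then "stopped"
  when :play  then "playing"
  when :pause then "paused"
  else raise ArgumentError, "unknown action: #{action.inspect}"
  end
end

controlMovie(:play)
controlMovie(:stop)
```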
- Symbols as names:
The second use of symbols is the most overlooked in these discussions:
symbols as names.
Most (all?) computer languages use symbols, and maintain a Symbol Table
to hold them.
Symbols are names. Names of variables, names of methods, names of
classes. In the examples I’ve displayed above, it was clear that :stop
and :foo were symbols. But actually, so were a, b and controlMovie.
(Well, sort of. Really a, b and controlMovie are names that cause
Symbols to be created and stored in the Symbol Table. So given there is
a method called controlMovie, there is automatically a symbol
:controlMovie in that table.)
The Symbol Table holds a single Symbol for every name used in the
program. Even if the name is used for different things - local variables
in different methods say - there’s only one Symbol for it in the global
table. (There’ll be another table for each method mapping that symbol to
a memory address - but we’re not talking about mapping here, we’re just
talking about the holding of the Symbols themselves)
Compiled languages like C only use the Symbol Table at compile time and
discard it unless compiling with a debug flag. (Debuggers need to show
you the value of a variable given its name - so they need to keep the
Symbol Table around.)
Ruby’s interpreted, so it keeps its Symbol Table handy at all times. You
can find out what’s on it at any given moment by calling Symbol.all_symbols.
Note that this doesn’t mean you can find the value. When you do a =
“foo”, the symbol :a gets created - but you can’t use that as the
“address of a” as you might use &a in the C language. It’s just a symbol.
And the global Symbol Table is really just a list of all the symbols
used in this execution of a Ruby program.
Every time Ruby sees a Symbol being used for the first time, a Symbol is
created and put on the table. The second time it sees that symbol it
finds it on the table.
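You can watch that happen with Symbol.all_symbols (a sketch; the name is generated randomly so it almost certainly isn't on the table yet):

```ruby
# Pick a name that has (almost certainly) never been used as a symbol.
# Note: string interpolation alone does not create a symbol.
name = "made_up_symbol_#{rand(10**9)}"

before = Symbol.all_symbols.any? { |s| s.to_s == name }  # not on the table yet
sym    = name.to_sym                                     # first use creates it
after  = Symbol.all_symbols.any? { |s| s.to_s == name }  # now it's there

puts before  # false
puts after   # true
```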
This use of symbols can be important to the programmer. Those methods
that introspect Ruby generally deal with symbols. So, for example, the
Module method public_method_defined?(symbol) expects a symbol to be
passed in for the method name.
The Java equivalent Reflection method would take a String.
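For example (the class and method names here are invented for illustration):

```ruby
# Ruby's introspection methods take symbols for names of program elements.
class Movie
  def play; end

  private

  def rewind; end
end

Movie.public_method_defined?(:play)    # true  - the name is passed as a symbol
Movie.public_method_defined?(:rewind)  # false - defined, but private
```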
- The confusion:
Here’s why I think people get confused about Symbols. And this is
probably only one of the possible reasons. I’m totally guessing here -
but I tend to find that software often works a certain way for
historical rather than logical reasons, and that the historical code -
its history being generally hidden from the user - tends to be the most
confusing. I think that symbols have a history, and I’m guessing
(totally guessing) it works like this:
i) Because Ruby is an interpreted language, it has its Symbol Table around
at all times.
ii) Because Ruby has good natural introspection, and because its Symbol
Table is “around”, it makes sense to use Symbols rather than strings for
describing those program elements that are really symbols anyway. It
makes sense to let programmers use Symbols.
iii) Because these symbols are easy to read like strings, yet are unique
throughout the execution of the Ruby program (because true symbols are)
- it makes sense to use them as “constants” or marker strings.
In other words, it’s the two different uses of symbols, and the fact
that one probably arose from the other, that makes them confusing.
Correct me if I’m wrong.