On 07/25/2011 02:20 AM, Kaye Ng wrote:
> I’m still a bit confused on whether to use strings or symbols in certain
> situations. They produce the same results, although strings take up
> more memory, which I’m not sure how or why exactly.
The difference is that every string literal in your code, whether it is
the exact same sequence of bytes or not, is a separate object and thus
has its own piece of your program’s available memory. Every symbol with
the same name is the exact same object.
> current_situation = "good"
> puts "Everything is fine" if current_situation == "good"
> puts "PANIC!" if current_situation == "bad"
The check for equality you performed here is not the same as a check for
object identity. Most of the time when you compare strings for equality
you don’t care whether they are the same object or not. You just want
to know if they have the same contents, so the == method for the String
class is defined to check for that.
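To make the distinction concrete, here is a small sketch comparing equality (==) with identity (equal?). The helper name compare is made up for illustration, and the result for strings assumes frozen string literals are not enabled for the file:

```ruby
# Hypothetical helper: reports whether two objects have the same contents
# and whether they are literally the same object.
def compare(a, b)
  { same_contents: a == b, same_object: a.equal?(b) }
end

# Two separate string literals: equal contents, but distinct objects
# (assuming frozen string literals are not enabled).
p compare("good", "good")

# Two occurrences of the same symbol: equal and the very same object.
p compare(:good, :good)
```

The == check is the one you almost always want for strings. equal? is only interesting when you care about which object you are holding.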
> “good” and “good” and “bad” (3 strings) take up memory. Why is that?
> Isn’t the second “good” same as the first?
No, the second “good” is not quite the same, as this code will demonstrate:
string = "good"
puts "The id of #{string.inspect} is #{string.object_id}"
string = "good"
puts "The id of #{string.inspect} is #{string.object_id}"
puts
symbol = :good
puts "The id of #{symbol.inspect} is #{symbol.object_id}"
symbol = :good
puts "The id of #{symbol.inspect} is #{symbol.object_id}"
When you run this code with your Ruby of choice, you’ll see something
like the following:
The id of "good" is 3390460
The id of "good" is 3390340

The id of :good is 247688
The id of :good is 247688
The numbers will almost certainly be different for you, but take note
that the id of the string “good” changes while that of :good does not.
This means that even though the string literal “good” appears to be
identical as far as the code you write is concerned, the interpreter
sees each instance as an individual.
Now, the actual problem is that every individual object gets a place in
the memory of your program, so you’re wasting memory and possibly
processing time in the garbage collector if you have string literals in
your program that you really do want to treat as the exact same
object. Whether that matters to you is up to you and your needs as the
author.
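If you want to see how quickly those individual objects add up, a rough sketch like the following counts the distinct objects produced by evaluating the same literal repeatedly. The method name distinct_objects is hypothetical, and the string result assumes frozen string literals are not enabled:

```ruby
# Hypothetical helper: evaluates the block n times, keeps every result
# alive in an array, and counts how many distinct objects came back.
def distinct_objects(n, &block)
  objects = Array.new(n) { block.call }
  objects.map(&:object_id).uniq.size
end

# Each evaluation of the string literal allocates a fresh String
# (assuming frozen string literals are not enabled).
puts distinct_objects(1000) { "good" }

# Every evaluation of the symbol literal yields the same Symbol.
puts distinct_objects(1000) { :good }
```

Each of those extra strings is something the garbage collector eventually has to clean up, which is the cost described above.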
> When do you normally use a symbol instead of a string?
I would suggest that, broadly, you should avoid string literals unless
the data they would contain might be modified, leave your program and
enter another one, or be displayed directly to the user. Otherwise,
you’re probably better off using symbols in order to save work for the
garbage collector. Keep in mind, however, that symbols were never
garbage collected before Ruby 2.2 (later versions do collect dynamically
created symbols), so creating many of them dynamically (perhaps by
converting user input into symbols, for instance) can be detrimental
and is probably a good use case for strings instead.
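One possible way to follow that advice is to keep user input as a string and map it onto a fixed set of symbols that already appear as literals in your code. This is only a sketch; KNOWN_STATES and state_for are names invented for the example:

```ruby
# Hypothetical lookup table: the only state symbols this program uses are
# written out here as literals, so none are created from user input.
KNOWN_STATES = { "good" => :good, "bad" => :bad }.freeze

# Returns the matching symbol, or nil when the input is not recognized.
# The input itself is never converted with to_sym.
def state_for(input)
  KNOWN_STATES[input.to_s.strip.downcase]
end

p state_for("Good\n")    # => :good
p state_for("whatever")  # => nil
```

Unrecognized input stays a plain string that can be garbage collected, and the set of symbols in the program stays fixed.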
In any case, the difference in memory consumption and garbage collector
performance is probably negligible for most small scripts and
short-lived processes, so don’t get too hung up on these things. Just
keep the possibility of the difference being meaningful in mind if you
ever have a need to optimize a script, program, or library.
-Jeremy