On Symbols

On Wed, 04 Jan 2006 22:06:00 -0000, Dave H. wrote:

However, although I don’t mean to pick on szpak specifically, he did
provide a sterling example of what doesn’t work, at least for me, and I
have worked professionally as a writer of documentation…

"A symbol in Ruby is similar to a symbol in Lisp

I don’t know Lisp. Or Java. Unless you know that your reader is already
familiar with your reference, it’s not helpful to use it.

Maybe (maybe?) there aren’t any really good references out there on what
a Ruby symbol is. And maybe you don’t know Lisp. Maybe I don’t either. But
at least we both have a much wider field to Google now…

Hi –

On Thu, 5 Jan 2006, Dave H. wrote:

much less immediate than I’d expect.
I’ve always conceptualized it as pretty much a value vs. pointer
difference. I avoid the word “pointer” because (a) Matz never calls
it that, and (b) it’s not a matter of physical memory, but a Ruby-side
notion of reference. Still, it’s close to the same idea.

Fixnums, Symbols, true, false, and nil get assigned directly to
variables. For other objects, variables get a reference to the
object. References, like variables, are not themselves objects.
They’re part of a kind of language substratum on which the object
system floats.

I think that’s all there is to it. In fact, in a sense there’s even
less to it, since Ruby handles any necessary de-referencing for you,
so you don’t have to make any explicit distinction in your code.
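A minimal Ruby sketch of the value-versus-reference point above. The helper name `append_through_alias` is made up for illustration, and the `equal?` results assume a current CRuby. The same assignment syntax is used in both cases; the difference only shows up when an object is mutated through a second name.

```ruby
# Illustrative helper: mutate a String through a second variable.
def append_through_alias(original)
  other_name = original      # copies the reference, not the String
  other_name << " world"     # mutates the one shared object
  original                   # the caller's variable sees the change
end

greeting = String.new("hello")
append_through_alias(greeting)
puts greeting                                       # hello world

# Immediates behave like plain values: the same literal is always
# the very same object, so there is nothing to alias or copy.
puts :foo.equal?(:foo)                              # true
puts 42.equal?(42)                                  # true
puts String.new("foo").equal?(String.new("foo"))    # false
```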

Once again, a gentle reminder: please feel free to comment on the learning
process I’m documenting if you like, but do NOT respond if you’re going to
try to explain more about symbols and whatnot. Thanks!

I’m giving myself a directly-quoted-person exemption :)

David


David A. Black
[email protected]

“Ruby for Rails”, from Manning Publications, coming April 2006!

Quoting [email protected]:

Fixnums, Symbols, true, false, and nil get assigned directly to
variables.

It may be helpful to think of them as objects which are addressed
directly by their values.

-mental

DAB wrote:

Fixnums, Symbols, true, false, and nil get assigned directly to
variables. For other objects, variables get a reference to the
object. References, like variables, are not themselves objects.
They’re part of a kind of language substratum on which the object
system floats.

True, but …

[…] In fact, in a sense there’s even
less to it, since Ruby handles any necessary de-referencing for you,
so you don’t have to make any explicit distinction in your code.

I think that this second statement is much more important than the
first. The fact that 1073741823 is direct and 1073741824 is a reference
makes very little difference 99.99% of the time.

Understanding that Fixnums, Symbols and the such are direct is good for
grokking implementation details, but provides little insight into using
Ruby. The fact we can pretend that everything is a reference is a
marvelous feature of the Ruby object model.
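As a rough illustration of why the boundary rarely matters: the two numbers quoted above sit on either side of the Fixnum limit of a 32-bit Ruby of that era (on other builds the limit is elsewhere, and newer Rubies call both plain `Integer`). The constant and method names below are invented for this sketch.

```ruby
# 2**30 - 1 was the largest Fixnum on a 32-bit Ruby of that era;
# the next integer up needed a heap-allocated Bignum.
LARGEST_32BIT_FIXNUM = 2**30 - 1           # 1073741823

def next_integer(n)
  n + 1                                    # same code on both sides of the boundary
end

over = next_integer(LARGEST_32BIT_FIXNUM)
puts over                                  # 1073741824
puts over - 1 == LARGEST_32BIT_FIXNUM      # true
puts over.is_a?(Integer)                   # true
```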

Just MHO.


– Jim W.

Hi –

On Thu, 5 Jan 2006, Jim W. wrote:

less to it, since Ruby handles any necessary de-referencing for you,
so you don’t have to make any explicit distinction in your code.

I think that this second statement is much more important than the
first. The fact that 1073741823 is direct and 1073741824 is a reference
makes very little difference 99.99% of the time.

Understanding that Fixnums, Symbols and the such are direct is good for
grokking implementation details, but provides little insight into using
Ruby. The fact we can pretend that everything is a reference is a
marvelous feature of the Ruby object model.

It is indeed. I think one reason I’m so immediate-value-aware is the
history of discussions of ++, the absence of which makes sense in
light of the immediate value thing (with numbers). And I guess I’d
put that at a middle level – not something you really need to know to
use Ruby, but not quite an implementation detail either; more of a
language design thing.
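A short sketch of the ++ point, with an illustrative method name: `x += 1` rebinds the variable to a different integer rather than changing the integer itself, which is roughly why an in-place `++` on a number has nothing to act on.

```ruby
# `x += 1` is shorthand for `x = x + 1`: it rebinds the variable x to a
# different Integer object. It does not change the object 1 itself.
def increment_copy
  x = 1
  y = x        # y and x both name the integer 1
  x += 1       # x now names 2; y is untouched
  [x, y]
end

p increment_copy    # [2, 1]
```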

David


David A. Black
[email protected]

“Ruby for Rails”, from Manning Publications, coming April 2006!

On Thu, Jan 05, 2006 at 07:06:00AM +0900, Dave H. wrote:

However, although I don’t mean to pick on szpak specifically, he did
provide a sterling example of what doesn’t work, at least for me, and I
have worked professionally as a writer of documentation…

"A symbol in Ruby is similar to a symbol in Lisp

I don’t know Lisp. Or Java. Unless you know that your reader is already
familiar with your reference, it’s not helpful to use it.

To be fair, I think the only reason he discussed the comparison of Ruby
symbols to Lisp symbols is this (I use “he” in the generic here):

I wasn’t getting a clear picture of what a Ruby symbol actually was from
most of the comments about it, until I saw some evidence that it bore an
at least passing resemblance to a Lisp symbol. While I’m no leet lisp
haxxor type, I do have a halfway decent notion of what a symbol is in
Lisp, so I pursued that comparison as a means of helping me understand
what a Ruby symbol was, and szpak was responding to that. In other
words that was kinda directed at me, and it did help me a bit to
understand Ruby symbols, so as a special case explanation it did exactly
what it needed to.

That having been said, if you’d like to try approaching the problem of
understanding a Ruby symbol by way of learning what a Lisp symbol is and
differentiating it from a Ruby symbol, let me know. I’ll see if I can
accommodate you. As far as my understanding of the subject goes, that
seems to be the shortest path from zero to grokking.


Chad P. [ CCD CopyWrite | http://ccd.apotheon.org ]

print substr("Just another Perl hacker", 0, -2);

The longish ‘foo’ vs :foo thread was helpful to me as a “nuby” in terms
of understanding Ruby Symbols. Personally, I like to see the dirty
details of what happens when a symbol is encountered in the token
stream:

For example, for me it would be instructional to hear from a Ruby expert
as to what the interpreter does when it sees :foo or some such.

I don’t think the discussions of lower level details are all bad. Most
of us may be new to Ruby, but most, if not all, have quite a few years in
computing, so there might still be instructional value in giving the
details on what exactly happens when a ruby ‘Symbol’ is encountered in
the token stream.

It might benefit those who wish to look a little deeper.

Thanks to all the experts for taking the time to go over this for the
benefit of us newcomers (and then going the extra mile to ask whether
it made sense… now that’s truly refreshing to see in an online
community). :)

Cheers,

-A

Devin M. wrote:

Hey, all you lurkers:

Have any of the explanations in the thread (What is the difference
between :foo and “foo” ?) helped you understand symbols? A combination
of them? Or a combination of a couple of explanations, and an irb
session? Which ones? Why? If you don’t want to get involved in (what
seems to be turning into) a flamewar, email me personally (hint: don’t
click “Reply” :P), and I’ll compile the results anonymously.

I think we all agree that everybody learns differently, and so I think
we’re in dire need of feedback from somebody other than Chad (no
offense).

Devin

On Jan 4, 2006, at 7:49 PM, Jim W. wrote:

The fact we can pretend that everything is a reference is a
marvelous feature of the Ruby object model.

+1

In fact I think of Fixnum and Symbol literals as
just special variables pre-bound to references to particular
objects in the way that uninitialized instance variables
are pre-bound to the nil object, but unlike variables you
can’t vary the binding. I’d call them constants but then
everyone would get confused with that whiny group of variables
that start with an uppercase letter and then complain every
time something changes.
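One way to poke at this analogy in Ruby. `Box`, `five_here` and `five_there` are hypothetical names used only here, and the identity checks assume CRuby.

```ruby
# Hypothetical class used only for this illustration.
class Box
  def contents
    @contents            # never assigned, so it reads as nil
  end
end

puts Box.new.contents.nil?          # true: the unset instance variable yields nil

# A literal always names the same object, wherever it is written.
def five_here;  5; end
def five_there; 5; end
puts five_here.equal?(five_there)   # true
puts :name.equal?(:name)            # true
```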

Gary W.

[email protected] wrote:

That’s the end of the day’s lesson! Hope it helps!

Evan Webb // [email protected]

Nicely done, Evan! Very good explanation. Thanks!

Here is a short breakdown of how symbols (and other immediates) are
implemented:

A variable holds a value. That value is an integer. The value of the
integer determines what it means. For example:

If the integer is odd, then the remaining bits of the integer are a
Fixnum value.

This means that if you do

a = 0

the interpreter stores in the local variable table the value
0x00000001. If you had assigned 4 to a, then the value would be
0x00000041. This allows for all Fixnums to not require additional
memory to represent. The same goes for true, false, nil, and symbols.
For the first 3, they are:

Name    Backend Integer Value

false   0
true    2
nil     4
undef   6   (This isn’t accessible from native Ruby code, but is used internally)

For symbols, the least significant byte is 0x0e and the upper 3 bytes
are an integer. The integer is a uniquely assigned value for a string.
Think about it as the table index for a string. By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing “evan” is created and stored
in the symbol table at, say, index 9323. The variable that was assigned
:evan gets assigned
((9223 << 8 ) | 0x0e). The next time :evan is seen, “evan” is looked up
in the symbol table to obtain 9232 again.
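The lookup-or-add behaviour described here can be modelled in a few lines of Ruby. This is only a toy: `ToySymbolTable`, `intern` and `name_of` are names invented for the sketch, and the real interpreter does this in C with its own data structures.

```ruby
# Toy model of the symbol table described above; not the interpreter's code.
class ToySymbolTable
  def initialize
    @index_by_name = {}   # "evan" => 0, ...
    @names = []           # index => name
  end

  # Return the index for name, adding it on first sight.
  def intern(name)
    @index_by_name.fetch(name) do
      @names << name.dup
      @index_by_name[name] = @names.size - 1
    end
  end

  # Reverse lookup: index back to the stored name.
  def name_of(index)
    @names.fetch(index)
  end
end

table  = ToySymbolTable.new
first  = table.intern("evan")
second = table.intern("evan")    # looked up, not added again
other  = table.intern("matz")
p [first, second, other]         # [0, 0, 1]
p table.name_of(first)           # "evan"

# Ruby itself behaves the same way from the outside:
puts "evan".to_sym.equal?(:evan) # true
```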

So, to review:

a = :evan

a.to_i # => 9232 (the index in the symbol table)
a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table

The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it’s odd, it’s a Fixnum immediate value. If
it’s 0, 2, 4, or 6, it’s a “core” immediate value. If the LSB is
0x0e, it’s a symbol. Otherwise, it’s a pointer to a memory address that
holds the information about the object.
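The decision rules in this paragraph can be written down as a small decoder. This is a sketch of the scheme as described in the post (a 32-bit Ruby of that time), not of any current interpreter; `decode_word`, `SPECIAL_CONSTANTS` and the `:undef` stand-in are all made up for the illustration.

```ruby
# Toy decoder for the tagging scheme described above.
# `word` stands in for the raw integer held by a variable.
SPECIAL_CONSTANTS = { 0 => false, 2 => true, 4 => nil, 6 => :undef }.freeze
SYMBOL_TAG = 0x0e

def decode_word(word)
  if word.odd?
    [:fixnum, word >> 1]                 # drop the tag bit
  elsif SPECIAL_CONSTANTS.key?(word)
    [:special, SPECIAL_CONSTANTS[word]]
  elsif (word & 0xff) == SYMBOL_TAG
    [:symbol_id, word >> 8]              # index into the symbol table
  else
    [:pointer, word]                     # address of a heap object
  end
end

p decode_word(1)                  # [:fixnum, 0]
p decode_word(9)                  # [:fixnum, 4]
p decode_word(4)                  # [:special, nil]
p decode_word(2363406)            # [:symbol_id, 9232]
p decode_word(0x0804a2c0).first   # :pointer
```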

That’s the end of the day’s lesson! Hope it helps!

Evan Webb // [email protected]

[email protected] wrote:

Here is a short breakdown of how symbols (and other immediates) are
implemented:

By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing “evan” is created and stored
in the symbol table at, say, index 9323. The variable that was assigned
:evan gets assigned
((9223 << 8 ) | 0x0e). The next time :evan is seen, “evan” is looked up
in the symbol table to obtain 9232 again.

So, to review:

a = :evan

a.to_i # => 9232 (the index in the symbol table)
a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table

The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it’s odd, it’s a Fixnum immediate value. If
it’s 0, 2, 4, or 6, it’s a “core” immediate value. If the LSB is
0x0e, it’s a symbol. Otherwise, it’s a pointer to a memory address that
holds the information about the object.

That’s the end of the day’s lesson! Hope it helps!

Evan Webb // [email protected]

Thanks Evan, much appreciated!

This is how I understand it now:

  1. When the interpreter sees : it looks in the “symbol table”

  2. if it finds the value, it returns the int index (or the computed
    object_id?) of it; otherwise it creates a new entry

  3. somestringofchars.object_id returns something which is a function (in
    the mathematical sense) of the index of somestringofchars in the symbol
    table (i.e., indexofsymbolstring << 8 | 0x0e)

  4. ‘:’ is just a way of giving the interpreter the heads up that a
    symbol is coming up in the token stream (ie we think we know what we’re
    talking about, can you please look it up in the symbol table? nice
    interpreter… nice interpreter)

  5. some object_id values are computed differently, for example in the
    session below (I don’t know why; it’s a hole in my understanding of how
    object_ids are assigned):

irb(main):054:0> false.object_id
=> 0
irb(main):055:0> 0.object_id
=> 1
irb(main):056:0> true.object_id
=> 2
irb(main):057:0> 1.object_id
=> 3
irb(main):058:0> nil.object_id
=> 4
irb(main):059:0> def.object_id # not sure why
irb(main):060:1> undef.object_id # possibly cuz its strictly internal

  6. Whether a token is valid or not, it gets added to the symbol table
    and an object_id can be computed from the symbol based on what type
    of symbol it is (only if it’s a valid object, methinks). Otherwise an
    error is thrown.

  7. there is a separate table which holds the variables. I’m not sure if
    this is true; from what I’ve seen in irb, it looks like a variable’s
    symbol gets stored in the symbol table as well, or at least a symbol
    which may point to its value location.

    8.1 Every time we refer to a variable var, the interpreter uses the
    :var thingy, so if we did xx=“Hi there”, there would be an :xx created,
    but I don’t know how to get to “Hi there” from :xx (:xx.to_s just gives
    me “xx”)

  9. a symbol is just an atomic representation (AFA-the user-IC) of a
    token added to the symbol table and exposed via the Symbol class so we
    can use it if we want to instead of creating new string objects for
    referring to things like methods etc and incurring needless overhead
    (however small it might be).

  10. I’m guessing the Ruby interpreter needs symbols for its own housekeeping
    (obviously) but the implementers were just being nice and allowed end
    users to use them too for certain specific situations (I can’t think of
    a good example).

So, basically, the first thing the interpreter does is, it takes the
token and stuffs it in the symbol table, then it figures out what to do
with it (steps 1…n). And since we have access to the symbol for a
given reference, why not use that instead of referring to it via a
string object which gets created anew every time we refer to it. Even
though the end result is the same.
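A few concrete uses of symbols as names, sketched in Ruby. This shows only the public API and says nothing about how the interpreter stores local variables internally. `value_of_xx` and `settings` are illustrative names, the `verbose:` hash syntax needs Ruby 1.9 or later, and `Binding#local_variable_get` needs Ruby 2.1 or later (it did not exist when this thread was written).

```ruby
# Symbols as names for methods and hash keys.
puts "ruby".respond_to?(:upcase)     # true
puts "ruby".send(:upcase)            # RUBY
settings = { verbose: true }         # the key is the symbol :verbose
puts settings[:verbose]              # true

# Getting from a variable's name back to its value.
def value_of_xx
  xx = "Hi there"
  binding.local_variable_get(:xx)
end
puts value_of_xx                     # Hi there
```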

thanks,

-A

If I may make one correction/clarification to an otherwise excellent
explanation of the implementation…

On 1/5/06, [email protected] wrote:

For symbols, the least significant byte is 0x0e and the upper 3 bytes
are an integer. The integer is a uniquely assigned value for a string.
Think about it as the table index for a string. By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing “evan” is created and stored

Clarification: a C string containing “evan” is created…

a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table

Correction: a.to_s returns a reference to a new String object
containing the same sequence of characters as the C string in the
symbol table. This is visible when you compare the result of
#object_id on subsequent calls to a.to_s:

irb> a = :evan # => :evan
irb> a.to_s.object_id # => 1657424
irb> a.to_s.object_id # => 1653204
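The same check can be made without reading object_ids by eye. A small sketch, with `fresh_strings` as an invented helper name; the `equal?` result assumes CRuby, where each `Symbol#to_s` call hands back a separate String.

```ruby
# Call Symbol#to_s twice and return both results.
def fresh_strings(sym)
  [sym.to_s, sym.to_s]
end

first_copy, second_copy = fresh_strings(:evan)
puts first_copy == second_copy         # true: same characters
puts first_copy.equal?(second_copy)    # false: two separate String objects
puts first_copy.to_sym.equal?(:evan)   # true: the symbol itself is unique
```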

The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it’s odd, it’s a Fixnum immediate value. If
it’s 0, 2, 4, or 6, it’s a “core” immediate value. If the LSB is
0x0e, it’s a symbol. Otherwise, it’s a pointer to a memory address that
holds the information about the object.

That’s the end of the day’s lesson! Hope it helps!

Thanks, Evan! Aside from those minor, pedantic corrections, it was
indeed an excellent lesson.

Jacob F.

Jacob,

Correct. The symbol table holds pointers to C strings and new String
objects are created with each call to Symbol#to_s.

I should note that Symbols are native Ruby access to the Ruby runtime’s
ID type. It’s for this reason that C strings are stored in the symbol
table: the C functions for working with IDs (rb_intern and
rb_id2name) take and return char * and ID.

Just because there seems to be so much confusion, I would like to
point out some minor flaws in your post so nobody else stumbles over
them. (Please correct me if the error is on my side.)

[email protected] wrote:

a = 0

the interpreter stores in the local variable table the value
0x00000001. If you had assigned 4 to a, then the value would be
0x00000041. This allows for all Fixnums to not require additional

No, I think it would store 0x00000009 because only the first bit
is reserved, not the first nibble. ( 4 << 1 | 1)
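The arithmetic of this correction, as a tiny Ruby sketch. `fixnum_word` is a name invented here; the final line relies on CRuby, which reports the same tagged number as the `object_id` of a small integer.

```ruby
# Tagged representation of a small integer: shift left one bit, set bit 0.
def fixnum_word(n)
  (n << 1) | 1
end

printf("0x%08x\n", fixnum_word(0))   # 0x00000001
printf("0x%08x\n", fixnum_word(4))   # 0x00000009

# Compare with what the running interpreter reports (true on CRuby).
puts 4.object_id == fixnum_word(4)
```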

For symbols, the least significant byte is 0x0e and the upper 3 bytes
are an integer. The integer is a uniquely assigned value for a string.
Think about it as the table index for a string. By using a symbol, you
basically allocate a string once and then refer to it by the index it
occupies in a special symbol table. For example, the first time the
symbol :evan is seen, a string containing “evan” is created and stored
in the symbol table at, say, index 9323. The variable that was assigned
:evan gets assigned
((9223 << 8 ) | 0x0e). The next time :evan is seen, “evan” is looked up

This should obviously be ((9232 << 8 ) | 0x0e)

in the symbol table to obtain 9232 again.

So, to review:

a = :evan

a.to_i # => 9232 (the index in the symbol table)
a.object_id # => 2363406 (the index << 8 | 0x0e)
a.to_s # => the reference to the string object located at
index 9232 in the symbol table

This seems to create a copy each time. (at least if there is no ruby
string around)

The Ruby runtime takes the integer value and applies these rules to
determine what it means. If it’s odd, it’s a Fixnum immediate value. If
it’s 0, 2, 4, or 6, it’s a “core” immediate value. If the LSB is
0x0e, it’s a symbol. Otherwise, it’s a pointer to a memory address that
holds the information about the object.

That’s the end of the day’s lesson! Hope it helps!

Evan Webb // [email protected]

Thanks, this may be really helpful for those who want to understand
symbols and have a decent idea of how interpreters work.

cheers

Simon