Why was the "Symbol is a String"-idea dropped?

Hello everybody,

Although there was not much input from the Ruby-Core specialists,
I have still learned a lot from the discussion.
I am trying to build a conceptual picture now.

Some say Strings and Symbols are conceptually very different,
some say they are quite close.

I view it like this:
Symbols essentially are names, Strings essentially are data,
while they both appear as sequences of characters.

Names/Symbols are just atomic, constant, unrelated entities,
while Strings as data have a rich life: they can be related in
many ways, they can be analysed, even be modified.

That’s a clean distinction and I think it is very well-represented
in the current Ruby implementation.

In this light, it seems nonsensical to make one the subclass of the
other.
(A common superclass would be OK, though.)

Now, in practice, the situation gets more complex:

  1. Names sometimes turn into data (option names, method names, table
    names…),
    especially when things get highly dynamic.
  2. Sometimes, programmers use the conceptually “wrong” class, maybe
    as a kind of optimization, for the sake of beauty or out of laziness
    :-)

One could argue that it is good that Symbol and String are
well-separated,
because it educates programmers to decide for the “correct” class to
use.

On the other hand, the following situation occurs very often:
You need to transfer a sequence of characters – which format do you use:
always Symbols, always Strings, or should it allow both? (Or even a fancy
object?)

First, you could argue that when you use duck-typing, the interface can
be kept open.
But still, many situations remain where this question stays open.

This choice can be a burden, especially if you think of
inter-operability or optimisation.

And that is an argument for some sort of unification of Symbol and
String.

Subclassing alone would not be enough to solve the problem above;
String#== and Symbol#== would also have to be defined such that “a” == :a,
and #hash would have to be defined accordingly.

Then you would still have the two different kinds of objects (“a” and :a)
but they would behave quite the same except for modifying methods.
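
To make the ==/#hash point concrete, here is a minimal sketch. The `Name` class is a hypothetical wrapper invented for this illustration, not anything in Ruby or proposed by the core team. It shows that Hash lookup only treats two keys as the same when both `eql?` and `hash` agree, and that without changing String and Symbol themselves the equality stays one-sided.

```ruby
# Hypothetical wrapper class, for illustration only; not part of Ruby.
class Name
  attr_reader :text

  def initialize(value)
    @text = value.to_s.freeze
  end

  # Treat Strings, Symbols and other Names with the same characters as equal.
  def ==(other)
    case other
    when Name           then text == other.text
    when String, Symbol then text == other.to_s
    else false
    end
  end
  alias_method :eql?, :==

  # Hash lookup needs #hash to agree with #eql?.
  def hash
    text.hash
  end
end

plain_equal    = ("a" == :a)         # today's behaviour: not equal
plain_lookup   = { "a" => 1 }[:a]    # today's behaviour: key not found

table          = { Name.new("a") => 1 }
unified_lookup = table[Name.new(:a)] # found, because eql? and hash agree

# The wrapper alone is asymmetric: Symbol#== knows nothing about Name.
one_way = [Name.new("a") == :a, :a == Name.new("a")]
```

The asymmetry in `one_way` is the reason a real unification would have to change String#== and Symbol#== (and both #hash methods) in the core classes.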

Now, as I am writing this, I doubt that the advantages
of the unification are really worth the effort…

It depends on factors not known to me.

But now, I think I can understand the core-team’s decision better.

Bye
Sven

Brian C. schrieb:

of ‘singleton’ was of a class with only a single instance, where the

If it were Symbol.new(“foo”) always returning the same object then I guess
it would probably be called the multiton pattern.

Isn’t the term “immediate value” used for that? Like:
:abc is an immediate value, and so is 12, so is nil
“abc” is a reference value and so is [1, 2] and also {} and even 12.0

On May 16, 5:44 am, “Sven S. (enduro)” wrote:

Subclassing alone would not be enough, to solve the problem above,
also, String#== and Symbol#== would have to be defined such that “a” == :a
And also #hash would have to be defined accordingly.

Then you would still have the two different kinds of objects (“a” and :a)
but they would behave quite the same except for modifying methods.

While I think Symbol probably could use at least a few of String’s
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == “a” ?

Now, as I am writing this, I doubt that the advantages
of the unification are really worth the effort…

It depends on factors not known to me.

But now, I think I can understand the core-team’s decision better.

Thanks for this excellent summary.

T.

On 5/16/07, Trans wrote:

While I think Symbol probably could use at least a few of String’s
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == “a” ?

Well, there is precedent: 2 == 2.0 and so on.
On the other hand, what should happen in case statements? Maybe it
would actually be better to make :a === ‘a’ but not :a == ‘a’
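
As a rough sketch of what "=== but not ==" could mean in practice: `case` dispatches through `===` on the `when` operand, so a matcher object can accept both forms while `==` stays strict. `NameMatch` below is a hypothetical helper made up for this example; it stands in for a changed Symbol#=== without patching any core class.

```ruby
# Hypothetical matcher, for illustration only; not part of Ruby.
class NameMatch
  def initialize(name)
    @name = name.to_s
  end

  # case/when calls this with the value being tested.
  def ===(other)
    (other.is_a?(String) || other.is_a?(Symbol)) && other.to_s == @name
  end
end

def classify(value)
  case value
  when NameMatch.new(:a) then "matched a"
  else "no match"
  end
end

results = [classify("a"), classify(:a), classify(:b), classify(2)]
strict  = (:a == "a")   # plain equality is untouched
```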

On 5/16/07, [email protected] [email protected] wrote:

:a

With symbols being as integer-like as they are string-like, though,
it’s really equally similar to:

2 == :“2”

I don’t think symbols are integer like. (I don’t know that they are
especially string like either), but I’d be willing to bet a lot more
code in the wild would be broken if you removed Symbol#to_s vs.
removing Symbol#to_i.

Your example really ought to be

2 == :whatever_symbol_whose_to_i_results_in_2

Hi –

On Wed, 16 May 2007, Logan C. wrote:

but they would behave quite the same except for modifying methods.

While I think Symbol probably could use at least a few of String’s
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == “a” ?

Well, there is precedent: 2 == 2.0 and so on.

With symbols being as integer-like as they are string-like, though,
it’s really equally similar to:

2 == :“2”

On the other hand, what should happen in case statements? Maybe it
would actually be better to make :a === ‘a’ but not :a == ‘a’

I guess as long as :a === :a was still true, that might be a good way
to express the fact that “this is the string of which this symbol is a
case”, or something like that.

David

On May 16, 2007, at 11:17 AM, Logan C. wrote:

On 5/16/07, [email protected] [email protected] wrote:

With symbols being as integer-like as they are string-like, though,
it’s really equally similar to:

2 == :“2”

I don’t think symbols are integer like.

This is the ‘equivalence is defined by identity’ idea again. I think
this is what David means by ‘integer-like’. It is this property that
both fixnums and symbols share but that is not shared by strings.

Making ‘==’ work with mixed operands of symbols and strings breaks that
idea and leads to the strange example that David gave (2 == :“2”).

Gary W.

Hi –

On Fri, 18 May 2007, Gary W. wrote:

this is what David means by ‘integer-like’. It is this property that
both fixnums and symbols share but that is not shared by strings.

Yes, it’s the immutable/immediate thing that symbols have in common
with fixnums and that neither has in common with strings.

David

On Fri, May 18, 2007 at 03:17:01AM +0900, [email protected] wrote:

Yes, it’s the immutable/immediate thing that symbols have in common
with fixnums and that neither has in common with strings.

Frozen strings are immutable.

Paul

On 5/18/07, Paul B. wrote:

On Fri, May 18, 2007 at 03:17:01AM +0900, [email protected] wrote:

Yes, it’s the immutable/immediate thing that symbols have in common
with fixnums and that neither has in common with strings.

Frozen strings are immutable.

But not immediate.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

On 5/18/07, Rick DeNatale wrote:

On 5/18/07, Paul B. wrote:

On Fri, May 18, 2007 at 03:17:01AM +0900, [email protected] wrote:

Yes, it’s the immutable/immediate thing that symbols have in common
with fixnums and that neither has in common with strings.

Frozen strings are immutable.

But not immediate.

What about

%f{This is sooo cooooold} << “!”

TypeError: can’t modify frozen string
Just an idea.

Robert

On 5/18/07, Rick DeNatale wrote:

a = “abc”.freeze
b = “abc”.freeze
c = :abc
d = :abc
a.object_id => -606341628
b.object_id => -606347008
c.object_id => 343218
d.object_id => 343218

The key difference is that there’s only one instance of a symbol with
a given string representation.

Ah I see, I got confused, I did not understand the meaning of
immediate immediately ;).
Although theoretically the interpreter could create an immediate value
for %f{…}, we would probably run out of address space :-(

Cheers
Robert

On 5/18/07, Robert D. wrote:

What about

%f{This is sooo cooooold} << “!”

TypeError: can’t modify frozen string
Just an idea.

That’s the immutable part, but

a = “abc”.freeze
b = “abc”.freeze
c = :abc
d = :abc
a.object_id => -606341628
b.object_id => -606347008
c.object_id => 343218
d.object_id => 343218

The key difference is that there’s only one instance of a symbol with
a given string representation.
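
A small sketch of the same point that avoids printing raw object ids. It builds the strings with `String.new` on the assumption that some newer Ruby versions may share identical frozen string literals, which would blur the comparison; the variable names are only for this example.

```ruby
# Two frozen strings with the same characters: equal, but separate objects.
a = String.new("abc").freeze
b = String.new("abc").freeze

same_content  = (a == b)
same_object   = a.equal?(b)
both_frozen   = a.frozen? && b.frozen?

# Two symbols with the same characters: always the very same object,
# however the symbol was obtained.
c = :abc
d = "abc".to_sym
same_symbol   = c.equal?(d)
```

So freezing gives a string the immutability of a symbol, but not the "one object per character sequence" guarantee.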

The shorthand way of saying this is that symbols, like fixnums, are
immediate. That is a sufficient but not necessary condition; it
crosses the line a bit in describing both the identity relationship
requirement AND the implementation.

Most normal objects are referenced at the C level by an internal value
which is a pointer to the object’s state representation in memory.
Since objects are aligned at least on a word boundary, all normal
object pointers will have the 2 least significant bits as zero. They
will also be non-zero.

A few objects are immediate which means that they are referenced at
the C level by a representation whose value is not a pointer. Fixnums
are represented by shifting the C representation left one bit and
turning on the low-order bit. False is represented by 0, True by 2,
and Nil by 4.

Ruby symbols are represented by a value computed by shifting the
symbol’s integer representation left 8 bits and setting the low-order
byte to 0xFF.

As I said, it’s not essential that symbols be immediate, for example
interning a string could create a Symbol instance which was frozen and
registered in a global symbol table, i.e. the multiton pattern, but
the current implementation no doubt has some advantages in either
low-level mechanism performance, supporting some niche in ruby legacy,
or both.


Rick DeNatale


On Sat, May 19, 2007 at 08:28:24AM +0900, Rick DeNatale wrote:

and Nil by 4.

Ruby symbols are represented by a value computed by shifting the
symbol’s integer representation left 8 bits and setting the low-order
byte to 0xFF.

Perhaps it varies based on the Ruby version you’re running; it’s not
like that for me.

irb(main):006:0> :foo.object_id.to_s(16)
=> “39490e”
irb(main):007:0> RUBY_VERSION
=> “1.8.4”

I think a weaker requirement than ‘immediate’ is needed. A symbol can
quite happily be a regular object; we just need to ensure that there is
always only one symbol for a particular symbol character sequence.

Regards,

Brian.

On 5/19/07, Brian C. wrote:

On Sat, May 19, 2007 at 08:28:24AM +0900, Rick DeNatale wrote:

=> “1.8.4”

You can’t really see the internal bit representations from ruby, since
they get manipulated before you see them. Much like the class of an
object reported by ruby isn’t the same as the object pointed to by its
klass pointer at the C level.

And even if you could, I was talking about the integer representation
of the symbol, not the object_id.

Not to say that this doesn’t change between versions of ruby. Which
is why it’s carefully hidden from ruby code.

I think a weaker requirement than ‘immediate’ is needed. A symbol can quite
happily be a regular object; we just need to ensure that there is always
only one symbol for a particular symbol character sequence.

Yes, I said that, but the key issue for the subject of the current
thread is that Symbols aren’t strings, they might have both a string
representation and an integer representation, but then so do integers,
and unlike Strings they have an essential requirement that equality
implies identity which is an accidental property of integers in the
range of Fixnum.


Rick DeNatale


On Sun, May 20, 2007 at 10:18:22AM +0900, Rick DeNatale wrote:

irb(main):006:0> :foo.object_id.to_s(16)
of the symbol, not the object_id.

AFAIK, the object_id is the in-memory pointer to the structure of the
object (if it’s a material object), or is one of the special values:

  • 0, 2 or 4 for false, true or nil

  • (n<<1) | 1 for Fixnums

None of these is valid as a pointer to a memory location, so they can be
recognised immediately as special.

So in the above, :foo’s object ID looks like a memory pointer to me. It
might not be, but then you’d need to guarantee that 0x39490e could not
possibly be a valid memory pointer for some regular object (and also be
able to recognise this by inspection, i.e. by looking at the bit pattern).
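
The bit pattern of that value can be inspected with plain integer arithmetic. The sketch below assumes the 1.8-era layout in which a symbol’s internal value carries the low byte 0x0e (an assumption here, although it agrees with the layout comment quoted later in this thread); the variable names are made up for the example.

```ruby
object_id = 0x39490e   # the value printed by irb above

# Word-aligned heap pointers have their two lowest bits clear.
word_aligned = (object_id & 0b11).zero?

# Assumed symbol tag: low byte 0x0e, remaining bits hold the symbol's ID.
low_byte     = object_id & 0xff
candidate_id = object_id >> 8

summary = format("aligned=%s low_byte=0x%02x id=0x%x",
                 word_aligned, low_byte, candidate_id)
```

Under that assumption the value is not word-aligned, so it could not be a pointer to a regular object, and its low byte matches the symbol tag.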

Regards,

Brian.

On 5/16/07, Logan C. wrote:

While I think Symbol probably could use at least a few of String’s
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == “a” ?

Well, there is precedent: 2 == 2.0 and so on.
On the other hand, what should happen in case statements? Maybe it
would actually be better to make :a === ‘a’ but not :a == ‘a’

Honestly I prefer to write

case s.to_s
when ‘a’

instead of
case s
when ‘a’

but the most explicit way to do this is maybe the most readable

case s
when :a, ‘a’
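
For comparison, here are the three variants as runnable methods. The method names are invented for this sketch, and the behaviour shown is that of current Ruby, where a String never matches a Symbol under `===`.

```ruby
# Variant 1: normalise to a String first.
def via_to_s(s)
  case s.to_s
  when 'a' then :hit
  else :miss
  end
end

# Variant 2: compare as-is; only a String 'a' matches.
def as_is(s)
  case s
  when 'a' then :hit
  else :miss
  end
end

# Variant 3: list both forms explicitly.
def explicit(s)
  case s
  when :a, 'a' then :hit
  else :miss
  end
end

with_symbol = [via_to_s(:a), as_is(:a), explicit(:a)]
with_string = [via_to_s('a'), as_is('a'), explicit('a')]
```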

Cheers
Robert

P.S.
Tom is right, that was an excellent résumé.

R

On 5/20/07, Brian C. wrote:

And even if you could, I was talking about the integer representation
of the symbol, not the object_id.

AFAIK, the object_id is the in-memory pointer to the structure of the object
(if it’s a material object), or is one of the special values:

  • 0, 2 or 4 for false, true or nil

  • (n<<1) | 1 for Fixnums

Not starting with 1.8.5
VALUE
rb_obj_id(VALUE obj)
{
    /*
     *                32-bit VALUE space
     *          MSB ------------------------ LSB
     *  false   00000000000000000000000000000000
     *  true    00000000000000000000000000000010
     *  nil     00000000000000000000000000000100
     *  undef   00000000000000000000000000000110
     *  symbol  ssssssssssssssssssssssss00001110
     *  object  oooooooooooooooooooooooooooooo00   = 0 (mod sizeof(RVALUE))
     *  fixnum  fffffffffffffffffffffffffffffff1
     *
     *                    object_id space
     *                                       LSB
     *  false   00000000000000000000000000000000
     *  true    00000000000000000000000000000010
     *  nil     00000000000000000000000000000100
     *  undef   00000000000000000000000000000110
     *  symbol   000SSSSSSSSSSSSSSSSSSSSSSSSSSS0   S…S % A = 4 (S…S = s…s * A + 4)
     *  object   oooooooooooooooooooooooooooooo0   o…o % A = 0
     *  fixnum  fffffffffffffffffffffffffffffff1   bignum if required
     *
     *  where A = sizeof(RVALUE)/4
     *
     *  sizeof(RVALUE) is
     *  20 if 32-bit, double is 4-byte aligned
     *  24 if 32-bit, double is 8-byte aligned
     *  40 if 64-bit
     */
    if (TYPE(obj) == T_SYMBOL) {
        return (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG;
    }
    if (SPECIAL_CONST_P(obj)) {
        return LONG2NUM((long)obj);
    }
    return (VALUE)((long)obj|FIXNUM_FLAG);
}

1.8.6 and 1.9 have the same code.
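
The arithmetic in that function can be replayed in plain Ruby to see why symbol ids and object ids cannot collide. This is only a sketch of the formulas quoted above: it assumes sizeof(RVALUE) is 20 (the 32-bit, 4-byte-aligned case from the comment) and that FIXNUM_FLAG is 1, and the symbol ID and heap address are made-up sample values.

```ruby
SIZEOF_RVALUE = 20   # assumption: 32-bit build, double 4-byte aligned
FIXNUM_FLAG   = 1    # assumption: low bit marks a Fixnum VALUE
A             = SIZEOF_RVALUE / 4

# The integer that Ruby code sees for a Fixnum-tagged VALUE.
def fixnum_value(value)
  value >> 1
end

# Mirrors: (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG
def symbol_object_id(sym_id)
  fixnum_value((sym_id * SIZEOF_RVALUE + (4 << 2)) | FIXNUM_FLAG)
end

# Mirrors: (long)obj | FIXNUM_FLAG, for an RVALUE-aligned heap address
def heap_object_id(address)
  fixnum_value(address | FIXNUM_FLAG)
end

sym_oid  = symbol_object_id(14_665)             # hypothetical symbol ID
heap_oid = heap_object_id(1_000 * SIZEOF_RVALUE) # hypothetical address

# Dropping the always-zero low bit leaves S…S or o…o from the comment.
sym_residue  = (sym_oid >> 1) % A
heap_residue = (heap_oid >> 1) % A
```

With these inputs the symbol id leaves remainder 4 and the heap object id leaves remainder 0 modulo A, which is the separation the comment describes.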


Rick DeNatale
