Forum: Ruby Why was the "Symbol is a String"-idea dropped?

enduro (Guest)
on 2007-05-12 11:21
(Received via mailing list)
Hello,

I was excited when I heard that
Symbol was made a subclass of String
in Ruby 1.9, September last year.

But then I heard that the experiment
was stopped after only two months.

And recently I have started to think about this
topic again and I've tried to collect the reasons
why the idea was not pursued any longer.

I have not been very lucky searching the net
for that, that's why I am asking you:

Could someone give me a summary of the reasons
why the approach to make Symbol a subclass of String
is not considered for future Ruby versions anymore?
Or point me towards some information explaining that?

Thank you very much

Sven
Brian C. (Guest)
on 2007-05-12 17:28
(Received via mailing list)
On Sat, May 12, 2007 at 04:20:10PM +0900, enduro wrote:
>
> I have not been very lucky searching the net
> for that, that's why I am asking you:
>
> Could someone give me a summary of the reasons
> why the approach to make Symbol a subclass of String
> is not considered for future Ruby versions anymore?
> Or point me towards some information explaining that?

The two objects have very different behaviours, so why should one be a
subclass of the other?

* Symbols are immutable, Strings are mutable
* Symbols are singletons, Strings are not
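
A minimal sketch of these two differences, assuming a reasonably recent Ruby. String.new is used here only so that the result does not depend on any frozen-string-literal setting:

```ruby
# Two symbols with the same name are one and the same object.
same_symbol = :foo.equal?(:foo)                            # true

# Two strings with the same content are separate objects.
same_string = String.new("foo").equal?(String.new("foo"))  # false

# Strings can be changed in place; symbols are always frozen.
s = String.new("foo")
s << "bar"                                                 # s is now "foobar"
symbol_frozen = :foo.frozen?                               # true
```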

I think this is an example of the traditional OO dilemma: "is Circle a
subclass of Oval, or is Oval a subclass of Circle?" One argument says: a
Circle is a subclass of Oval because you can use an Oval to draw a Circle -
you just need to constrain its parameters. Another argument says: an Oval is
a subclass of Circle because it extends the behaviour of Circle.

Ruby says: we don't care. Make a Circle class, and make an Oval class. Make
them both respond to whatever methods make sense (e.g. all shapes may be
expected to have a 'draw' method). If you want to share implementation code
between them, then use a mixin.
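
The mixin approach could be sketched roughly as follows. The module name Drawable and the describe method are made up for illustration; only Circle, Oval and 'draw' come from the text above:

```ruby
# Shared behaviour lives in a module, not in a common superclass.
module Drawable
  # Relies on the including class to provide #describe.
  def draw
    "drawing #{describe}"
  end
end

class Circle
  include Drawable

  def initialize(radius)
    @radius = radius
  end

  def describe
    "circle with radius #{@radius}"
  end
end

class Oval
  include Drawable

  def initialize(width, height)
    @width = width
    @height = height
  end

  def describe
    "oval #{@width} by #{@height}"
  end
end

# Neither class inherits from the other, yet both can be drawn.
[Circle.new(1), Oval.new(2, 3)].each { |shape| puts shape.draw }
```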

Regards,

Brian.
Trans (Guest)
on 2007-05-12 19:20
(Received via mailing list)
On May 12, 9:27 am, Brian C. <removed_email_address@domain.invalid> wrote:
> > why the idea was not pursued any longer.
> subclass of the other?
> Ruby says: we don't care. Make a Circle class, and make an Oval class. Make
> them both respond to whatever methods make sense (e.g. all shapes may be
> expected to have a 'draw' method). If you want to share implementation code
> between them, then use a mixin.

There are a number of advantages to sub-classing that I can think of:

  1)  No need to do x.to_s.some_string_method.to_sym

  2)  Hash keys could efficiently equate symbol and string keys (it's
the distinction that should be optional)

  3)  It's conceptually simpler: a Symbol is an immutable String.

I'm sure there are a few more. On the downside, Symbols might not be
as efficient in general, and there could be some back-compatibility
issues.

Would be interesting to know what effectively killed the official
attempt at this.

T.
Rick D. (Guest)
on 2007-05-12 19:31
(Received via mailing list)
On 5/12/07, Brian C. <removed_email_address@domain.invalid> wrote:
> On Sat, May 12, 2007 at 04:20:10PM +0900, enduro wrote:
> > I was exited when I heard that
> > Symbol was made a subclass of String
> > in Ruby 1.9, September last year.
> >
> > But then I heard that the experiment
> > was stopped after only two months.

> you just need to constrain its parameters. Another argument says: an Oval is
> a subclass of Circle because it extends the behaviour of Circle.

More a dilemma with languages which force implementation inheritance
to track a notion of type inheritance.

Such languages assume that somehow type hierarchies are natural and
objective.  In reality they aren't.

Years ago I was discussing this with Brad Cox, and he came up with
another example.  In a strongly typed OO language you might have a
hierarchy like this:

   class Object
      class Vehicle < Object
         class Automobile < Vehicle
            class Car < Automobile
            class Truck < Automobile
                class Ambulance < Truck

So an ambulance is a specialized truck.

But then in a new context you might want to model a ski resort and now
an ambulance can be either a toboggan, or a helicopter.

These are the kind of things which tie strongly typed frameworks in
knots of implementation tangled with type.


> Ruby says: we don't care. Make a Circle class, and make an Oval class. Make
> them both respond to whatever methods make sense (e.g. all shapes may be
> expected to have a 'draw' method). If you want to share implementation code
> between them, then use a mixin.

Or in other words, languages like Ruby provide fairly rich mechanisms
for sharing implementation, and don't tangle this up with policy
decisions about how objects should be classified, which in the real
world can become messy, or at least context/requirements dependent.

If anyone wants to ponder the difficulties of making an objective
type/classification hierarchy in more depth, I'd recommend reading the
essay "What, if Anything, is a Zebra" by Stephen Jay Gould, or for a
more in-depth and challenging read, "Women, Fire, and Dangerous
Things" by George Lakoff
--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
Rick D. (Guest)
on 2007-05-12 20:42
(Received via mailing list)
On 5/12/07, Trans <removed_email_address@domain.invalid> wrote:

>
> There are a number of advantages to sub-classing that I can think of:
>
>   1)  No need to do x.to_s.some_string_method.to_sym

Well, let's see. Why do we do symbol.to_s ?

    1) When we want a string representation of the symbol so that we
can, say, mutate it.  Subclassing won't help here.
    2) When we want to compare a string with a symbol.  Making Symbol a
subclass of String alone won't do this, and if we change Symbol#== so
that :a == "a" is true, we destroy one of the big advantages of Symbols,
which is the speed of determining that two symbols are equal based on
their identity. This is why, for example, symbols rather than strings
are used as method selectors.

And why do we do string.to_sym? Primarily because we want the speed
advantages of symbols in comparisons.

>   2)  Hash keys could efficiently equate symbol and string keys (it's
> the distinction that should be optional)

No, I think that we'd actually get the worst outcome here; it follows
from 2) above.  Symbol hash keys are more efficient than String hash keys
because of identity.


>   3)  It's conceptually simpler: a Symbol is an immutable String.
   No it isn't. A symbol is an object with an immutable string as a
name, and which is the sole instance with that name.
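
A short sketch of how this looks in practice, assuming current Ruby behaviour: a symbol and a string with the same characters are neither equal nor interchangeable as hash keys, and conversion has to be explicit.

```ruby
# A symbol and a string with the same characters are not equal...
equal_values = (:a == "a")   # false

# ...so they are different hash keys as well.
h = { :a => 1 }
by_symbol = h[:a]            # 1
by_string = h["a"]           # nil

# Converting explicitly crosses the border in either direction.
as_string = :a.to_s          # "a"
as_symbol = "a".to_sym       # :a, the very same object as the literal :a
```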

Now an interesting idea might be to add more string methods to Symbol
so that for example one could do

   :a + :b #=> :ab

So that there was more of a Symbol algebra which was still closed so
that the results were still Symbols.
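
One way to experiment with such a closed "Symbol algebra" without touching the Symbol class is a small helper. The name symbol_concat is hypothetical and not part of Ruby; this is only a sketch of the idea:

```ruby
# Hypothetical helper (not part of Ruby): joins symbols into a new symbol,
# so the result stays inside "symbol space".
def symbol_concat(*symbols)
  symbols.map(&:to_s).join.to_sym
end

symbol_concat(:a, :b)         # => :ab
symbol_concat(:valid, :"?")   # => :valid?
```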

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
Robert D. (Guest)
on 2007-05-13 11:50
(Received via mailing list)
On 5/12/07, Rick DeNatale <removed_email_address@domain.invalid> wrote:
> On 5/12/07, Trans <removed_email_address@domain.invalid> wrote:
<snip>
> Now an interesting idea might be to add more string methods to Symbol
> so that for example one could do

:a <=> :b, with Comparable included automatically.
I think that would be the most useful.
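
For reference, later Ruby versions (1.9 onward, as far as I know) do define Symbol#<=> and include Comparable in Symbol. A brief sketch assuming such a version:

```ruby
order   = (:a <=> :b)                  # -1, symbols compare by name
sorted  = [:pear, :apple, :fig].sort   # [:apple, :fig, :pear]
between = :b.between?(:a, :c)          # true, provided by Comparable
```
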
<snip>> Rick DeNatale
>
> My blog on Ruby
> http://talklikeaduck.denhaven2.com/
>
>
For the rest I rather agree with Rick's POV.
If one subclasses a class X with a class Y, one conceptually says
"an instance of Y" is "an instance of X".
Could you say a Symbol is a String? No, you cannot, unless a Symbol
responds to all messages of a String. In other words, subclasses
extend the behavior of base classes; they never restrict it.
Well that holds for my old OO stuff I have learnt, maybe I have to
change paradigm, but right now I am not convinced.

Cheers
Robert
Xavier N. (Guest)
on 2007-05-13 12:22
(Received via mailing list)
Just wanted to point out that the original question is why Ruby core
changed their mind, not what people think in general about relating
String and Symbol. Perhaps the question could be sent to ruby-core as
well.

-- fxn
Robert D. (Guest)
on 2007-05-13 12:33
(Received via mailing list)
On 5/13/07, Xavier N. <removed_email_address@domain.invalid> wrote:
> Just wanted to point out that the original question is why Ruby core
> changed their mind, not what people think in general about relating
> String and Symbol. Perhaps the question could be sent to ruby-core as
> well.
That is indeed a good idea
>
> -- fxn
>
>
Nevertheless we are spot on the thread, are we not? And even if we were
drifting to a related topic, that sometimes gives the best
discussions.

But maybe our arguments are not convincing?
What would you want to discuss then?

I do not feel one should be that rigid about OnTopic OffTopic.
Well just my 0.02 whatever money you worship.

Cheers
Robert
Xavier N. (Guest)
on 2007-05-13 13:05
(Received via mailing list)
On May 13, 2007, at 10:32 AM, Robert D. wrote:

> Neverheless we are spot on the thread, are we not? And even if we were
> drifiting to a related topic that sometimes gives the best
> discussions.
>
> But maybe our arguments are not convincing?

I think that if a couple of simple arguments made clear that both classes
should be unrelated, the core team wouldn't even have bothered to start
relating them. So I guess it's likely that there's more to it, and I
would like to know about it.

> What would you want to discuss then?
>
> I do not feel one should be that rigid about OnTopic OffTopic.
> Well just my 0.02 whatever money you worship.

The discussion itself is OK for me. I just wanted to point out that
the original question has not been answered, otherwise the thread
could engage in talking about what people think in general and forget
it.

-- fxn
Marcin R. (Guest)
on 2007-05-13 14:13
(Received via mailing list)
On Saturday 12 May 2007 07:20, enduro wrote:
> topic again and I've tried to collect the reasons
> Thank you very much
>
> Sven

The basic reason, as stated in the Ruby Hacking Guide, is that a Symbol is
internally just an integer!

That makes hashes based on symbols much, much faster. As a consequence of the
above, a Symbol is "read-only", whereas you can modify a String as much as
you want.

So to sum up: Symbols are smaller and faster, but "read-only"; they are good
for indexing hashes, passing method names, etc.
Strings are heavy and slow, but offer greater flexibility.

If you want more in-depth explanations, read the Ruby internals/Hacking Guide.
Yukihiro M. (Guest)
on 2007-05-13 20:07
(Received via mailing list)
Hi,

In message "Re: Why was the "Symbol is a String"-idea dropped?"
    on Sun, 13 May 2007 17:20:49 +0900, Xavier N. 
<removed_email_address@domain.invalid>
writes:

|Just wanted to point out that the original question is why Ruby core
|changed their mind, not what people think in general about relating
|String and Symbol. Perhaps the question could be sent to ruby-core as
|well.

We once changed Symbol as subclass of String to see how it goes, and
met many compatibility problems.  People distinguishes Symbols and
String far more than we expected.  So we just abandoned the idea.

              matz.
rett (Guest)
on 2007-05-13 22:38
(Received via mailing list)
Rick,

Aren't we confusing symbol with operator in this discussion? If I am dealing
with a program as a string or group of strings, as any compiler initially must,
not having symbols as a part of strings makes my initial task almost impossible.

Everett L.(Rett) Williams II
Robert D. (Guest)
on 2007-05-14 00:10
(Received via mailing list)
On 5/13/07, Xavier N. <removed_email_address@domain.invalid> wrote:
> >>
> >>
> > Neverheless we are spot on the thread, are we not? And even if we were
> > drifiting to a related topic that sometimes gives the best
> > discussions.
> >
> > But maybe our arguments are not convincing?
>
> I think that if a couple of simple arguments make clear both classes
> should be unrelated the core team wouldn't even bothered to start
> relating them.
I have the highest respect for the community that works on Ruby 2.0.
That however does not make them gods, and they can therefore err.
On the one hand, I do not bother with the consideration of why they
thought about it when we discuss technical issues - for right or
wrong.

However, and I thank you for pointing this out (and re-explaining it,
because I can be quite stubborn; why do you think I am married to a
Breton woman ;), they might indeed have had some conceptual ideas
that might be interesting.
This would kill the idea of symbols in the general sense (Smalltalk,
Lisp and Ruby 1.8); maybe this was what made them back off?

Sorry if I was slightly aggressive, but I still feel that your post was
a little bit too severe with us ;).
No, the slight misunderstanding came from my failure to understand what
you wanted to say, my fault without doubt.
> would like to know about it.
>
> > What would you want to discuss then?
That was a stupid question of YHS, I know now, what you wanted to talk
about :)
> >
> > I do not feel one should be that rigid about OnTopic OffTopic.
> > Well just my 0.02 whatever money you worship.
>
> The discussion itself is OK for me. I just wanted to point out that
> the original question has not been answered, otherwise the thread
> could engage in talking about what people think in general and forget
> it.
Sure but I still have a much more relaxed POV about this, but please
believe me I respect yours too.
>
> -- fxn
>
>
>
Cheers
Robert
Trans (Guest)
on 2007-05-14 20:57
(Received via mailing list)
On May 13, 12:07 pm, Yukihiro M. <removed_email_address@domain.invalid> wrote:
> We once changed Symbol as subclass of String to see how it goes, and
> met many compatibility problems.  People distinguishes Symbols and
> String far more than we expected.  So we just abandoned the idea.

That's unfortunate. IMHO it's generally bad practice to functionally
differentiate between them. But this being the official status now, I
don't see any reason to accept string hash keys for method options
anymore. It's just not worth the extra code and computation to do so.
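
A sketch of what "symbol keys only" could look like for a method taking an options hash. The method name connect and its defaults are invented for illustration:

```ruby
# Hypothetical method: accepts option keys as symbols only and rejects
# anything else, instead of silently converting string keys.
def connect(options = {})
  bad_keys = options.keys.reject { |key| key.is_a?(Symbol) }
  unless bad_keys.empty?
    raise ArgumentError, "option keys must be symbols: #{bad_keys.inspect}"
  end

  defaults = { :host => "localhost", :port => 23 }
  defaults.merge(options)
end

connect(:port => 8023)       # returns the defaults with :port overridden
# connect("port" => 8023)    # would raise ArgumentError
```

The check costs one pass over the keys; supporting both kinds of key would instead need a conversion on every call.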

T.
enduro (Guest)
on 2007-05-15 05:07
(Received via mailing list)
Thank you all for your replies.

And thank you, Xavier, for keeping the focus on my original intention.
Yes, I was not asking about general arguments for designing a class
hierarchy, but for the reasons for this particular decision of the
ruby-core team.

And I was indeed enlightened by matz's answer:

>>|String and Symbol. Perhaps the question could be sent to ruby-core as
>>|well.
>>
>>We once changed Symbol as subclass of String to see how it goes, and
>>met many compatibility problems.  People distinguishes Symbols and
>>String far more than we expected.  So we just abandoned the idea.
>>
>>
This tells me that it was mainly the weight of the existing Ruby usage
that tipped the balance towards the conservative side.

Or, in other words: if the decision to unify Symbol and String had
been taken at an early stage of Ruby development, then the
general usage would have adapted to this, and ...
we might be happier with the result today.

At least, that is my private opinion on this question:

It is tempting to say: "Symbols are just integers internally,
they are just isolated points in 'Symbol-space',
so it is not suitable to give them string methods."
But I think in practice this is not true:
- Symbols are a standard data type for meta-programming
  (and immediately, there will be a need to append a "?" here and there,
  or test for some regexp-condition...)
- Symbols are fast as Hash keys,
  but the "real-world" keys often are strings, or even can be both,
  and then the current situation creates the well-known dilemma
  of having to decide on a Symbol/String interface (and implement it).
  Yes, this gives us programmers the freedom to optimize the code...
  (... but I think a standard solution would serve better in most cases.)
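
The meta-programming case from the first point can be sketched like this. Both helper names are hypothetical, and the example only illustrates where the Symbol/String border gets crossed:

```ruby
# Hypothetical helper: derive a predicate name such as :admin? from :admin.
def predicate_name(name)
  :"#{name}?"               # interpolation goes through a String
end

# Hypothetical helper: does the symbol look like a setter name (ends in "=")?
def setter_name?(name)
  name.to_s.match?(/=\z/)   # the regexp test needs a String, too
end

predicate_name(:admin)      # => :admin?
setter_name?(:name=)        # => true
setter_name?(:name)         # => false
```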

Yes, I sometimes think of that separation of Symbol from String
as a tiny impurity in the Ruby crystal.

I thought Ruby 2.0 could have been a chance to iron this out.
But it seems that now only small changes are still possible.

So, I'll just have to come to terms with it. :-)
(And I will, of course -- there are enough other fascinating issues...
:-) )

Along the lines of Trans:

>That's unfortunate. IMHO it's generally bad practice to functionally
>differentiate between them. But this being the official status now, I
>don't see any reason to accept string hash keys for method options
>anymore. It's just not worth the extra code and computation to do so.
>
>T.
>
>

Before I close, just a small thought regarding the issue that
subclasses are usually extended from their superclass, and not
restricted.
I don't know if that had been discussed before: would it perhaps be good to
create a class hierarchy similar to the Float/Integer hierarchy?
String < Stringlike
Symbol < Stringlike
Of course with everything set up such that hash[:a] is the same as
hash["a"] etc.
(Just a thought, probably this already has been rejected.)
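
Short of changing the class hierarchy, the "hash[:a] is the same as hash["a"]" behaviour can be approximated with a small wrapper. The class below is a hypothetical sketch, similar in spirit to what Rails offers with HashWithIndifferentAccess, and not a proposal for Ruby itself:

```ruby
# Hypothetical wrapper (not a Ruby built-in): stores every key as a String,
# so :a and "a" address the same entry.
class IndifferentHash
  def initialize
    @data = {}
  end

  def []=(key, value)
    @data[normalize(key)] = value
  end

  def [](key)
    @data[normalize(key)]
  end

  private

  # Symbols are converted to strings; other keys are left untouched.
  def normalize(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

h = IndifferentHash.new
h[:a] = 1
h["a"]    # => 1
```

The price is one conversion per access, which is the computation the identity-based Symbol lookup avoids.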

Anyway, I'd like to thank the core programmers for all the work
they have put into Ruby to make it shine.
Kind regards,
Sven
Brian C. (Guest)
on 2007-05-15 11:31
(Received via mailing list)
On Tue, May 15, 2007 at 10:07:24AM +0900, enduro wrote:
>  to decide for a Symbol/String interface (and implement it).
The programs for which it makes sense to convert strings (received from some
external source, e.g. a database) to symbols for optimisation purposes, i.e.
where the benefits are measurable, will be pretty few. And you also open
yourself to a symbol exhaustion denial-of-service.

That is, as far as I know, the symbol table is never garbage collected. Once
a symbol, always a symbol.

So using literal symbols as hash keys makes sense:

  { :foo=>1, :bar=>2 }

but using

  h = {}
  h[a.to_sym] = 1

is risky, and unlikely to yield measurable benefit. If 'a' is already a
String, then there is no benefit from avoiding object creation, since it's
been already done. So you may as well leave it as a String.
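
A small sketch of that advice, with a made-up helper count_names standing in for code that handles externally supplied keys. The keys stay Strings, so no new symbols are created from outside input:

```ruby
# Hypothetical helper: tally names that arrive from an external source.
def count_names(names)
  counts = Hash.new(0)
  names.each { |name| counts[name] += 1 }   # keys stay Strings
  counts
end

count_names(["id", "name", "id"])   # "id" counted twice, "name" once
```

Later Ruby versions (2.2 onward, as far as I know) can garbage-collect symbols created at run time, which eases the exhaustion concern but does not make the conversion any more useful.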

> Yes, I sometimes think of that separation of Symbol from String
> as a tiny impurity in the Ruby crystal.

I would disagree with you there, because Symbols are clean and easy to
understand.

There are other "impurities" I can think of - like the seven or so different
flavours of Proc object which have subtly different semantics. This I find
more difficult, because it's really hard to remember the rules for how they
differ. But things like this are here to make the language "do the right
thing" in most practical cases. And, once you've used Ruby for a while, you
find that actually it does.

> I thought Ruby 2.0 could have been a chance to iron this out.
> But it seems that now only small changes are still possible.

I'd vote strongly against anyway. I *like* Symbols as they are. I also don't
feel a dichotomy. Use a symbol where necessary (i.e. for method names) and
for literal hash keys, e.g. named arguments. For anything else a string is
just fine.

I agree it's a bit annoying when you come across a bit of code which
violates the standard practice: e.g. net/telnet uses
    { "Prompt" => /foo/ }
instead of
    { :prompt => /foo/ }

But then even :Prompt would have been annoying, because generally people
don't use the capitalisation either.

Do you think that hash['a'] and hash['A'] should be the same?

Regards,

Brian.
Robert K. (Guest)
on 2007-05-15 13:35
(Received via mailing list)
On 15.05.2007 03:07, enduro wrote:
>>> We once changed Symbol as subclass of String to see how it goes, and
>>> met many compatibility problems.  People distinguishes Symbols and
>>> String far more than we expected.  So we just abandoned the idea.
>>>
> This tells me, that it was mainly the weight of the existing ruby usage,
> that flipped the balance towards the conservative side.

Which is not a bad thing in itself.

> Or, in other words: if the decision to unify Symbol and String would
> have been taken at early stages of Ruby development, then the
> general usage would have adapted to this, and ...
> we might be happier with the result today.

I am in no way unhappy with the way it is today.  Strings and symbols
serve different purposes although there is some overlap.  I rarely feel
the need to convert between the two.

> - Symbols are fast as Hash keys,
>  but the "real-world" keys often are strings, or even can be both,
>  and then the current situation creates the well-known dilemma
>  to decide for a Symbol/String interface (and implement it).

I am not aware of a situation where you would need to mix them as hash
keys.  And to make the distinction is pretty easy most of the time IMHO.

>  Yes, this gives us programmers the freedom to optimize the code...
>  (... but I think a standard solution would serve better in most cases.)

Frankly, I believe there is an inherent advantage that you can use
symbols vs. strings in code.  And I mean not only performance wise but
also readability wise.

Note though, that all these issues have nothing to do with the question
whether String and Symbol should be connected inheritance wise.  IMHO
that's mostly an implementation decision in Ruby.

> Yes, I sometimes think of that separation of Symbol from String
> as a tiny impurity in the Ruby crystal.

Personally I believe it creates more expressiveness.  If you view this
as impurity, there are a lot of them in Ruby because Ruby's focus has
always been on pragmatism and not purity (although it goes pretty far in
some areas, for example it avoids the POD vs. object distinction that
Java has (I would say this is a pragmatic decision because it makes
things easier if you have a common base class for *all* types)).

> I thought Ruby 2.0 could have been a chance to iron this out.
> But it seems that now only small changes are still possible.

 From what I gather Ruby 2.0 will have some major changes, for example
in the area of scoping.  Though it's probably done in a way that it will
reduce the impact on existing programs.

> So, I'll just have to come to terms with it. :-)
> (And I will, of course -- there are enough other fascinating issues...
> :-) )

The capability to adjust to reality is a useful one IMHO. :-)

> Before I close, just a small thought regarding the issue that
> subclasses are usually extended from their superclass, and not restricted.
> I don't know if that had been discussed before: would it perhaps be good to
> create a class hierarchy similar to the Float/Integer hierarchy?
> String < Stringlike
> Symbol < Stringlike

Why not?  StringLike could even be a module that relies solely on [] and
length to do all the non mutating stuff.
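
A rough sketch of such a module, assuming Ruby 1.9 or later, where Symbol responds to [] and length. The method names pieces and begins_with? are invented to avoid clashing with existing methods:

```ruby
# Hypothetical module: non-mutating helpers built only on #[] and #length.
module StringLike
  # Characters obtained one index at a time.
  def pieces
    (0...length).map { |i| self[i] }
  end

  # True if the receiver begins with the given prefix.
  def begins_with?(prefix)
    prefix = prefix.to_s
    self[0, prefix.length] == prefix
  end
end

class String
  include StringLike
end

class Symbol
  include StringLike
end

"abc".pieces              # => ["a", "b", "c"]
:abc.pieces               # => ["a", "b", "c"]
:abc.begins_with?("ab")   # => true
```

Reopening String and Symbol is done here only to keep the example short; a real design would need more care about name clashes.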

> Of course with everything set up such that hash[:a] is the same as
> hash["a"] etc.
> (Just a thought, probably this already has been rejected.)

I'm not sure whether this is a good idea.  Given the fact that I don't
mix symbols and strings as Hash keys I wouldn't benefit - but it would
not hurt me either. :-)  YMMV

> Anyway, I'd like to thank the core programmers for all the work
> they have put into Ruby to make it shine.

Definitively!  Credits also go to the community that is still among the
most civilized online communities I know so far!

Kind regards

  robert
enduro (Sven S.) (Guest)
on 2007-05-15 13:42
(Received via mailing list)
Hello Brian,

Brian C. wrote:

>external source, e.g. a database) to symbols for optimisation purposes, i.e.
>where the benefits are measurable, will be pretty few.
>
Yes, I agree.
(That's what I tried to address by the two lines after the quote above,
perhaps I should have put a smiley in there :-) )

>And you also open yourself to a symbol exhaustion denial-of-service.
>
>
Yes, of course.
But my point is: Let the system take care of that.
I want a Ruby that just works - crystal-clear, transparently, reliably. :-)
And it already does in most cases. And there is a lot that can be improved.
And one such improvement could be garbage collection for symbols. (I think.)

>That is, as far as I know, the symbol table is never garbage collected. Once
>a symbol, always a symbol.
>
>
I'm not a core programmer, maybe I am asking too much,
but I think it should be possible without slowing anything down.
One very simple idea I can think of is the following:
Set a limit on the number of symbols, and if it is reached,
the GC will be invoked in a special symbol-mode, marking all symbols that are
still in use and completely regenerating the symbol table from scratch.


>>Yes, I sometimes think of that separation of Symbol from String
>>as a tiny impurity in the Ruby crystal.
>>
>>
>
>I would disagree with you there, because Symbols are clean and easy to
>understand.
>
>
Yes, I really must admit, I also like the cleanness of current Symbols.
But then, my experience is that this clearness is not worth a lot,
because the border towards "dirty" strings must be crossed often.
(That's why I called sticking to the clearness "tempting" in my last post.)


>There are other "impurities" I can think of - like the seven or so different
>flavours of Proc object which have subtle different semantics. This I find
>more difficult, because it's really hard to remember the rules for how they
>differ.
>
Fully agree! But that must be a different thread.

>But things like this are here to make the language "do the right
>thing" in most practical cases. And, once you've used Ruby for a while, you
>find that actually it does.
>
>
OK. But that can be said for most programming languages.
We are dealing with Ruby here,
and the appealing thing of Ruby is: the language!
I mean: concise syntax, flexibility, expressiveness,
allowing to express things *naturally*.
Ruby is not yet good in many other aspects:
speed, threads, documentation.
But the runtime engine can be improved with time,
documentation can grow.
The language is the crystal. It must be good in the beginning,
it becomes more solid with every project written in that language.

So, I'd like to use the time we still have before Ruby 2 is born,
to contribute to a really good language.

>Do you think that hash['a'] and hash['A'] should be the same?
>
>
No, not for the builtin Hash#[].



So long

Sven
Robert D. (Guest)
on 2007-05-15 13:43
(Received via mailing list)
On 5/14/07, Trans <removed_email_address@domain.invalid> wrote:
> > |String and Symbol. Perhaps the question could be sent to ruby-core as
>
> T.
>
I really like the idea of using symbols as parameter keys exclusively,
I think we would get closer to named parameters instead of emulating
them.
And the interesting stuff is, I always hated String keys in parameter
hashes.
Does this go together with the fact that I really like the good old
Symbols are not Strings paradigm? Probably.

But right now I fail to see what would be the gain from making
symbols mutable.
I still maintain the POV that immutable Symbols must not be a subclass
of String.

Any more thoughts?

Cheers
Robert
Robert D. (Guest)
on 2007-05-15 13:51
(Received via mailing list)
On 5/15/07, enduro <removed_email_address@domain.invalid> wrote:
> Thank you all for your replies.
>
> And thank you, Xavier, for keeping the focus on my original intention.
> Yes, I was not asking about general arguments for designing a class
> hierarchy, but for the reasons for this particular decision of the
> ruby-core team.
I really have not taken offense. However if you are interested in that
only you might post to ruby-core only.
I am kind of surprised that the considerations of Rick and YHS are
considered as OT.
If you do not like them maybe it would be polite to ignore them. But
talking about the topic on *this* list and ignoring all background
information about what symbols are and have been is kind of weird.
Please remember that Ruby has its inheritance in other languages
owning symbols as I believe to have pointed out.
The fact that the original idea is a big paradigm shift does not
answer your question?

I honestly do not understand that.

Threads just evolve I do not feel that they belong to OP :).
They do not belong to me either of course ;).
Cheers
Robert
enduro (Sven S.) (Guest)
on 2007-05-15 14:32
(Received via mailing list)
Robert K. schrieb:

> On 15.05.2007 03:07, enduro wrote:

>> Or, in other words: if the decision to unify Symbol and String would
>> have been taken at early stages of Ruby development, then the
>> general usage would have adapted to this, and ...
>> we might be happier with the result today.
>
> I am in no way unhappy with the way it is today.
> Strings and symbols serve different purposes although there is some
> overlap.
> I rarely feel the need to convert between the two.

I see.
And I am quite surprised. Because judging from your online activity
you seem to have some experience.
Perhaps it is also my programming style: I may use symbols where one
normally would use strings.


> I am not aware of a situation where you would need to mix them as hash
> keys.
> And to make the distinction is pretty easy most of the time IMHO.

Not aware? I mean Rails mixes them, right?


> Frankly, I believe there is an inherent advantage that you can use
> symbols vs. strings in code.
> And I mean not only performance wise but also readability wise.

Readability-wise: precisely what advantage?
The only thing that comes to my mind just now, is
that a separated Symbol class easily provides
distinct special values for a parameter that would normally carry a
String.

> Note though, that all these issues have nothing to do with the question
> whether String and Symbol should be connected inheritance wise.
> IMHO that's mostly an implementation decision in Ruby.

Yes, I agree.
I am actually interested in the implications for the programmer.
My original question just arose out of the notion
that this implementation decision could have been a move
in a (to my mind) favourable direction.


>> Yes, I sometimes think of that separation of Symbol from String
>> as a tiny impurity in the Ruby crystal.
>
>
> Personally I believe it creates more expressiveness.
> If you view this as impurity, there are a lot of them in Ruby because
> Ruby's focus
> has always been on pragmatism and not purity

1.  The core structure must of course be large enough, and a large
    structure may look impure.
2.  But regarding this particular question: My original notion was that keeping
    Symbol and String too separate is not pragmatic.
    (I may change my mind on that, if I read more posts like yours, though.)

>> So, I'll just have to come to terms with it. :-)
>> (And I will, of course -- there are enough other fascinating
>> issues... :-) )
>
> The capability to adjust to reality is a useful one IMHO. :-)

Well, yes, sometimes I'm glad someone tells me that. :-)


>> create a class hierarchy similar to the Float/Integer hierarchy?
>> String < Stringlike
>> Symbol < Stringlike
>
> Why not?  StringLike could even be a module that relies solely on []
> and length to do all the non mutating stuff.

Ah, interesting. Can't follow the implications right now.


> Given the fact that I don't mix symbols and strings as Hash keys I
> wouldn't benefit -
> but it would not hurt me either. :-)  YMMV

Yes that was the idea behind it: to benefit some and not to hurt the
others.


> Credits also go to the community that is still among the most
> civilized online communities I know so far!

Indeed, I'm experiencing it right now!
Thanks a lot!

Sven
Robert K. (Guest)
on 2007-05-15 15:00
(Received via mailing list)
On 15.05.2007 12:31, enduro (Sven S.) wrote:
>> serve different purposes although there is some overlap. I rarely feel
>> the need to convert between the two.
>
> I see.
> And I am quite surprised. Because judging from your online activity
> you seem to have some experience.
> Perhaps it is also my programming style: I may use symbols where one
> normally would use strings.

Yeah, maybe.  So where are you using symbols where one normally would
use strings?

>> I am not aware of a situation where you would need to mix them as hash
>> keys. And to make the distinction is pretty easy most of the time IMHO.
>
> Not aware? I mean Rails mixes them, right?

I don't use Rails. :-)))

>> Frankly, I believe there is an inherent advantage that you can use
>> symbols vs. strings in code. And I mean not only performance wise but
>> also readability wise.
>
> Readability-wise: precisely what advantage?

If I see a symbol being used as a Hash key I immediately know (or rather
guess) that there is only a limited amount of them and they are known
beforehand, like with options.

# silly example
opts = {
   :length => 12,
   :width => 30,
}
# other code
resize( opts[:length] )

Whereas when strings are used it's typically stuff that is read from
somewhere, like (another silly example):

ruby -aF: -ne 'BEGIN { $c=Hash.new(0) }; $c[$F[1]]+=1; END { $c.each
{|k,v| print k, "=", v, "\n"}}' /etc/passwd

> The only thing that comes to my mind just now, is
> that a separated Symbol class easily provides
> distinct special values for a parameter that would normally carry a String.

Don't forget the optical distinction between using 'string', "string"
and :symbol.

>> Note though, that all these issues have nothing to do with the question
>> whether String and Symbol should be connected inheritance wise. IMHO
>> that's mostly an implementation decision in Ruby.
>
> Yes, I agree.
> I am actually interested in the implications for the programmer.
> My original question just arose out of the notion
> that this implementation decision could have been a move
> in a (to my mind) favourable direction.

As we all have different habits what may be favorable for one may be
regrettable for the other. :-)

>>> Yes, I sometimes think of that separation of Symbol from String
>>> as a tiny impurity in the Ruby crystal.
>>
>> Personally I believe it creates more expressiveness. If you view this
>> as impurity, there are a lot of them in Ruby because Ruby's focus
>> has always been on pragmatism and not purity
>
> 1.  The  core structure must of course be large enough,  and a  large
> structure may  look impure.

This somehow reminds me of
http://en.wikipedia.org/wiki/G%C3%B6del%27s_incomp...

> 2.  But regarding this particular question: My original notion was that
> keeping
>     Symbol and String too separate is not pragmatic.
>     (I may change my mind on that, if I read more posts like yours,
> though.)

Just reread mine a few times - then you don't need the other postings
any more.  That's more efficient - you'll save bandwidth and reading is
actually faster if you know the text already.  :-))

>>> So, I'll just have to come to terms with it. :-)
>>> (And I will, of course -- there are enough other fascinating
>>> issues... :-) )
>>
>> The capability to adjust to reality is a useful one IMHO. :-)
>
> Well, yes, sometimes I'm glad someone tells me that. :-)

:-))  No sweat - following visions is useful as well.  As always it's
the mix...

>>> create a class hierarchy similar to the Float/Integer hierarchy?
>>> String < Stringlike
>>> Symbol < Stringlike
>>
>> Why not?  StringLike could even be a module that relies solely on []
>> and length to do all the non mutating stuff.
>
> Ah, interesting. Can't follow the implications right now.

For example regexp matching might be implemented similarly for both
(i.e. just in one place).  But then again, since RX functionality is
highly integrated into the language that might not be a good idea - or
the C code needs to become more complex to react differently if it sees
a String or Symbol vs. some custom class that includes this module.
Hm...
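
To make the StringLike idea a bit more concrete, here is a minimal sketch. It assumes a current Ruby, where Symbol already responds to [] and length (it did not in 1.8). The helper names chars_list, starts_with? and reversed are made up for illustration; the point is only that they touch nothing but [] and length on the receiver.

```ruby
# Hypothetical module: non-mutating behaviour built only on #[] and #length.
module StringLike
  # Collect the characters one by one via #[] and #length.
  def chars_list
    (0...length).map { |i| self[i, 1].to_s }
  end

  # True if the receiver begins with the given prefix (String or Symbol).
  def starts_with?(prefix)
    prefix = prefix.to_s
    self[0, prefix.length].to_s == prefix
  end

  # A reversed copy, always returned as a plain String.
  def reversed
    chars_list.reverse.join
  end
end

# Mixing into both core classes, for demonstration only.
class String; include StringLike; end
class Symbol; include StringLike; end

puts "hello".starts_with?("he")  # prints "true"
puts :hello.starts_with?(:he)    # prints "true"
puts :abc.reversed               # prints "cba"
```

Whether this would be worth the extra complexity in the C code is exactly the open question raised above.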

>> Given the fact that I don't mix symbols and strings as Hash keys I
>> wouldn't benefit -
>> but it would not hurt me either. :-)  YMMV
>
> Yes that was the idea behind it: to benefit some and not to hurt the
> others.

The next best thing to a win win situation. :-))

>> Credits also go to the community that is still among the most
>> civilized online communities I know so far!
>
> Indeed, I'm experiencing it right now!
> Thanks a lot!

You're welcome.  Thank /you/!

Kind regards

  robert
enduro (Sven S.) (Guest)
on 2007-05-15 15:07
(Received via mailing list)
Ooops!

sorry if I came across rude in any way.

I don't want to "own" the thread.
But I am interested in my question,
so I was glad that someone repeated it,
at a time when all the answers up to that point had not yet answered it.

Robert D. schrieb:

> The fact that the original idea is a big paradigm shift does not
> answer your question?

Sorry, no. If someone had told me that this fact was the basis for the
decision of the core team,
that would have answered my question.
(Because the fact alone is not compelling: If a paradigm shift is
possible and good then why not shift?)

And also, I thought that this was the right place for posting the
question.
(Actually, until yesterday I didn't know that I could post on ruby-core,
I thought it was just for "cracks", because it's read-only on
ruby-forum.com)

Kind Regards
Sven


And here again, Robert D.'s full text:

> only you might post to ruby-core only.
> I honestly do not understand that.
>
> Threads just evolve I do not feel that they belong to OP :).
> They do not belong to me either of course ;).
> Cheers
> Robert


Another question:
Who is

> YHS

?

Regards, Sven
enduro (Sven S.) (Guest)
on 2007-05-15 15:50
(Received via mailing list)
Hello again,

Robert K. schrieb:

> On 15.05.2007 12:31, enduro (Sven S.) wrote:
>
>> Perhaps it is also my programming style: I may use symbols where one
>> normally would use strings.
>
> Yeah, maybe.  So where are you using symbols where one normally would
> use strings?

Let me guess, because I don't know if I am really the only one:
1. Multipurpose-names:
  Like option-names, used as hash keys but also as names and labels for
the corresponding graphics control etc.
2. Logging:
  Giving a brief hint in the form of a symbol (not the log level), well
just because it is easier to type and looks nice

>> Not aware? I mean Rails mixes them, right?
>
> I don't use Rails. :-)))

Oops :-), offending again, am I? :-)
:-)


> like with options.
>
> # silly example
> opts = {
>   :length => 12,
>   :width => 30,
> }
> # other code
> resize( opts[:length] )

Sorry, don't get me wrong:

I DID NOT MEAN TO REMOVE the Symbol class.
Nor Symbol literals.

Thus, your examples would be valid and semantically equivalent code
after a "unification" of the classes (regardless if Symbol < String or
not).
Or I'd better not call it "unification", I don't have a good word,
perhaps "joining" would be better.

> Don't forget the optical distinction between using 'string', "string"
> and :symbol.

Also, this won't be affected, see above.


>>> [...] on pragmatism and not purity
>>
>> 1.  The  core structure must of course be large enough,  and a  large
>> structure may  look impure.
>
> This somehow reminds me of
> http://en.wikipedia.org/wiki/G%C3%B6del%27s_incomp...

... mystery will always remain ...


>> 2.  But regarding this particular question: My original notion was
>> that keeping
>>     Symbol and String too separate is not pragmatic.
>>     (I may change my mind on that, if I read more posts like yours,
>> though.)
>
> Just reread mine a few times - then you don't need the other postings
> any more.
> That's more efficient - you'll save bandwidth and reading is actually
> faster if you know the text already.  :-))

Well,
as Ruby-users,
we don't sacrifice our fun to the god of efficiency, do we... :-)


Cheers,
Sven
Robert D. (Guest)
on 2007-05-15 15:55
(Received via mailing list)
On 5/15/07, enduro (Sven S.) <removed_email_address@domain.invalid> wrote:
>
> > The fact that the original idea is a big paradigm shift does not
> > answer your question?
>
> Sorry, no. If someone had told me that this fact was the basis for the
> decision of the core team,
> that would have answered my question.
> (Because the fact alone is not compelling: If a paradigm shift is
> possible and good then why not shift?)

Sure that was exactly the thing I wanted to discuss and suddenly
someone told me hey stay On Topic. That was strange but not rude at
all. I mean neither Xavier nor you, you are very civilized and polite
people- maybe much more than YHS ;)
I just had the feeling that the answers you will get on this list will
never correspond to your exact question, and I was wrong as Matz
stepped by.

I admit that personally I have a big problem with "A symbol is a
string", but brighter people than me like Tom and Matz have not or did
not have, so maybe indeed I am making too much noise while thinking
:(.

But please remember too that there are only complicated answers to
simple questions ;).

>
> And also, I thought that this was the right place for posting the question.
> (Actually, until yesterday I didn't know that I could post on ruby-core,
> I thought it was just for "cracks", because it's read-only on
> ruby-forum.com)
I definitely should have pointed that out first and then I could have
taken all the time to rant/argue/discuss the technical points, oh boy
how difficult communication can be sometimes!
>
> Kind Regards
> Sven
>
Cheers
Robert
Brian C. (Guest)
on 2007-05-15 17:06
(Received via mailing list)
On Tue, May 15, 2007 at 06:42:04PM +0900, enduro (Sven S.) wrote:
> >And you also open yourself to a symbol exhaustion denial-of-service.
> >
> >
> Yes, of course.
> But my point is: Let the system take care of that.
> I want a Ruby that just works - crystal-clear, transparently, reliably.
> :-)
> And it already does in most cases. And there is a lot that can be improved.
> And one such improvements could be a garbage collection for symbols. (I
> think.)

But then what you want are not symbols, but true immutable strings. By
that I mean: some object where I can write 10MB of binary dump. If I
want to add one character to the end of it, then I create another
object containing 10MB+1byte of binary dump, and the old 10MB object is
garbage-collected.
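
A minimal sketch of that behaviour, using a frozen String as a stand-in for a "true immutable string" (an assumption for illustration; a short literal replaces the 10MB dump):

```ruby
dump = "binary dump".freeze
bigger = dump + "!"        # builds a new String; dump itself is untouched

puts dump                  # prints "binary dump"
puts bigger                # prints "binary dump!"
puts dump.equal?(bigger)   # prints "false": two separate objects

append_failed = false
begin
  dump << "!"              # in-place append is refused on a frozen String
rescue => e
  append_failed = true
  puts e.class             # FrozenError on current Rubies (TypeError on 1.8)
end
```

Once nothing references the old object any more, it becomes eligible for garbage collection.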

Now, there have been arguments that *all* strings in Ruby should have
been immutable in the first place, and I can sympathise with them.
After all, numbers are immutable, and so are certain other classes. But
pragmatically, there are cases where it is just so *useful* to append
to a string. Besides, maintaining the singleton property is hard for
large binary objects - i.e. when I create another 10MB binary dump, I
have to check whether it's the same as any other object which already
exists.

(And of course, very large numbers are Bignums, which are not
singletons)

> >That is, as far as I know, the symbol table is never garbage collected.
> >Once
> >a symbol, always a symbol.
> >
> I'm not a core programmer, maybe I am asking too much,
> but I think it should be possible without slowing anything down.
> One very simple idea I can think of, is the following:
> Set a limit to the number of symbols and if it is reached
> the GC will be invoked in a special symbol-mode, marking all symbols that are
> still in use and completely regenerating the symbol-table from scratch.

Yes, but why??? In real life, real world programs, only a few hundred
unique method names are used. So let them be symbols.

If you are going to create a million different symbols, or symbols
which are millions of bytes long, then use a String. That's what they
are there for!

"Doctor, it hurts when I do this" -- "Then don't do that!"

What you seem to be saying is "I don't want there to be two different
types of object, one for method names and one for holding blobs of
data", but I don't understand this. Symbols work, are fast, and
personally I find them aesthetically pleasing: one is a sort of tag for
method names, and one is a holder of blobs of data which may come from
the outside world or from my own computations.

> Yes, I really must admit, I also like the cleanness of current Symbols.
> But then, my experience is that this clearness is not worth a lot,
> because the border towards "dirty" strings must be crossed often.
> (That's why I called sticking to the clearness "tempting" in my last post.)

I don't think so. The examples I've seen so far are:

(1) Method names which are created algorithmically. That is, you know
you have a method called "foo" and you want to call another method
called "foo=". It works, where's the problem?

    send("#{mname}=")

Yes, you've made a conversion to a string, and back again. Big deal. The
only way to improve this would be to have symbol algebra, e.g.
    (:foo + :=) == :foo=

But internally it would almost certainly be implemented the same way,
because you'd have to look up the symbol ID to convert it into its
character representation, manipulate the characters, and then lookup
back into a symbol.
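
For reference, a runnable sketch of that round trip. The Point class and its x attribute are made up for illustration; the pattern is the send("#{mname}=") call from above.

```ruby
class Point
  attr_accessor :x   # defines both #x and #x=
end

point = Point.new
mname = :x

# Symbol -> String (interpolation) -> Symbol again.
setter = "#{mname}=".to_sym
point.send(setter, 5)    # same effect as point.x = 5

puts point.send(mname)   # prints 5
puts setter.inspect      # prints :x=
```

send accepts the String directly as well, so the explicit to_sym is only there to make the conversion back visible.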

Or, you'd have to drop symbols entirely and make *every* method call
use a string of characters as the method name - which would be very
expensive.

Or, you'd have to make all Strings immutable, so that the string ID
could be used as a method call tag. See above for reasons why that is
undesirable.

(2) Rails, which allows you to be inconsistent between :foo=>:bar and
:foo=>"bar" and "foo"=>:bar and "foo"=>"bar" (at least sometimes - not
always). IMO it would have been better if Rails had stuck to one or the
other, but that's too late to undo.

Rails has introduced its own bast^H^H^H^Hextensions to the language
anyway.

> Ruby is not yet good in many other aspects:
> speed, threads, documentation.

There is really *excellent* documentation for Ruby. You have to pay for
it, but the books I am thinking of are well worth the money.

You may not like the idea that the language designer and contributors
are not getting any money directly for their work, whilst book
publishers are. I can live with that.

I find that speed is good enough, and threads are better than most
(have you tried writing threaded programs in Perl?)

> The language is the crystal. It must be good in the beginning,
> it becomes more solid with every project written in that language.

Many people don't seem to realise that Ruby is, what, 15 years old now?

Regards,

Brian.
Robert D. (Guest)
on 2007-05-15 17:55
(Received via mailing list)
On 5/15/07, Brian C. <removed_email_address@domain.invalid> wrote:
<snip>
> But then what you want are not symbols, but true immutable strings. By that
> I mean: some object where I can write 10MB of binary dump. If I want to add
> one character to the end of it, then I create another object containing
> 10MB+1byte of binary dump, and the old 10MB object is garbage-collected.
But of course we have immutable strings already :)))

class IString < String
   def initialize str
     super(str)
     freeze
   end
end

HTIOI (Hope this is of interest ;)
>
<snip>
Cheers
Robert
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw
Robert K. (Guest)
on 2007-05-15 18:11
(Received via mailing list)
On 15.05.2007 15:54, Robert D. wrote:
> class IString < String
>   def initialize str
>     super(str)
>     freeze
>   end
> end

What advantages does this have over using "freeze" directly?

str = "foo".freeze

It seems using a new class will increase the likelihood of things to
break.

> HTIOI (Hope this is of interest ;)

LOL

> You see things; and you say Why?
> But I dream things that never were; and I say Why not?
> -- George Bernard Shaw

Greetings to George, btw. :-)

  robert
Rick D. (Guest)
on 2007-05-15 18:29
(Received via mailing list)
Not responding to any particular posting.

One of the false memes that some folks on this thread seem to hold is
that Symbols are integers.

They aren't.

Any more than they are strings.

A given ruby symbol has both a string and an integer representation,
which can be obtained by using to_s and to_i. But one wouldn't say
that the object 1.2 is a string because it has a string
representation, or that the object "123" was an integer because it has
an integer representation.

The essential fact about symbols is that if two symbols have the same
string representation they are the same object, and that two different
symbols have two different integer representations. Or more formally

     sym1.to_s == sym2.to_s  iff sym1.object_id == sym2.object_id
     sym1.to_i == sym2.to_i iff sym1.object_id == sym2.object_id
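
The first of these two statements can be checked directly. A small sketch follows; it leaves out to_i, since Symbol#to_i was removed in Ruby 1.9 and the snippet is meant to run on a current Ruby.

```ruby
a = "foo".to_sym
b = :foo

puts a.equal?(b)                  # prints "true": one and the same object
puts a.object_id == b.object_id   # prints "true"

# Strings with the same content are equal, but remain distinct objects.
s1 = String.new("foo")
s2 = String.new("foo")
puts s1 == s2                     # prints "true"
puts s1.equal?(s2)                # prints "false"
```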

One way to implement this is to keep internal tables which map the
string and integer representations of symbols to each other, and to
have functional mappings between the object_ids and integer
representations of symbols.  This is how ruby does it.  Creating a
symbol from a string consists of looking for the string in the mapping
from strings to integer representations, and if it's not found
assigning the next integer rep and adding the string and integer rep
to the internal tables.  This operation, called interning, happens
either at parse time when :foo is encountered, or later when an
expression like 'foo'.to_sym is executed.

The meme that "Symbols are Integers" probably lingers from an earlier
version of Ruby before there was an actual Symbol class.  Back then,
symbols really were instances of Fixnum, but no more.  This lives on
vestigially in that Symbol does have a to_int method as well as to_i,
but to_int is deprecated, using it produces a warning :

rick@frodo:~$ ruby -w -e"p :sym.to_int"
-e:1: warning: treating Symbol as an integer
10409

while to_i does not.
rick@frodo:~$ ruby -w -e"p :sym.to_i"
10409

Other languages, like Smalltalk, with similar concepts don't associate
integer representations with Symbols, in these languages the internal
mapping simply maps string representations to object id's, or to the
symbol objects themselves. I suspect that this feature of Ruby symbols
is simply due to the earlier implementation.

Now what are the useful properties of Symbols:

    1. Detecting whether or not two symbols are equal is as fast as
comparing their object_ids.  This is an O(1) operation.
 Detecting whether or not two strings are equal requires a scan of
both strings until either an unequal character is found or the end of
both strings is reached. This is an O(n) operation.
    2. Having 1000 'instances' of a symbol with a particular string
representation takes no more space than having 1

Property 1 means that things like hashes with symbol keys are somewhat
faster than hashes with string keys.  This is why symbols are used as
method selectors, since dispatching a method call requires repeated
lookup in the method tables going up the inheritance chain.  This is a
win if the key is looked up multiple times, there is an initial cost
of interning the symbol (which essentially consists of looking for the
string representation in an internal global symbol table) but this
cost is amortized over subsequent lookups.

It seems that the HashWithIndifferentAccess class added by Rails in
ActiveSupport, which allows symbols and strings to be used
interchangeably as keys, doesn't actually take advantage of this since
it uses symbols converted to strings as the actual keys rather than
the other way around.  This provides a bit of syntactic sugar, without
getting either the performance or space advantages of using symbols.
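
A rough sketch of the strategy just described. This is not the ActiveSupport code; the class IndifferentHash and its normalize helper are hypothetical and only show the "convert symbol keys to strings" direction.

```ruby
# Hypothetical stand-in, not the real HashWithIndifferentAccess.
class IndifferentHash
  def initialize
    @data = {}
  end

  def []=(key, value)
    @data[normalize(key)] = value
  end

  def [](key)
    @data[normalize(key)]
  end

  def keys
    @data.keys
  end

  private

  # Symbols are stored as Strings, so every lookup by Symbol pays for to_s.
  def normalize(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

h = IndifferentHash.new
h[:name] = "Sven"
puts h["name"]        # prints "Sven"
puts h.keys.inspect   # prints ["name"]
```

Since the stored keys are Strings, lookups compare string contents and gain nothing from symbol identity.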

As for incompatibilities caused by the experiment, I'm not sure exactly
what Matz and the core team ran into but certainly this would break
code like:

case arg
when String
    # do something
when Symbol
    # do something else
end

Code like this exhibits the fragility of doing discrimination based on
classes in the face of refactoring.
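
To illustrate, here is a sketch in which a hypothetical FakeSymbol class stands in for a Symbol that inherits from String. Because case/when uses Class#===, which also matches subclass instances, the first branch swallows everything and the second one is never reached.

```ruby
# Hypothetical stand-in for "Symbol < String".
class FakeSymbol < String; end

def classify(arg)
  case arg
  when String     then :string   # also matches FakeSymbol instances
  when FakeSymbol then :symbol   # unreachable once FakeSymbol < String
  else                 :other
  end
end

puts classify("foo").inspect                   # prints :string
puts classify(FakeSymbol.new("foo")).inspect   # prints :string, not :symbol
```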

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
Robert D. (Guest)
on 2007-05-15 18:35
(Received via mailing list)
On 5/15/07, Robert K. <removed_email_address@domain.invalid> wrote:
> >
> > class IString < String
> >   def initialize str
> >     super(str)
> >     freeze
> >   end
> > end
>
> What advantages does this have over using "freeze" directly?

Dunno :)

x = IString.new("Hello World") # Not even tested yet
vs.
x="HelloWorld".freeze

Well the first one has the advantage that I thought about it ;)

Now I reckon that the subclass stuff is baaad

def blah str
    raise ArgumentError unless IString === str
    ...
end

but now someone does
class MString < IString
   # get rid of the freeze (e.g. by calling superclass.superclass.new in
   # self.class.new)
end

and my code is broken, while in


def blah str
    raise ArgumentError unless str.respond_to?(:frozen?) && str.frozen?
   ...
end

frozen is frozen forever.

So do what Robert told you and beware of what Robert told you;)

> > But I dream things that never were; and I say Why not?
> > -- George Bernard Shaw
>
> Greetings to George, btw. :-)
Well last time I met him he was admiring your posts to the list :-P
>
>         robert
>
>
idem
Robert D. (Guest)
on 2007-05-15 18:42
(Received via mailing list)
On 5/15/07, Rick DeNatale <removed_email_address@domain.invalid> wrote:
<lots of interesting stuff snipped>
>
> It seems that the HashWithIndifferentAccess class added by Rails in
> ActiveSupport, which allows symbols and strings to be used
> interchangeably as keys, doesn't actually take advantage of this since
> it uses symbols converted to strings as the actual keys rather than
> the other way around.  This provides a bit of syntactic sugar, without
> getting either the performance or space advantages of using symbols.
Is this whole String vs. Symbol idea motivated by Rails stuff?
I just do not know Rails but I would guess it is a dangerous thing if
paradigms that are useful in an application framework - even if it is
such a Great One as Rails - are to be applied to a General Purpose
Language.

I will rephrase OP's question now, why the h[ae]ck did the Core team
think about unifying Strings and Symbols in the first place ???
That is for sure something very interesting.
<more stuff snipped>

Robert
Brian C. (Guest)
on 2007-05-15 18:54
(Received via mailing list)
On Tue, May 15, 2007 at 10:54:05PM +0900, Robert D. wrote:
>     super(str)
>     freeze
>   end
> end

Yes, but it's not a singleton.

It would only be of interest as a Symbol replacement if
IString.new("foo") always returned the same object. You could implement
this using the Multiton pattern I think.

Then you could safely use IString#object_id as a method name key.
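
A minimal sketch of such a multiton, assuming a plain Hash as the pool. The POOL constant and the overridden IString.new are illustrative choices, not something from the thread, and the sketch ignores thread safety and never releases entries.

```ruby
class IString < String
  POOL = {}   # content => the one shared, frozen instance

  # Return the pooled instance for this content, creating it on first use.
  def self.new(str)
    key = str.to_s
    POOL[key] ||= super(key).freeze
  end
end

a = IString.new("foo")
b = IString.new("foo")

puts a.equal?(b)                  # prints "true": same object both times
puts a.frozen?                    # prints "true"
puts a.object_id == b.object_id   # prints "true"
```

Like the real symbol table of that era, this pool only ever grows.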

Regards,

Brian.
Brian C. (Guest)
on 2007-05-15 19:04
(Received via mailing list)
On Tue, May 15, 2007 at 11:53:08PM +0900, Brian C. wrote:
> Yes, but it's not a singleton.
>
> It would only be of interest as a Symbol replacement if IString.new("foo")
> always returned the same object. You could implement this using the Multiton
> pattern I think.
>
> Then you could safely use IString#object_id as a method name key.

P.S. I'm aware of Symbol#to_i, but to_i and object_id appear to be
intimately related:

irb(main):001:0> :foo.to_i
=> 14817
irb(main):002:0> :foo.object_id
=> 148178
irb(main):003:0> :bar.to_i
=> 16081
irb(main):004:0> :bar.object_id
=> 160818
irb(main):005:0> :zzzzzzzzzzzzzzzz.to_i
=> 16089
irb(main):006:0> :zzzzzzzzzzzzzzzz.object_id
=> 160898
irb(main):007:0> :puts.to_i
=> 7345
irb(main):008:0> :puts.object_id
=> 73458
irb(main):009:0>

i.e. I don't think the symbol table maintains an explicit integer key
for each symbol.
Robert K. (Guest)
on 2007-05-15 19:11
(Received via mailing list)
On 15.05.2007 16:34, Robert D. wrote:
>> >> 10MB+1byte of binary dump, and the old 10MB object is
>> What advantages does this have over using "freeze" directly?
>
>
> and my code is broken, while in
>
>
> def blah str
>    raise ArgumentError unless str.respond_to?(:frozen?) && str.frozen?
>   ...
> end
>
> frozen is frozen forever.

Correct.  And since #frozen? is defined in Kernel you can skip the first
test.

> So do what Robert told you and beware of what Robert told you;)

:-)

>> > You see things; and you say Why?
>> > But I dream things that never were; and I say Why not?
>> > -- George Bernard Shaw
>>
>> Greetings to George, btw. :-)
> Well last time I met him he was admiring your posts to the list :-P

Wow!  So he didn't die but just went home like this other guy who
invented a vi clone (or at least provided his name for the operation)...
:-)

>>         robert
>>
>>
> idem

:-)

While we're at it: *if* you want to define something (and are a fan of
C++) you can do this:

irb(main):001:0> module Kernel
irb(main):002:1> private
irb(main):003:1> def const(*a) a.each {|x| x.freeze } end
irb(main):004:1> end
=> nil
irb(main):005:0> nil
=> nil
irb(main):006:0> foo, bar = const "foo", "bar"
=> ["foo", "bar"]
irb(main):007:0> ["foo", "bar"]
=> ["foo", "bar"]
irb(main):008:0> foo << bar
TypeError: can't modify frozen string
         from (irb):8:in `<<'
         from (irb):8
         from :0
irb(main):009:0> bar << foo
TypeError: can't modify frozen string
         from (irb):9:in `<<'
         from (irb):9
         from :0
irb(main):010:0>

Hihi...

Kind regards

  robert
Gary W. (Guest)
on 2007-05-15 19:24
(Received via mailing list)
On May 15, 2007, at 10:53 AM, Brian C. wrote:

>>> collected.
>> But of course we have immutable strings already :)))
>>
>> class IString < String
>>   def initialize str
>>     super(str)
>>     freeze
>>   end
>> end
>
> Yes, but it's not a singleton.


You've stated or implied a couple of times in this discussion that
symbols are 'singletons', but I thought the conventional definition
of 'singleton' was of a class with only a single instance, where the
instance is called a singleton.  That doesn't describe Ruby's symbols.

I think what you are getting at is the idea that identity and
equality are one and the same for symbols.  Fixnum instances also
have this property but floats don't.  Is there a standard term for
that characteristic?  I think in mathematics it would be an equivalence
relation ~ such that If x ~ y then x = y for all x, y in the set.
In this case ~ represents Ruby's == and = represents Ruby's equal?.
Robert D. (Guest)
on 2007-05-15 19:38
(Received via mailing list)
On 5/15/07, Robert K. <removed_email_address@domain.invalid> wrote:
<snip>
> > frozen is frozen forever.
>
> Corrent.  And since #frozen? is defined in Kernel you can skip the first
> test.

No, you are an optimist Robert ;)

irb(main):003:0> Kernel.send :remove_method, :frozen?
=> Kernel
irb(main):004:0> "a".frozen?
NoMethodError: undefined method `frozen?' for "a":String
        from (irb):4
        from :0

But maybe we should not worry too much about that kind of meta-hackery
in our design, because one could trick us anyway, e.g.

class String; def frozen?; true end end

So you are right after all ;-)

Cheers
Robert
Robert D. (Guest)
on 2007-05-15 19:42
(Received via mailing list)
On 5/15/07, Robert K. <removed_email_address@domain.invalid> wrote:
<snip>
> >> > You see things; and you say Why?
> >> > But I dream things that never were; and I say Why not?
> >> > -- George Bernard Shaw
> >>
> >> Greetings to George, btw. :-)
> > Well last time I met him he was admiring your posts to the list :-P
>
> Wow!  So he didn't die but just went home like this other guy who
> invented a vi clone (or at least provided his name for the operation)... :-)

Your conclusions are jumped ;)
But sure would have liked to talk to this guy. As to Gödel or
Hemingway, well maybe I am OT *now*.
> <snip>
>
> While we're at it: *if* you want to define something (and are a fan of
> C++) you can do this:
>
> irb(main):001:0> module Kernel
> irb(main):002:1> private
> irb(main):003:1> def const(*a) a.each {|x| x.freeze } end
> irb(main):004:1> end

hey that is quite nice!!!
<snip>
Rick D. (Guest)
on 2007-05-15 20:11
(Received via mailing list)
On 5/15/07, Robert D. <removed_email_address@domain.invalid> wrote:
> On 5/15/07, Rick DeNatale <removed_email_address@domain.invalid> wrote:
> <lots of interesting stuff snipped>
> >
> > It seems that the HashWithIndifferentAccess class added by Rails in
> > ActiveSupport, which allows symbols and strings to be used
> > interchangeably as keys,

> Is this whole String vs. Symbol idea motivated by Rails stuff?



> I will rephrase OP's question now, why the h[ae]ck did the Core team
> think about unifying Strings and Symbols in the first place ???

I don't know. Probably not motivated, but on the other hand it no
doubt stimulated a reconsideration of the relationship between String
and Symbol.

Whether or not Strings and Symbols have an inheritance relationship is
a bit of an accidental design choice.  Keeping in mind that in a
language like Ruby or Smalltalk, the class hierarchy is really about
implementation factoring and not type specification, as a first
approximation, it doesn't matter that much.  In Smalltalk-80 Symbol is
a subclass of String, but I believe that Symbol overrode the methods
which mutate the instance to cause errors.

But once the decision was made, secondary effects ensue.  If
programmers write code which depends on a particular inheritance
relationship like the case statement in my earlier post, then changes
to the decision will break things.  It's like the story about how
Stuart Feldman decided to use tab as a lexical element in makefiles
and treat them differently from the equivalent whitespace.  He
realized that this was a bad decision, but too late.
From: http://www.faqs.org/docs/artu/ch15s04.html

  "No discussion of make(1) would be complete without an
   acknowledgement that it includes one of the worst design botches
   in the history of Unix. The use of tab characters as a required
   leader for command lines associated with a production means that
   the interpretation of a makefile can change drastically on the
   basis of invisible differences in whitespace.

       Why the tab in column 1? Yacc was new, Lex was brand new. I
       hadn't tried either, so I figured this would be a good excuse
       to learn. After getting myself snarled up with my first stab
       at Lex, I just did something simple with the pattern
       newline-tab. It worked, it stayed. And then a few weeks later
       I had a user population of about a dozen, most of them
       friends, and I didn't want to screw up my embedded base. The
       rest, sadly, is history.
                                               -- Stuart Feldman"

Not that I'm saying that Matz's decision on Symbol not being a
subclass of String was a bad one, I'm not, and it's certainly not in
the class of the tab/whitespace 'decision' in make.  What I am saying
is that once made these decisions can quickly generate their own
requirements to exist once a user base has been established.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
Rick D. (Guest)
on 2007-05-15 21:15
(Received via mailing list)
On 5/15/07, Brian C. <removed_email_address@domain.invalid> wrote:

> => 160818
> irb(main):005:0> :zzzzzzzzzzzzzzzz.to_i
> => 16089
> irb(main):006:0> :zzzzzzzzzzzzzzzz.object_id
> => 160898

Here's part of the ruby1.8.5 code which computes an object's object_id
from its reference value.

if (TYPE(obj) == T_SYMBOL) {
        return (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG;
    }

where SYM2ID is a C macro which shifts the value right 8 bits.
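
As a worked check, the formula can be replayed in Ruby. The constants are assumptions: sizeof(RVALUE) taken as 20 bytes (a 32-bit build), FIXNUM_FLAG as 1, and the resulting VALUE decoded as a Fixnum by dropping the tag bit. With those values the arithmetic reproduces the irb numbers Brian posted earlier in this thread.

```ruby
SIZEOF_RVALUE = 20   # assumption: 32-bit build
FIXNUM_FLAG   = 1    # assumption: low bit tags a Fixnum VALUE

# Replays (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG and then
# decodes the tagged VALUE the way a Fixnum is read back (shift right by one).
def object_id_from_symbol_id(id)
  value = (id * SIZEOF_RVALUE + (4 << 2)) | FIXNUM_FLAG
  value >> 1
end

puts object_id_from_symbol_id(14817)   # prints 148178 (:foo above)
puts object_id_from_symbol_id(7345)    # prints 73458  (:puts above)
```

Under these assumptions object_id is simply to_i * 10 + 8, which is why the two values looked so closely related.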

And here's the code for Symbol#to_i
static VALUE
sym_to_i(sym)
    VALUE sym;
{
    ID id = SYM2ID(sym);

    return LONG2FIX(id);
}


> i.e. I don't think the symbol table maintains an explicit integer key for
> each symbol.

Actually it does, based on having recently read the ruby 1.8.5 code.

It keeps two internal hashes, one maps the string representation to
the integer representation, and the other maps the other way around.

The code for String#to_sym basically does this:

    It calls rb_intern to get the integer representation, called id, and
    returns ID2SYM(id), which just returns id shifted left 8 bits; in
    other words, it's the inverse of SYM2ID.

    rb_intern searches for the string in the symbol table and returns
    the id found there if it finds it.

    Otherwise, it calculates the integer representation by shifting the
    next available id left by 3 bits and ORing in some flag bits which
    depend on the contents of the string; for example, if the string
    starts with a single "@" it's flagged as an instance variable name.

It then makes a copy of the string and does the equivalent of
    sym_table[stringcopy] = newly_computed_id
    sym_rev_table[newly_computed_id] = stringcopy

Although these two aren't Ruby hash objects but C hash tables.

FWIW, Ruby hash objects use the same C hash code internally.
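
The bookkeeping described above can be modelled in a few lines of Ruby. This is a toy sketch, not the interpreter's code: the class name, the method names and the two flag values are made up for illustration, and only the "@" rule mentioned in the description is implemented.

```ruby
# Toy model of the two tables: string -> id and id -> string.
class ToySymbolTable
  SCOPE_SHIFT   = 3    # "shifting the next available id left by 3 bits"
  FLAG_LOCAL    = 0x01 # illustrative flag value (assumption)
  FLAG_INSTANCE = 0x02 # illustrative flag value (assumption)

  def initialize
    @sym_table     = {} # string copy -> id
    @sym_rev_table = {} # id -> string copy
    @next_id       = 1
  end

  # Returns the existing id for str, or allocates a new one.
  def intern(str)
    existing = @sym_table[str]
    return existing if existing

    instance_var = str.start_with?("@") && !str.start_with?("@@")
    flag = instance_var ? FLAG_INSTANCE : FLAG_LOCAL
    id = (@next_id << SCOPE_SHIFT) | flag
    @next_id += 1

    copy = str.dup.freeze # the table keeps its own copy of the string
    @sym_table[copy] = id
    @sym_rev_table[id] = copy
    id
  end

  # Reverse lookup: id -> string representation.
  def name_for(id)
    @sym_rev_table[id]
  end
end

table = ToySymbolTable.new
first  = table.intern("foo")
second = table.intern("foo")
puts first == second          # same string, same id
puts table.name_for(first)    # reverse lookup gives the string back
```

The point of the sketch is only that interning the same character sequence twice yields the same integer, and that the integer can be mapped back to the string.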

What's interesting is that a reference to a symbol doesn't actually
point to an allocated object.

--
Rick DeNatale

Brian C. (Guest)
on 2007-05-15 21:40
(Received via mailing list)
On Wed, May 16, 2007 at 12:23:09AM +0900, Gary W. wrote:
> >>>one character to the end of it, then I create another object
> >>end
> equality are one and the same for symbols.
No, that's not exactly what I meant, but sorry for not being more precise.
What I meant was: there is only ever one symbol object in existence for a
particular sequence of characters. :foo.object_id in one part of the program
is always the same as :foo.object_id elsewhere.

If it were Symbol.new("foo") always returning the same object then I guess
it would probably be called the multiton pattern.
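
A short sketch of that property, using only core methods. String.new is used instead of string literals so that any literal-level sharing an interpreter might do cannot blur the comparison; that choice is an assumption of the sketch, not something from the thread.

```ruby
# However the symbol is obtained, it is the same object.
sym_a = :foo
sym_b = "foo".to_sym
sym_c = ("fo" + "o").to_sym
puts sym_a.equal?(sym_b) && sym_a.equal?(sym_c) # true

# Two strings with the same characters are equal but not identical.
str_a = String.new("foo")
str_b = String.new("foo")
puts str_a == str_b       # true
puts str_a.equal?(str_b)  # false
```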

Regards,

Brian.
Sven S. (enduro) (Guest)
on 2007-05-16 13:44
(Received via mailing list)
Hello everybody,

Although not many of the Ruby-Core specialists took part,
I have still learned a lot from the discussion.
I am now trying to build a conceptual picture.



Some say Strings and Symbols are conceptually very different,
some say they are quite close.

I view it like this:
Symbols essentially are names, Strings essentially are data,
while they both appear as sequences of characters.

Names/Symbols are just atomic, constant, unrelated entities,
while Strings as data have a rich life: they can be related in
many ways, they can be analysed, even be modified.

That's a clean distinction and I think it is very well-represented
in the current Ruby implementation.

In this light, it seems nonsensical to make one the subclass of the other.
(A common superclass would be OK, though.)

Now, in practice, the situation gets more complex:
1. Names sometimes turn into data (option names, method names, table names...),
    especially when things get highly dynamic.
2. Sometimes, programmers use the conceptually "wrong" class, maybe
    as a kind of optimization, for the sake of beauty or out of laziness ... :-)

One could argue that it is good that Symbol and String are well-separated,
because it educates programmers to decide for the "correct" class to use.



On the other hand, the following situation occurs very, very often:
You need to transfer a sequence of characters -- which format do you use:
always Symbols, always Strings, or should it allow both? (Or even a fancy object?)

First, you could argue that when you use duck-typing, the interface can
be kept open.
But still, many situations remain where this question arises.

This choice can be a burden, especially if you think of
inter-operability or optimisation.

And that is an argument for some sort of unification of Symbol and
String.

Subclassing alone would not be enough to solve the problem above;
String#== and Symbol#== would also have to be defined such that "a" == :a,
and #hash would have to be defined accordingly.

Then you would still have the two different kinds of objects ("a" and :a),
but they would behave quite the same except for modifying methods.
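
To make the point about #hash concrete, here is a small sketch that uses a hypothetical wrapper class (LooseKey is not part of Ruby) instead of patching String and Symbol themselves. It only illustrates the contract: if two objects are to be interchangeable as Hash keys, both eql? and hash have to agree, not just ==.

```ruby
# Hypothetical wrapper: treats "a" and :a as the same key.
class LooseKey
  attr_reader :text

  def initialize(value)
    @text = value.to_s.dup.freeze # normalise String or Symbol to text
  end

  def ==(other)
    other.is_a?(LooseKey) && text == other.text
  end
  alias eql? == # Hash lookup uses eql?, so it must agree with ==

  def hash
    text.hash # equal keys must report equal hash values
  end
end

options = { LooseKey.new(:verbose) => true }
puts options[LooseKey.new("verbose")] # true: found via the String spelling
puts LooseKey.new("a") == LooseKey.new(:a) # true
```

Without the matching hash method, the two keys would compare equal yet the Hash lookup could still miss.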



Now, as I am writing this, I doubt that the advantages
of the unification are really worth doing it...

It depends on factors not known to me.

But now, I think I can understand the core-team's decision better.


Bye
Sven


Brian C. wrote:

>>of 'singleton' was of a class with only a single instance, where the
>
>If it were Symbol.new("foo") always returning the same object then I guess
>it would probably be called the multiton pattern.
>
>
>
Isn't the term "immediate value" used for that? Like:
   :abc is an immediate value, and so is 12, so is nil
   "abc" is a reference value and so is [1, 2] and also {} and even 12.0
Trans (Guest)
on 2007-05-16 15:06
(Received via mailing list)
On May 16, 5:44 am, "Sven S. (enduro)" <removed_email_address@domain.invalid> wrote:
> Subclassing alone would not be enough, to solve the problem above,
> also, String#== and Symbol#== would have to be defined such that  "a" == :a
> And also #hash would have to be defined accordingly.
>
> Then you would still have the two different kinds of objects ("a" and :a)
> but they would behave quite the same  except for modifying methods.

While I think Symbol probably could use at least a few of String's
manipulation methods, putting that aside, I wonder how it would affect
things just to make :a == "a" ?

> Now, as I am writing this, I doubt that the advantages
> of the unification are really worth doing it...
>
> It depends on factors not known to me.
>
> But now, I think I can understand the core-team's decision better.

Thanks for this excellent summary.

T.
Logan C. (Guest)
on 2007-05-16 16:46
(Received via mailing list)
On 5/16/07, Trans <removed_email_address@domain.invalid> wrote:
> While I think Symbol probably could use at least a few of String's
> manipulation methods, putting that aside, I wonder how it would affect
> things just to make :a == "a" ?
>
Well, there is precedent: 2 == 2.0 and so on.
On the other hand, what should happen in case statements? Maybe it
would actually be better to make :a === 'a' but not :a == 'a'.
unknown (Guest)
on 2007-05-16 18:06
(Received via mailing list)
Hi --

On Wed, 16 May 2007, Logan C. wrote:

>> > but they would behave quite the same  except for modifying methods.
>>
>> While I think Symbol probably could use at least a few of String's
>> manipulation methods, putting that aside, I wonder how it would affect
>> things just to make :a == "a" ?
>>
> Well, there is precedent: 2 == 2.0 and so on.

With symbols being as integer-like as they are string-like, though,
it's really equally similar to:

   2 == :"2"

> On the other hand, what should happen in case statements? Maybe it
> would actually be better to make :a === 'a' but not :a == 'a'.

I guess as long as :a === :a was still true, that might be a good way
to express the fact that "this is the string of which this symbol is a
case", or something like that.


David
Logan C. (Guest)
on 2007-05-16 22:19
(Received via mailing list)
On 5/16/07, removed_email_address@domain.invalid <removed_email_address@domain.invalid> wrote:
> >> :a
>
> With symbols being as integer-like as they are string-like, though,
> it's really equally similar to:
>
>    2 == :"2"
>
I don't think symbols are integer-like (I don't know that they are
especially string-like either), but I'd be willing to bet a lot more
code in the wild would be broken if you removed Symbol#to_s than if
you removed Symbol#to_i.

Your example really ought to be

2 == :whatever_symbol_whose_to_i_results_in_2
Gary W. (Guest)
on 2007-05-17 22:11
(Received via mailing list)
On May 16, 2007, at 11:17 AM, Logan C. wrote:
> On 5/16/07, removed_email_address@domain.invalid <removed_email_address@domain.invalid> wrote:
>> With symbols being as integer-like as they are string-like, though,
>> it's really equally similar to:
>>
>>    2 == :"2"
>>
> I don't think symbols are integer like.

This is the 'equivalence is defined by identity' idea again.  I think
this is what David means by 'integer-like'.  It is this property that
both fixnums and symbols share but that is *not* shared by strings.

Making '==' work with mixed operands of symbols and strings breaks that
idea and leads to the strange example that David gave (2 == :"2").

Gary W.
unknown (Guest)
on 2007-05-17 22:18
(Received via mailing list)
Hi --

On Fri, 18 May 2007, Gary W. wrote:

> this is what David means by 'integer-like'.  It is this property that
> both fixnums and symbols share but that is *not* shared by strings.

Yes, it's the immutable/immediate thing that symbols have in common
with fixnums and that neither has in common with strings.


David
Paul B. (Guest)
on 2007-05-18 17:53
(Received via mailing list)
On Fri, May 18, 2007 at 03:17:01AM +0900, removed_email_address@domain.invalid wrote:
> Yes, it's the immutable/immediate thing that symbols have in common
> with fixnums and that neither has in common with strings.

Frozen strings are immutable.

Paul
Rick D. (Guest)
on 2007-05-18 23:40
(Received via mailing list)
On 5/18/07, Paul B. <removed_email_address@domain.invalid> wrote:
> On Fri, May 18, 2007 at 03:17:01AM +0900, removed_email_address@domain.invalid wrote:
> > Yes, it's the immutable/immediate thing that symbols have in common
> > with fixnums and that neither has in common with strings.
>
> Frozen strings are immutable.

But not immediate.

--
Rick DeNatale

Robert D. (Guest)
on 2007-05-18 23:52
(Received via mailing list)
On 5/18/07, Rick DeNatale <removed_email_address@domain.invalid> wrote:
> On 5/18/07, Paul B. <removed_email_address@domain.invalid> wrote:
> > On Fri, May 18, 2007 at 03:17:01AM +0900, removed_email_address@domain.invalid wrote:
> > > Yes, it's the immutable/immediate thing that symbols have in common
> > > with fixnums and that neither has in common with strings.
> >
> > Frozen strings are immutable.
>
> But not immediate.
>

What about

%f{This is sooo cooooold} << "!"

TypeError: can't modify frozen string
Just an idea.


Robert
Rick D. (Guest)
on 2007-05-19 03:29
(Received via mailing list)
On 5/18/07, Robert D. <removed_email_address@domain.invalid> wrote:
>
> What about
>
> %f{This is sooo cooooold} << "!"
>
> TypeError: can't modify frozen string
> Just an idea.

That's the immutable part, but

a = "abc".freeze
b = "abc".freeze
c = :abc
d = :abc
a.object_id => -606341628
b.object_id => -606347008
c.object_id => 343218
d.object_id => 343218

The key difference is that there's only one instance of a symbol with
a given string representation.

The shorthand way of saying this is that symbols, like fixnums, are
immediate. That is a sufficient but not a necessary condition; it
crosses the line a bit in describing both the identity relationship
requirement AND the implementation.

Most normal objects are referenced at the C level by an internal value
which is a pointer to the object's state representation in memory.
Since objects are aligned at least on a word boundary, all normal
object pointers will have the 2 least significant bits as zero. They
will also be non-zero.

A few objects are immediate which means that they are referenced at
the C level by a representation whose value is not a pointer.  Fixnums
are represented by shifting the C representation left one bit and
turning on the low-order bit.  False is represented by 0, True by 2,
and Nil by 4.

Ruby symbols are represented by a value computed by shifting the
symbol's integer representation left 8 bits and setting the low-order
byte to 0xFF.

As I said, it's not essential that symbols be immediate, for example
interning a string could create a Symbol instance which was frozen and
registered in a global symbol table, i.e. the multiton pattern, but
the current implementation no doubt has some advantages in either
low-level mechanism performance, supporting some niche in ruby legacy,
or both.
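
The alternative mentioned here, a frozen instance registered in a global table, might look roughly like the following. This is a sketch of the multiton idea only; InternedName and its methods are hypothetical names, and nothing here reflects how Ruby actually implements Symbol.

```ruby
# Hypothetical multiton: at most one frozen instance per character sequence.
class InternedName
  @registry = {}      # text -> the single instance for that text
  @lock = Mutex.new   # guards the registry against concurrent interning

  # Returns the one instance for text, creating it on first use.
  def self.intern(text)
    key = text.to_s
    @lock.synchronize do
      @registry[key] ||= new(key.dup.freeze).freeze
    end
  end

  private_class_method :new # instances can only come from .intern

  attr_reader :text

  def initialize(text)
    @text = text
  end

  def to_s
    text
  end
end

a = InternedName.intern("foo")
b = InternedName.intern("foo")
puts a.equal?(b) # true: same object for the same text
puts a.frozen?   # true
```

Such objects would be ordinary heap objects rather than immediates, yet equality would still coincide with identity.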

--
Rick DeNatale


IPMS/USA Region 12 Coordinator
http://ipmsr12.denhaven2.com/

Visit the Project Mercury Wiki Site
http://www.mercuryspacecraft.com/
Robert D. (Guest)
on 2007-05-19 09:50
(Received via mailing list)
On 5/18/07, Rick DeNatale <removed_email_address@domain.invalid> wrote:
> > >
> a = "abc".freeze
> b = "abc".freeze
> c = :abc
> d = :abc
> a.object_id => -606341628
> b.object_id => -606347008
> c.object_id => 343218
> d.object_id => 343218
>
> The key difference is that there's only one instance of a symbol with
> a given string representation.
Ah I see, I got confused, I did not understand the meaning of
immediate immediately ;).
Although theoretically the interpreter could create an immediate value for
%f{...} we would probably run out of address space :(
<snip>
>
Cheers
Robert
Brian C. (Guest)
on 2007-05-19 16:56
(Received via mailing list)
On Sat, May 19, 2007 at 08:28:24AM +0900, Rick DeNatale wrote:
> and Nil by 4.
>
> Ruby symbols are represented by a value computed by shifting the
> symbols integer representation left 8 bits and setting the low-order
> byte to 0xFF representation

Perhaps it varies based on the Ruby version you're running; it's not like
that for me.

irb(main):006:0> :foo.object_id.to_s(16)
=> "39490e"
irb(main):007:0> RUBY_VERSION
=> "1.8.4"

I think a weaker requirement than 'immediate' is needed. A symbol can quite
happily be a regular object; we just need to ensure that there is always
only one symbol for a particular symbol character sequence.

Regards,

Brian.
Rick D. (Guest)
on 2007-05-20 05:18
(Received via mailing list)
On 5/19/07, Brian C. <removed_email_address@domain.invalid> wrote:
> On Sat, May 19, 2007 at 08:28:24AM +0900, Rick DeNatale wrote:

> => "1.8.4"
You can't really see the internal bit representations from ruby, since
they get manipulated before you see them.  Much like the class of an
object reported by ruby isn't the same as the object pointed to by its
klass pointer at the C level.

And even if you could, I was talking about the integer representation
of the symbol, not the object_id.

Not to say that this doesn't change between versions of ruby.  Which
is why it's carefully hidden from ruby code.

> I think a weaker requirement than 'immediate' is needed. A symbol can quite
> happily be a regular object; we just need to ensure that there is always
> only one symbol for a particular symbol character sequence.

Yes, I said that, but the key issue for the subject of the current
thread is that Symbols aren't Strings. They might have both a string
representation and an integer representation, but then so do integers.
And unlike Strings, they have an essential requirement that equality
implies identity, which is only an accidental property of integers in the
range of Fixnum.
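
A small sketch of that distinction, using only core methods. The large values are computed at run time from a variable so that each one is built separately; the sketch assumes an interpreter where 10**30 does not fit in an immediate integer, which holds for the usual 32-bit and 64-bit builds.

```ruby
# Small integers: equality and identity happen to coincide.
small_a = 42
small_b = 40 + 2
puts small_a.equal?(small_b) # true

# Large integers: equal values, but separately allocated objects.
exponent = 30
big_a = 10**exponent
big_b = 10**exponent
puts big_a == big_b      # true
puts big_a.equal?(big_b) # false

# Symbols: equality implies identity by design.
puts :abc.equal?("abc".to_sym) # true
```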


--
Rick DeNatale

Brian C. (Guest)
on 2007-05-20 16:35
(Received via mailing list)
On Sun, May 20, 2007 at 10:18:22AM +0900, Rick DeNatale wrote:
> >irb(main):006:0> :foo.object_id.to_s(16)
> of the symbol, not the object_id.
AFAIK, the object_id is the in-memory pointer to the structure of the object
(if it's a material object), or is one of the special values:

- 0, 2 or 4 for false, true or nil

- (n<<1) | 1 for Fixnums

None of these is valid as a pointer to a memory location, so they can be
recognised immediately as special.

So in the above, :foo's object ID looks like a memory pointer to me. It
might not be, but then you'd need to guarantee that 39490e could not
possibly be a valid memory pointer for some regular object (and also be able
to recognise this by inspection, i.e. by looking at the bit pattern).

Regards,

Brian.
Rick D. (Guest)
on 2007-05-21 02:55
(Received via mailing list)
On 5/20/07, Brian C. <removed_email_address@domain.invalid> wrote:
> > >
> > And even if you could, I was talking about the integer representation
> > of the symbol, not the object_id.
>
> AFAIK, the object_id is the in-memory pointer to the structure of the object
> (if it's a material object), or is one of the special values:
>
> - 0, 2 or 4 for false, true or nil
>
> - (n<<1) | 1 for Fixnums

Not starting with 1.8.5
VALUE
rb_obj_id(VALUE obj)
{
    /*
     *                32-bit VALUE space
     *          MSB ------------------------ LSB
     *  false   00000000000000000000000000000000
     *  true    00000000000000000000000000000010
     *  nil     00000000000000000000000000000100
     *  undef   00000000000000000000000000000110
     *  symbol  ssssssssssssssssssssssss00001110
     *  object  oooooooooooooooooooooooooooooo00        = 0 (mod sizeof(RVALUE))
     *  fixnum  fffffffffffffffffffffffffffffff1
     *
     *                    object_id space
     *                                       LSB
     *  false   00000000000000000000000000000000
     *  true    00000000000000000000000000000010
     *  nil     00000000000000000000000000000100
     *  undef   00000000000000000000000000000110
     *  symbol   000SSSSSSSSSSSSSSSSSSSSSSSSSSS0        S...S % A = 4 (S...S = s...s * A + 4)
     *  object   oooooooooooooooooooooooooooooo0        o...o % A = 0
     *  fixnum  fffffffffffffffffffffffffffffff1        bignum if required
     *
     *  where A = sizeof(RVALUE)/4
     *
     *  sizeof(RVALUE) is
     *  20 if 32-bit, double is 4-byte aligned
     *  24 if 32-bit, double is 8-byte aligned
     *  40 if 64-bit
     */
    if (TYPE(obj) == T_SYMBOL) {
        return (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG;
    }
    if (SPECIAL_CONST_P(obj)) {
        return LONG2NUM((long)obj);
    }
    return (VALUE)((long)obj|FIXNUM_FLAG);
}

1.8.6 and 1.9 have the same code.
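
Read as a decision procedure, the "32-bit VALUE space" half of that comment looks roughly like the sketch below. classify_value is a hypothetical helper that works on plain non-negative Ruby integers standing in for C VALUE words; the order of the checks is an assumption, and the real interpreter uses C macros rather than anything like this.

```ruby
# Hypothetical classifier for a 32-bit VALUE word, following the
# bit patterns listed in the rb_obj_id comment.
def classify_value(word)
  return :false if word == 0
  return :true  if word == 2
  return :nil   if word == 4
  return :undef if word == 6
  return [:fixnum, word >> 1] if (word & 1) == 1       # ...fffffff1
  return [:symbol, word >> 8] if (word & 0xff) == 0x0e # ...ssss00001110
  return [:object, word]      if (word & 3) == 0       # aligned pointer
  :unknown
end

p classify_value((21 << 1) | 1)       # [:fixnum, 21]
p classify_value((16089 << 8) | 0x0e) # [:symbol, 16089]
p classify_value(0x0804a3c0)          # [:object, 134521792]
```

Incidentally, the value 39490e quoted earlier in the thread also ends in 0e, so this sketch would classify it as a symbol word; whether 1.8.4 really exposed the raw word as object_id is not something the sketch can confirm.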


--
Rick DeNatale

Robert D. (Guest)
on 2007-09-26 01:04
(Received via mailing list)
On 5/16/07, Logan C. <removed_email_address@domain.invalid> wrote:
> >
> > While I think Symbol probably could use at least a few of String's
> > manipulation methods, putting that aside, I wonder how it would affect
> > things just to make :a == "a" ?
> >
> Well, there is precedent: 2 == 2.0 and so on.
> On the other hand, what should happen in case statements? Maybe it
> would actually be better to make :a === 'a' but not :a == 'a'.
>
>
Honestly I prefer to write

case s.to_s
   when 'a'

instead of
case s
     when 'a'

but the most explicit way to do this is maybe the most readable

case s
    when :a, 'a'

Cheers
Robert

P.S.
Tom is right, that was an excellent résumé.
R