Regexp, String, Symbol literals' object_ids

Regexp literals:
5.times { p /abcdasdf/.object_id } -> same!

String literals:
5.times { p 'asdasdf'.object_id } -> different

Symbols:
5.times { p :asdfsf.object_id } -> same!

Symbols with to_s:
5.times { p :asdsdfsdf.to_s.object_id } -> different

Predefined string as a constant
CONS = 'asdfsdf'
5.times { p CONS.object_id } -> same! (sure)

Question:
Is there some special syntax that makes string literals ("asdfasdf")
behave like /sadfsdf/ in the examples above, without predefining the
string as a constant? Or is there another elegant way to achieve the
same goal?

Ruby has both mutable and immutable strings. A mutable string is
declared as "string". An immutable string is declared as :string and in
Ruby is called a ‘symbol’. So, no, there is no way for "string" to
behave as :string, since that’s by design. Well, there is a way, but I’d
not go there :)

If you want two equivalent string literals to point at the same
instance, use the symbol notation, as in:

:test.object_id == :test.object_id #true


Andrea D.

On 19/12/2010 21:07, Pavel R. wrote:

Andrea D. wrote in post #969438:

Well there is a way but I’d
not go there :)

Digging into parse.y and other ruby core files?

On 19.12.2010 21:07, Pavel R. wrote:

Regexp literals:
5.times { p /abcdasdf/.object_id } -> same!

How is this possible? Every time the loop body is executed, a new
regexp should be created… Have a look at this, which seems confusing to
me:

#ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
irb(main):001:0> 5.times { p /abcdasdf/.object_id }
8030280
8030280
8030280
8030280
8030280
=> 5
irb(main):002:0> p /abcdasdf/.object_id
8049600
=> 8049600
irb(main):003:0> p /abcdasdf/.object_id
8063560
=> 8063560
irb(main):004:0>

The regexp in the loop always stays the same, but if I create some
others outside the loop, they get a different object ID? Can anybody
shed some light on this?

Valete,
Marvin

How about this? I have just discovered

pavel@pavel-laptop:~/dev/binexp$ ~/usr/ruby19/bin/irb
irb(main):001:0> /(?<digits>\d+)/ =~ 'abc123def'
=> 3
irb(main):002:0> digits
=> "123"
irb(main):003:0>

According to ri Regexp#=~

If =~ is used with a regexp literal with named captures, captured
strings (or nil) is assigned to local variables named by the capture
names.

/(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ " x = y "
p lhs #=> "x"
p rhs #=> "y"

If it is not matched, nil is assigned for the variables.

/(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ " x = "
p lhs #=> nil
p rhs #=> nil

This assignment is implemented in the Ruby parser. The parser detects
‘regexp-literal =~ expression’ for the assignment. The regexp must be a
literal without interpolation and placed at left hand side.

=======>

It seems the Ruby parser can do some magic things!
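
To make the rule quoted from ri concrete, here is a minimal sketch. The method names and the ASSIGNMENT constant are made up for illustration. The first method keeps the literal on the left of =~, so the parser creates the locals; the second keeps the regexp in a constant, so no locals appear and the captures are read from the MatchData instead.

```ruby
# Literal regexp on the left of =~ : the parser creates locals lhs and rhs.
def parse_with_literal(line)
  /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ line
  [lhs, rhs]
end

# Regexp stored in a constant: no locals are created, so use the MatchData.
ASSIGNMENT = /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/

def parse_with_match(line)
  m = ASSIGNMENT.match(line)
  m ? [m[:lhs], m[:rhs]] : [nil, nil]
end

p parse_with_literal(" x = y ")  # => ["x", "y"]
p parse_with_literal(" x = ")    # => [nil, nil]
p parse_with_match(" x = y ")    # => ["x", "y"]
```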

Try

ruby-1.9.2-head > 5.times { p /thesame/.object_id.to_s + ' ' +
/thesame/.object_id.to_s}
"21522620 21522400"
"21522620 21522400"
"21522620 21522400"
"21522620 21522400"
"21522620 21522400"
=> 5
ruby-1.9.2-head > 5.times { p 'thesame'.object_id.to_s + ' ' +
'thesame'.object_id.to_s}
"21553480 21553400"
"21553320 21553240"
"21553160 21553080"
"21553000 21552920"
"21552840 21552760"
=> 5
ruby-1.9.2-head >

What is the logic behind this?

On Sun, Dec 19, 2010 at 8:01 PM, Abinoam Jr. wrote:

ruby-1.9.2-head > 5.times { p 'thesame'.object_id.to_s + ' ' +

A regular expression literal like /thesameobject/ causes a Regexp
object to be instantiated at PARSE time.

Since regular expressions are immutable (in contrast to strings or
arrays, which also have a literal representation), it really doesn’t
matter whether a new regexp is created each time the expression
containing the literal is evaluated. The fact that two occurrences of
an apparently ‘equal’ regular expression generate two different
instances simply reflects the fact that the parser doesn’t attempt to
consolidate equal literals.

In the case of a string literal, since strings are mutable, a new
string instance is created each time the expression containing the
string literal is evaluated.
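
A small sketch of the difference described above; the two helper methods are invented for this illustration. Each one evaluates a single literal n times and collects the object ids. The regexp count is the same behaviour shown in the irb sessions earlier in the thread. The string count depends on whether frozen string literals are enabled, so no fixed output is given for it.

```ruby
# Evaluate one regexp literal n times and collect the object ids.
def regexp_ids(n)
  Array.new(n) { /thesame/.object_id }
end

# Evaluate one string literal n times and collect the object ids.
def string_ids(n)
  Array.new(n) { 'thesame'.object_id }
end

p regexp_ids(5).uniq.size  # => 1 (one Regexp object for this literal)
p string_ids(5).uniq.size  # normally 5: a fresh String per evaluation
```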


Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Github: rubyredrick (Rick DeNatale) · GitHub
Twitter: @RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Andrea D. wrote in post #969438:

Ruby has both mutable and immutable strings. A mutable string is
declared as "string". An immutable string is declared as :string and in
Ruby is called a ‘symbol’. So, no, there is no way for "string" to
behave as :string, since that’s by design.

This is a very misleading description, so I’ll bite.

Strings and Symbols are two completely different things in Ruby. In Ruby
1.9, Symbols have gained some more string-like behaviour(*), but they’re
still fundamentally different.

Symbols are objects intended for labelling things (e.g. method names,
hash keys). The main property of Symbols is that there only ever exists
one Symbol object which represents the same label, i.e. the same
sequence of characters.(**)

So when your program loads, and it uses the symbol :foo, which hasn’t
been used before, then a new symbol called :foo is created in the symbol
table. But every other future use of :foo always returns the same
object.

This makes symbols very cheap to test for equality, because:

  • Two symbols are the same iff they have the same object_id
  • Two symbols are different iff they have different object_id

So testing equality between :a_very_long_symbol_like_this and
:another_very_long_symbol is only comparing their object_ids, basically
two integers.
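
As a rough illustration of the point about identity versus content, the sketch below uses a made-up helper called compare. It reports both value equality (==) and object identity (equal?) for a pair of objects. The strings are built with dup so that each one is certainly a separate object.

```ruby
# Report value equality and object identity for two objects.
def compare(a, b)
  { equal_value: a == b, same_object: a.equal?(b) }
end

# Two uses of the same symbol are one and the same object.
p compare(:a_very_long_symbol_like_this, :a_very_long_symbol_like_this)
# => {:equal_value=>true, :same_object=>true} (hash formatting varies by version)

# Two strings with the same content are equal but are distinct objects.
s1 = 'foo'.dup
s2 = 'foo'.dup
p compare(s1, s2)
```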

The property that any future :foo must return the same object_id means
that the Symbol table is never garbage-collected. A Symbol is for life,
not just for Christmas.

Strings are collections of bytes/characters. They can be mutated. There
can be many String objects in the system which contain the same sequence
of bytes/characters. Therefore, comparing two Strings always has to be
done byte-by-byte.(***)

In general, what you want is a String. If you’re reading data from a
user (e.g. on STDIN or a web-page POST) then it comes in as a String.
You can convert a String into a Symbol represented by the same set of
characters:

a = "foo"
b = a.intern # b = :foo

but this can be a dangerous thing to do if the string you are converting
came from an untrusted source, because it can lead to a simple
denial-of-service attack as the user floods your symbol table with
garbage.
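
One common way to sidestep the problem is to convert only strings that appear on a known list. The sketch below is an illustration, not a prescription: ALLOWED_KEYS and safe_key are hypothetical names, and the list contents are arbitrary. Ruby versions released after this thread (2.2 and later) can garbage-collect dynamically created symbols, which softens the concern, but checking untrusted input against a list remains a reasonable habit.

```ruby
# Hypothetical list of labels the program is prepared to accept.
ALLOWED_KEYS = %w[foo bar].freeze

# Convert untrusted input to a Symbol only if it is on the list.
# Anything else returns nil, so no new symbol is created for it.
def safe_key(input)
  ALLOWED_KEYS.include?(input) ? input.to_sym : nil
end

p safe_key('foo')      # => :foo
p safe_key('garbage')  # => nil
```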

So to summarise, Symbols are used as method names:

a = 1
b = a.send(:+, 2) # b = a + 2

and are often used as hash keys, because the lookup operations are
cheaper.

def doit(params)
  puts params[:foo]
  puts params[:bar]
end

doit(:foo=>123, :bar=>456)

If coming from a language like C, think of symbols more as enums rather
than strings, where the programmer is using an easy-to-read label like
:foo, but the underlying value is actually a number.

HTH,

Brian.

(*) Example from ruby 1.8:

:foo.size
NoMethodError: undefined method `size' for :foo:Symbol
from (irb):2
from :0

But:

1.9.2-p0 > :foo.size
=> 3

(**) Everything you say about Strings or Symbols in 1.9 has to be
qualified, because it’s such a complex area. Suffice to say, in 1.9 it’s
possible to have two distinct Symbols which are labelled by the same
series of bytes but with different encodings.

Things are far simpler in ruby 1.8, where bytes are real bytes, and
small furry creatures from Alpha Centauri are real small furry creatures
from Alpha Centauri.

(***) There are in fact some optimisations whereby two distinct string
objects can share the same underlying data buffer, with copy-on-write.
But in general comparing strings needs to compare the buffers.

And even though ruby 1.9 has strings of characters, the comparisons
are done byte-by-byte, not character by character.

Ok. But initial question was slightly different.

Can I write something like

%c(string i do not want to be created again and again, and i do not want
to define it as a constant because it is used once in a code)

?

What is a reason if it is impossible in Ruby? It seems to be useful!

[Advice… I’m new at this… be patient]

Isn’t it “%q” ?

I couldn’t think of a need for this that wouldn’t be met by something
like:

holding_var = %q(string i do not want to be created again and again,
and i do not want to define it as a constant because it is used once
in a code)

5.times { puts "holding_var = #{holding_var} and its object_id is #{holding_var.object_id}" }

If the string is short, you could even use a variable name that
resembles the string.

created_once = "created_once"

Could you show an example? (“used once in a code” vs. “created again and
again”)

Abinoam Jr.

Hi Pavel,

I was trying to learn about “const_missing” and I dug a little deeper
into your problem.

Sending the code attached.

It compares unpack using as arguments:

  1. String literals (like the code you sent below)
  2. predefined constants referencing strings (you said that it would
    pollute your code)
  3. undefined constants that are then defined by const_missing (doesn’t
    handle special characters like "*").
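
The attached code is not reproduced in the thread, so the following is only a guess at what option 3 might look like. The PackFormats module and the F_ prefix are invented here; the prefix is needed because a constant name must start with an uppercase letter, while a format such as 'vNNv' does not. Like the original, this sketch does not handle "*".

```ruby
module PackFormats
  # Ruby calls this when an undefined constant is referenced in the module.
  def self.const_missing(name)
    text = name.to_s
    # Anything without the hypothetical F_ prefix is a genuine NameError.
    return super unless text.start_with?('F_')
    # Strip the prefix, freeze the string, and cache it as a real constant,
    # so const_missing runs only on the first reference.
    const_set(name, text.sub(/\AF_/, '').freeze)
  end
end

data = [1, 2, 3, 4].pack('NNvv')
p data.unpack(PackFormats::F_NNvv)                 # => [1, 2, 3, 4]
p PackFormats::F_vNNv.equal?(PackFormats::F_vNNv)  # => true
```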

Using constants seems to be a little faster for the reason you
gave before (no new string is instantiated on each iteration).
What kind of “problem” needs so many packs/unpacks that
this choice (constant or string) makes much difference?
On my PC it had to repeat those 2 unpacks almost 1 million times to
take 1 second.

To improve the code we should handle the "*" character. We could
use a char that is not reserved by pack, like "o", for example, so "o"
would be substituted by "*" in the generated string.

I’m new at ruby (but studying hard). Sorry for any mistake.
And, feedback (by anyone) about the code would be appreciated.

Abinoam Jr.

What kind of “problem” needs so many packs/unpacks that
this choice (constant or string) makes much difference?

Working with binary protocols. Something like
https://github.com/pavelrosputko/em-oscar/blob/master/em-oscar/icbm.rb

Much difference? Actually not so much in icbm.rb source above.

But someone wrote at http://redmine.ruby-lang.org/issues/show/4184#note-3

I’ve been able to get 2-3% improvements in Rails apps by simply
rewriting some 'constant's and inline Arrays as CONSTANTs.

I have patches to MRI that use cached, immutable Strings for the
internal #to_s messages on immutable objects; e.g. changing Symbol#to_s,
Float#to_s, Bignum#to_s, Rational#to_s, etc. to return the same frozen
String instance. I measured 1-6% performance improvement in the
standard MRI tests.

On Wed, Dec 22, 2010 at 10:56 AM, Pavel R. wrote:

I’ve been able to get 2-3% improvements in Rails apps by simply
rewriting some 'constant's and inline Arrays as CONSTANTs.

2-3% of what? If it’s 200ms, the gains are much less impressive when
compared to 2000ms.


Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

Example:

class A
  def m
    a,b,c,d = data.unpack('NNvv')
    e,f,g,h = a.unpack('vNNv')
    # and so on …
    # do something with data
  end
end

To avoid creating 'NNvv' again and again I can assign it to a constant

class A
  Format_NNvv = 'NNvv'
end

and use it.

But there are many different 'NNvv', 'vNNv', 'a*c', … in my code, so I
would need to assign all of them to constants. This approach implies a
large section of constant assignments.
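
For readers on Ruby versions released after this thread: as far as I know, from Ruby 2.1 on a string literal followed directly by .freeze is compiled to reuse one frozen object, and from 2.3 on a `# frozen_string_literal: true` comment at the top of a file has a similar effect for every literal in that file. Neither existed in the 1.9.2 discussed here. A minimal sketch of the first form follows; the class name Packet and its methods are hypothetical.

```ruby
class Packet
  # 'NNvv'.freeze is written inline, with no constant section needed.
  def parse(data)
    a, b, c, d = data.unpack('NNvv'.freeze)
    [a, b, c, d]
  end

  # Returns the object id of the inline format string, to show reuse.
  def format_id
    'NNvv'.freeze.object_id
  end
end

packet = Packet.new
p packet.parse([1, 2, 3, 4].pack('NNvv'))    # => [1, 2, 3, 4]
p packet.format_id == packet.format_id       # => true on Ruby 2.1 and later
```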