Symbols garbage collector in Ruby1.9, fixed?

ISSSSaki_Baz_C · March 30, 2009, 10:10am

Hi, in Ruby 1.8 there is an issue when creating more and more Symbols:
they remain in memory and are never removed.

I’m writing a server in Ruby that receives messages with headers (From,
To, Subject, X-Custom-Header-1…), and after parsing I store the
headers in a hash using symbols as keys:

headers = {
  :from => "[email protected]",
  :to => "[email protected]",
  :"x-custom-header-1" => "Hi there"
}

I could use strings as keys instead of symbols, but I’ve checked that
getting a Hash entry is ~25% faster using Symbols.

The problem is that I could receive custom headers, and a new Symbol
would be created for each one. An attacker could send lots of custom
headers to fill the server’s memory and cause a denial of service.

Is this perhaps solved in Ruby 1.9? Any suggestions? Thanks a lot.
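A minimal sketch of the pattern described above, for illustration only. It assumes headers arrive as "Name: value" lines, and parse_headers is a hypothetical helper, not part of any library. Each previously unseen header name becomes a new Symbol through String#to_sym, which, as discussed in this thread, Ruby 1.8 and 1.9 never garbage-collect.

```ruby
# Hypothetical helper: parses "Name: value" lines into a Symbol-keyed hash.
# Every distinct header name goes through String#to_sym, so a message with
# many unique (possibly attacker-chosen) names creates many new Symbols.
def parse_headers(raw)
  headers = {}
  raw.each_line do |line|
    name, value = line.chomp.split(":", 2)
    next if value.nil?                      # not a "Name: value" line
    headers[name.strip.downcase.to_sym] = value.strip
  end
  headers
end

message = "From: alice@example.com\nTo: bob@example.com\nX-Custom-Header-1: Hi there\n"
p parse_headers(message)
```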

ISSSSaki_Baz_C · March 30, 2009, 10:17am

2009/3/30 Iñaki Baz C. [email protected]:

Is this perhaps solved in Ruby 1.9? Any suggestions? Thanks a lot.

Is there any way to check whether a Symbol already exists before creating it?

ISSSSaki_Baz_C · March 30, 2009, 11:05am

On 30 March 2009 at 10:09, Iñaki Baz C. wrote:

The problem is that I could receive custom headers, and a new Symbol
would be created for each one. An attacker could send lots of custom
headers to fill the server’s memory and cause a denial of service.

Is this perhaps solved in Ruby 1.9? Any suggestions? Thanks a lot.

It depends on what exactly you are trying to do with your hash. If you
need to access a few well-known headers in your code, use symbols for
those and add another pseudo-header for the rest of the info:

USEFUL_HEADERS = [ :from, :to, :"x-mailer" ]

headers = {
  :from => "[email protected]",
  :to => "[email protected]",
  :"x-mailer" => "Pegasus Mail for Windows (4.50 PB1)",
  :"_custom" => {
    "x-custom-header-1" => "Hi there",
    "x-spam-scanned" => "Of course"
  }
}

(Now, you’ll lose time at the parse step. Again, depending on what
you’re trying to do, it may be efficient if each mail is parsed once
and each header is then accessed many times.)
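A sketch of this split, assuming header names are matched case-insensitively; KNOWN_HEADERS and parse_headers_split are hypothetical names. The detail that matters is that each incoming name is looked up as a String in a fixed table before any Symbol is involved, so names outside the table never reach to_sym and no Symbol is created from input.

```ruby
# Fixed table of well-known headers: String name => Symbol key.
# These Symbols are literals in the source, so they exist before any
# input arrives; nothing here interns a name read from the network.
KNOWN_HEADERS = {
  "from"     => :from,
  "to"       => :to,
  "x-mailer" => :"x-mailer"
}.freeze

def parse_headers_split(raw)
  headers = { :_custom => {} }
  raw.each_line do |line|
    name, value = line.chomp.split(":", 2)
    next if value.nil?                       # not a "Name: value" line
    name = name.strip.downcase
    if (key = KNOWN_HEADERS[name])
      headers[key] = value.strip             # well-known header: Symbol key
    else
      headers[:_custom][name] = value.strip  # anything else: String key
    end
  end
  headers
end

p parse_headers_split("From: alice@example.com\nX-Spam-Scanned: Of course\n")
```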

Fred

ISSSSaki_Baz_C · March 30, 2009, 11:18am

2009/3/30 F. Senault [email protected]:

  :"x-mailer" => "Pegasus Mail for Windows (4.50 PB1)",
  :"_custom" => {
    "x-custom-header-1" => "Hi there",
    "x-spam-scanned" => "Of course"
  }
}

(Now, you’ll lose time at the parse step. Again, depending on what
you’re trying to do, it may be efficient if each mail is parsed once
and each header is then accessed many times.)

Thanks, but I prefer to store all the headers in a transparent way, so
that accessing a core, well-known header is the same as accessing a
custom, never-seen header:
headers[:from]
headers[:"x-custom-headers"]

That is, in the transport/parsing layer I cannot know which headers
will be important in the “application” layer.

A way to check whether a Symbol already exists would be enough for me, but
it doesn’t work:
To know all the current Symbols I can inspect Symbol.all_symbols, but
if I want to check a Symbol:
Symbol.all_symbols.include?(:new_symbol)
this will always return true, since :new_symbol is automatically added
XDDD

Thanks.

ISSSSaki_Baz_C · March 30, 2009, 11:38am

On 30 March 2009 at 11:17, Iñaki Baz C. wrote:

A way to check whether a Symbol already exists would be enough for me, but
it doesn’t work:
To know all the current Symbols I can inspect Symbol.all_symbols, but
if I want to check a Symbol:
Symbol.all_symbols.include?(:new_symbol)

Symbol.all_symbols.find { |s| s.to_s == "string" }

But, now, you’re creating strings instead…
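Fred's one-liner can be wrapped in a helper for reuse. A sketch; symbol_exists? is a hypothetical name. Comparing names as Strings means the candidate is never interned, unlike Symbol.all_symbols.include?(:new_symbol), whose literal already exists once the code is parsed. The cost is a scan over every Symbol in the process, with one String allocated per Symbol on every call.

```ruby
# Returns true if a Symbol with this name has already been interned.
# The candidate stays a String throughout, so checking creates no Symbol;
# however, every existing Symbol is converted to a String on each call.
def symbol_exists?(name)
  Symbol.all_symbols.any? { |sym| sym.to_s == name }
end

p symbol_exists?("to_s")                    # a core method name
p symbol_exists?("x-header-nobody-sent-yet")
```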

Fred

ISSSSaki_Baz_C · March 30, 2009, 11:39am

2009/3/30 Bill K. [email protected]:

potential_new_symbol = "xyzzy"
Symbol.all_symbols.map {|s| s.to_s}.include? potential_new_symbol

Thanks but it is too slow:

Benchmark.realtime{ Symbol.all_symbols.map {|s| s.to_s}.include? "qwe" }
=> 0.00371980667114258

I cannot do this test for each header in each received message.

Thanks.

ISSSSaki_Baz_C · March 30, 2009, 11:32am

From: "Iñaki Baz C." [email protected]

A way to check whether a Symbol already exists would be enough for me, but
it doesn’t work:
To know all the current Symbols I can inspect Symbol.all_symbols, but
if I want to check a Symbol:
Symbol.all_symbols.include?(:new_symbol)
this will always return true, since :new_symbol is automatically added XDDD

potential_new_symbol = "xyzzy"
Symbol.all_symbols.map {|s| s.to_s}.include? potential_new_symbol

?

Regards,

Bill

ISSSSaki_Baz_C · March 30, 2009, 11:55am

From: "Iñaki Baz C." [email protected]

XDDD

potential_new_symbol = "xyzzy"
Symbol.all_symbols.map {|s| s.to_s}.include? potential_new_symbol

Thanks but it is too slow:

Benchmark.realtime{ Symbol.all_symbols.map {|s| s.to_s}.include? "qwe" }
=> 0.00371980667114258

I cannot do this test for each header in each received message.

I assumed you had a plan for that.

We could cache them as a hash, for rapid lookup:

@known_symbols = Hash[ *Symbol.all_symbols.map {|s|
[s.to_s,true]}.flatten ]

Later…

@known_symbols.include? "xyzzy"
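Bill's cache can be wrapped in a small class. A sketch under stated assumptions: SymbolCache, refresh and key_for are hypothetical names. The cache stores the existing Symbol objects keyed by their names, so looking up a header never calls to_sym and never interns anything; a name with no existing Symbol is simply returned as a String. The snapshot goes stale as new Symbols appear, so refresh has to be called to pick them up.

```ruby
# Snapshot of all Symbol names for constant-time "does it exist?" checks.
class SymbolCache
  def initialize
    refresh
  end

  # Rebuild the snapshot; Symbols interned later are invisible until then.
  def refresh
    @known = {}
    Symbol.all_symbols.each { |sym| @known[sym.to_s] = sym }
    self
  end

  def include?(name)
    @known.key?(name)
  end

  # The existing Symbol for this name, or the String itself if none exists.
  def key_for(name)
    @known.fetch(name, name)
  end
end

cache = SymbolCache.new
p cache.include?("to_s")             # method names are Symbols
p cache.key_for("x-unknown-header")  # String comes back unless such a Symbol exists
```

One trade-off of this design: a headers hash built with key_for mixes Symbol and String keys, so code reading a custom header has to know which form to use.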

Regards,

Bill

ISSSSaki_Baz_C · March 30, 2009, 12:02pm

2009/3/30 Bill K. [email protected]:

@known_symbols = Hash[ *Symbol.all_symbols.map {|s| [s.to_s,true]}.flatten ]

Later…

@known_symbols.include? "xyzzy"

That sounds interesting, I’ll try it.

Thanks

ISSSSaki_Baz_C · March 30, 2009, 1:57pm

On Mon, Mar 30, 2009 at 4:09 AM, Iñaki Baz C. [email protected]
wrote:

:"x-custom-header-1" => "Hi there"
}

I could use strings as keys instead of symbols, but I’ve checked that
getting a Hash entry is ~25% faster using Symbols.

The problem is that I could receive custom headers, and a new Symbol
would be created for each one. An attacker could send lots of custom
headers to fill the server’s memory and cause a denial of service.

Which is why Rails (actually ActiveSupport), which implements a
HashWithIndifferentAccess to allow using strings and symbols
equivalently for hash access, uses the string form in the actual hash,
forgoing the access performance in favor of safety.
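A minimal sketch of the idea Rick describes, assuming only [], []= and key? are needed. HeaderHash is a hypothetical class written for illustration; it is not ActiveSupport's HashWithIndifferentAccess. Keys are always stored as Strings: a Symbol passed by the application is converted with to_s, and a String read from the network is never converted to a Symbol.

```ruby
# String-backed hash that accepts either Symbols or Strings as keys.
class HeaderHash
  def initialize
    @store = {}
  end

  def [](key)
    @store[convert(key)]
  end

  def []=(key, value)
    @store[convert(key)] = value
  end

  def key?(key)
    @store.key?(convert(key))
  end

  private

  # Symbols become Strings; Strings are used as-is. Never calls to_sym.
  def convert(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

headers = HeaderHash.new
headers["x-custom-header-1"] = "Hi there"   # as parsed from the wire
headers[:from] = "alice@example.com"        # as written in application code
p headers[:"x-custom-header-1"]             # same entry, reached via a Symbol
```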

–
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

ISSSSaki_Baz_C · March 30, 2009, 2:41pm

2009/3/30 Brian C. [email protected]:

It’s not “solved” in 1.9, because this is intentional and necessary
[…]
problems if your symbols are generated dynamically in response to user
[…]
few applications have a specific acceptance criterion for CPU utilisation
or response time. If your application does have a specific performance
criterion that you must meet, then it might be better to consider a
different language, rather than mis-using what Ruby offers. Or, counting
everything including development costs, it may be more cost-effective to
choose faster hardware to meet the performance goal.

Ok, thanks for your explanation.

ISSSSaki_Baz_C · March 30, 2009, 2:48pm

On Mar 30, 2009, at 7:16 AM, Brian C. wrote:

It’s not “solved” in 1.9, because this is intentional and necessary
behaviour.

Dave T. seems to have thought this was going away:

http://pragdave.blogs.pragprog.com/pragdave/2008/05/ruby-symbols-in.html

James Edward G. II

ISSSSaki_Baz_C · March 30, 2009, 2:16pm

Iñaki Baz C. wrote:

I could use strings as keys instead of symbols, but I’ve checked that
getting a Hash entry is ~25% faster using Symbols.

The problem is that I could receive custom headers, and a new Symbol
would be created for each one. An attacker could send lots of custom
headers to fill the server’s memory and cause a denial of service.

Is this perhaps solved in Ruby 1.9? Any suggestions? Thanks a lot.

It’s not “solved” in 1.9, because this is intentional and necessary
behaviour.

The important property of a symbol is that it has the same id wherever
and whenever it is used in your program, and hence it can never be
garbage-collected. This is so that it can be used for looking up method
names: foo.bar is a shortcut for foo.send(:bar).
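The two properties Brian mentions can be observed directly. A short sketch: two occurrences of the same Symbol literal are the very same object, two equal Strings built separately are distinct objects, and send dispatches a method by its Symbol name.

```ruby
# Two occurrences of :bar are the very same object (identity, not just ==).
a = :bar
b = :bar
puts a.equal?(b)

# Two equal Strings built separately are distinct objects.
s1 = String.new("bar")
s2 = String.new("bar")
puts s1 == s2
puts s1.equal?(s2)

# Method calls are dispatched by Symbol name: these two calls agree.
puts "hello".upcase == "hello".send(:upcase)
```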

Using symbols for hash keys is a common idiom, but it is arguably an abuse of
the symbol table. It’s fine as long as all the keys are fixed symbol
constants in your program, but as you’ve observed, it causes huge
problems if your symbols are generated dynamically in response to user
data (especially from untrusted or potentially malicious sources).

The solution: use strings as keys, and beware premature optimisation.
Whilst you may have measured that “getting a Hash entry is 25% faster
using Symbols”, does this really make your whole application 25% faster?
I suspect not. Maybe it makes your whole application 0.25% faster. Maybe
it makes your application slower, as each incoming String has to be
converted into a Symbol.
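One way to act on Brian's question is to time the whole path an incoming header takes, not just the lookup. A rough sketch using the standard Benchmark module; time_lookups is a hypothetical name, and the header names and iteration count are arbitrary assumptions. The string_to_symbol case includes the String#to_sym conversion a parsed header name would need before a Symbol-keyed lookup. Timings depend on the Ruby version and machine, so any result is only indicative.

```ruby
require "benchmark"

# Times three ways of reading headers; the third includes the to_sym
# conversion that Symbol keys need when names arrive as Strings.
def time_lookups(iterations = 100_000)
  names       = %w[from to subject x-mailer]
  string_hash = names.map { |n| [n, n.upcase] }.to_h
  symbol_hash = names.map { |n| [n.to_sym, n.upcase] }.to_h
  symbols     = names.map(&:to_sym)
  incoming    = names.map { |n| String.new(n) }  # stands in for parsed input

  {
    string:           Benchmark.realtime { iterations.times { incoming.each { |n| string_hash[n] } } },
    symbol:           Benchmark.realtime { iterations.times { symbols.each { |s| symbol_hash[s] } } },
    string_to_symbol: Benchmark.realtime { iterations.times { incoming.each { |n| symbol_hash[n.to_sym] } } }
  }
end

time_lookups.each { |label, secs| printf("%-17s %.4fs\n", label, secs) }
```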

In any case, although we all want things to go “as fast as possible”,
few applications have a specific acceptance criterion for CPU utilisation
or response time. If your application does have a specific performance
criterion that you must meet, then it might be better to consider a
different language, rather than mis-using what Ruby offers. Or, counting
everything including development costs, it may be more cost-effective to
choose faster hardware to meet the performance goal.

Regards,

Brian.

ISSSSaki_Baz_C · March 30, 2009, 4:01pm

On Monday 30 March 2009 07:48:16 James G. wrote:

Dave T. seems to have thought this was going away:

http://pragdave.blogs.pragprog.com/pragdave/2008/05/ruby-symbols-in.html

That article looks like pure speculation.

Alright, yes, #to_i and #id2name and similar are gone. That makes sense
[…]
encapsulate things the average user really doesn’t need. Theoretically,
these could allow Symbols to be implemented in the heap, if needed. Or it
would allow them to be implemented in some way that looks nothing like
the current concept of an integer.

However, the purpose of symbols, I would think, remains the same.

And given the purpose of symbols, and the dynamic nature of Ruby (it has
eval!), there’s really no way you could ever garbage collect symbols.

You could implement symbols as immutable strings on the heap, and do
string comparisons between them, but that would defeat the purpose of
symbols, at least in every program I’ve ever written: to avoid string
comparisons, and to be generally much faster than strings.

And for that matter, if you really, really want to be digging around at
that low level, you still can:

irb(main):001:0> :foo.object_id
=> 351848
irb(main):002:0> ObjectSpace._id2ref 351848
=> :foo

ISSSSaki_Baz_C · March 30, 2009, 5:05pm

Hi,

In message “Re: Symbols garbage collector in Ruby1.9, fixed?”
on Mon, 30 Mar 2009 21:16:00 +0900, Brian C.
[email protected] writes:

|It’s not “solved” in 1.9, because this is intentional and necessary
|behaviour.

Garbage collection for Symbols is planned, but not implemented yet.
It’s not an easy task.

          matz.

ISSSSaki_Baz_C · April 1, 2009, 3:39pm

Is the garbage collection in 1.9 better than 1.8?

Blog: http://random8.zenunit.com/
Learn rails: http://sensei.zenunit.com/

On 31/03/2009, at 2:05 AM, Yukihiro M. [email protected]

ISSSSaki_Baz_C · March 30, 2009, 5:07pm

On Mon, Mar 30, 2009 at 8:48 AM, James G. [email protected]
wrote:

Unfortunately, symbols are still not garbage-collected at the moment.
It seems that the only difference comparing to 1.8 is that symbols are
backed by real frozen string objects instead of arrays of chars.

ISSSSaki_Baz_C · April 2, 2009, 1:59am

On Mon, Mar 30, 2009 at 2:09 AM, Iñaki Baz C. [email protected]
wrote:

:"x-custom-header-1" => "Hi there"
}

I could use strings as keys instead of symbols, but I’ve checked that
getting a Hash entry is ~25% faster using Symbols.

Use symbols… FOR SPEED! Unfortunately that speed comes at a price… do you
really want to globally internalize arbitrary input? Symbols are
effectively a freeform enumeration… the reason you’re running into
problems is because you’re trying to enumerate arbitrary inputs.

Is this really an important bottleneck in your application? If not, use
strings and move on.
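Following Tony's advice, a string-keyed parser needs no Symbols at all. A sketch assuming the same "Name: value" line format; parse_headers_str is a hypothetical name. Names are downcased so lookups are case-insensitive. Hash#[]= keeps a frozen copy of an unfrozen String key, so the stored keys cannot be mutated afterwards.

```ruby
# String-keyed header parser: no Symbol is ever created from input.
def parse_headers_str(raw)
  headers = {}
  raw.each_line do |line|
    name, value = line.chomp.split(":", 2)
    next if value.nil?                            # not a "Name: value" line
    headers[name.strip.downcase] = value.strip    # stored as a frozen key
  end
  headers
end

headers = parse_headers_str("From: alice@qweeq\nTo: bob@qweqwe\n")
p headers["from"]
```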

ISSSSaki_Baz_C · April 2, 2009, 1:12am

Hi,

In message “Re: Symbols garbage collector in Ruby1.9, fixed?”
on Wed, 1 Apr 2009 22:37:46 +0900, Julian L.
[email protected] writes:

|Is the garbage collection in 1.9 better than 1.8?

Yes, but slightly. For example, it returns unused memory regions to
the OS more often than 1.8.

          matz.

ISSSSaki_Baz_C · April 2, 2009, 2:03am

On Thursday 02 April 2009, Tony A. wrote:

Use symbols… FOR SPEED! Unfortunately that speed comes at a price… do you
really want to globally internalize arbitrary input? Symbols are
effectively a freeform enumeration… the reason you’re running into
problems is because you’re trying to enumerate arbitrary inputs.

Yes. It’s a parser, so custom headers could arrive. I want to store them
in a hash like:

headers = { :from => "alice@qweeq", :to => "bob@qweqwe" }

So after parsing the message I create these entries. The problem is that
any custom header would create a Symbol.

Is this really an important bottleneck in your application?

I think it’s important, since after parsing, the main task of the server
will be accessing some headers to read their content. But since the
project is at a very early stage, I can’t be sure of it yet.

Thanks.