Symbol deserialization from external sources is now known to be
problematic. It is also quite useful.
Here's what I want, kinda:
class Symbol
  # Linear scan: compares the string against the name of every existing symbol.
  def self.defined?(string)
    all_symbols.any? { |sym| sym.to_s == string }
  end
end
Unfortunately, this is REALLY going to be slow. How about native?
on 2013-02-06 02:27
on 2013-02-06 03:11
On 6 February 2013 11:28, Student Jr <lists@ruby-forum.com> wrote:
>
> Unfortunately, this is REALLY going to be slow. How about native?
>

I might be missing something, but how is `string.to_sym` problematic?

--
Matthew Kerwin, B.Sc (CompSci) (Hons)
http://matthew.kerwin.net.au/
ABN: 59-013-727-651

"You'll never find a programming language that frees you from the burden of clarifying your ideas." - xkcd
on 2013-02-06 03:43
When a symbol is defined, the memory used to store the symbol is permanently lost. If one is parsing external input, this makes one's application vulnerable to DOS.

Secondarily, if while parsing external input, one refuses to make new symbols blindly, then the symbol list is something over which one has direct control, and it can be trusted in some situations to speed processing.
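To make the concern concrete, here is a minimal sketch showing that calling `to_sym` on a previously unseen string adds an entry to the symbol table, while merely checking for the name does not. The helper name `interned?` and the generated key are made up for illustration. Note that the "permanently lost" part describes the Ruby versions current when this thread was written; from Ruby 2.2 onward, symbols created this way can be garbage-collected once nothing refers to them.

```ruby
# Hypothetical helper: true if a symbol with this name is already in the table.
# Same linear scan as the original post; fine for a demo, slow in production.
def interned?(name)
  Symbol.all_symbols.any? { |sym| sym.to_s == name }
end

# A name that is very unlikely to exist as a symbol yet.
name = "untrusted_key_#{Process.pid}_#{rand(1_000_000)}"

before  = interned?(name)  # checking does not create the symbol
created = name.to_sym      # this is the step that interns it
after   = interned?(name)  # `created` keeps the symbol referenced

puts "before to_sym: #{before}, after to_sym: #{after}"
```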
on 2013-02-06 05:06
On 6 February 2013 12:43, Student Jr <lists@ruby-forum.com> wrote:
> When a symbol is defined, the memory used to store the symbol is
> permanently lost. If one is parsing external input, this makes one's
> application vulnerable to DOS.
>
> Secondarily, if while parsing external input, one refuses to make new
> symbols blindly, then the symbol list is something over which one has
> direct control, and it can be trusted in some situations to speed
> processing.

I see. If there's a logical distinction between externally- and internally-defined symbols, you could override the entrypoint (your deserialiser or whatever) to build a hash of String=>Symbol pairs. That way instead of using `Symbol.all_symbols.any?{|sym| sym.to_s == string}` you could use `my_hash.has_key? string`. Not sure how you'd ever populate said hash, though. Trusted entrypoints or something.

However if you want to reuse existing symbols you'd have to have a way to prepopulate and continuously update the hash. I can think of a bunch of klugey ways to get it to work, but I'm not proud of any of them.

I imagine it should be relatively easy* to define a new native singleton method `defined?` on Symbol... There's obviously a legitimate use-case; I think it would be worth making a feature request for this.

* I'm not a core contributor.

--
Matthew Kerwin
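The String=>Symbol hash suggested above might look roughly like the following sketch. It assumes the application can list its acceptable symbols up front; the names `allowed`, `symbol_for`, and the sample keys are hypothetical and not from the thread.

```ruby
# Hypothetical whitelist, defined by trusted application code.
allowed = %i[name email age]

# String => Symbol lookup table, built once from the whitelist.
symbol_for = allowed.each_with_object({}) { |sym, table| table[sym.to_s] = sym }

# Untrusted strings are only ever used as hash keys for the lookup.
# to_sym is never called on them, so no new symbol is created.
untrusted = ["email", "evil_key"]
resolved = untrusted.map { |string| symbol_for[string] }

p resolved  # prints [:email, nil]
```

The lookup is a constant-time hash access, but as noted above it only knows about symbols someone remembered to register.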
on 2013-02-06 05:27
I see that some of the core team appear to hang out here, so I thought I would bring it up here. Certainly, if I were to optimize things, I would assume that the list of symbols is append only. Then I would put strings of the symbols in a set and add in the new ones as found for each call. After the first call, this would likely be pretty cheap, but there must be similar functionality already in place at the 'C' level, so this is really a request to expose that as part of the class. (And avoid doubling the amount of memory used by symbols!)
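A rough sketch of the set-based idea described above, under the stated assumption that `Symbol.all_symbols` is append only. The class name `SymbolNameCache` and the method `known?` are invented for illustration. On Rubies that garbage-collect symbols the assumption can fail, so the sketch starts over if the list ever shrinks; it is not a substitute for native support.

```ruby
require "set"

# Hypothetical cache of symbol names, refreshed only when a lookup misses.
# ASSUMPTION: Symbol.all_symbols only grows, with new symbols at the end.
class SymbolNameCache
  def initialize
    @names = Set.new
    @seen = 0  # how many entries of all_symbols have been copied so far
  end

  # True if a symbol with this name exists; never interns `string`.
  def known?(string)
    return true if @names.include?(string)
    refresh
    @names.include?(string)
  end

  private

  def refresh
    all = Symbol.all_symbols
    if all.size < @seen  # list shrank: assumption violated, start over
      @names.clear
      @seen = 0
    end
    all.drop(@seen).each { |sym| @names << sym.to_s }
    @seen = all.size
  end
end

cache = SymbolNameCache.new
puts cache.known?("to_s")                               # :to_s is a method name
puts cache.known?("no_such_symbol_#{rand(1_000_000)}")  # never interned
```

Each miss still asks Ruby for the full symbol array, and the set holds a second copy of every name, which is exactly the memory doubling the post hopes a native method would avoid.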
on 2013-02-06 11:10
While on the topic, I have a related question about this Symbol DOS attack vector: Can't an upper limit be put on the size of the symbols table, and if it is exceeded, then an error is raised? Wouldn't that alone be sufficient to neuter such an attack?
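The upper-limit idea can be approximated in user code, as in the sketch below. This is only an illustration of the proposal, not an existing Ruby feature: `guarded_to_sym`, `SymbolTableFull`, and the limit values are all hypothetical.

```ruby
# Hypothetical error raised when the symbol table is considered full.
class SymbolTableFull < StandardError; end

# Converts `string` to a symbol only while the table is below `limit`.
# `limit` is an arbitrary ceiling chosen by the application.
def guarded_to_sym(string, limit)
  count = Symbol.all_symbols.size
  raise SymbolTableFull, "#{count} symbols, limit is #{limit}" if count >= limit
  string.to_sym
end

generous = Symbol.all_symbols.size + 10_000
p guarded_to_sym("status", generous)  # prints :status

begin
  guarded_to_sym("anything", 0)       # limit is already exceeded
rescue SymbolTableFull => e
  puts "refused: #{e.message}"
end
```

A helper like this only protects code paths that go through it; anything calling `to_sym` directly bypasses the check, which is why the question is really about a limit inside the interpreter.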
on 2013-02-06 22:20
On 6 February 2013 20:10, Intransition <transfire@gmail.com> wrote:
> While on the topic, I have a related question about this Symbol DOS attack
> vector: Can't an upper limit be put on the size of the symbols table, and
> if it is exceeded, then an error is raised? Wouldn't that alone be
> sufficient to neuter such an attack?

Or, rather than error, just flush a bunch. If they're needed, they'll come back. If not, no loss. I'm sure symbol creation isn't _that_ expensive.

--
Matthew Kerwin
on 2013-02-07 04:47
Given some of the other discussions I've seen, it seems likely that the problem with that would be symbol references.
on 2013-02-07 20:25
On Wed, Feb 6, 2013 at 3:43 AM, Student Jr <lists@ruby-forum.com> wrote:
> When a symbol is defined, the memory used to store the symbol is
> permanently lost. If one is parsing external input, this makes one's
> application vulnerable to DOS.
>
> Secondarily, if while parsing external input, one refuses to make new
> symbols blindly, then the symbol list is something over which one has
> direct control, and it can be trusted in some situations to speed
> processing.

I don't believe this to be such a big deal: if you parse external data and you do not know how many different strings of a kind there are, you would not use symbols anyway. Symbols make most sense for a fixed set of values, similar to an enum.

There can also be a DOS if external data is parsed and all the Strings are stored somewhere during the import (e.g. as Hash keys), which is quite a common scenario. If there are more Strings than fit into memory, the program will crash as well.

Kind regards

robert
on 2013-02-08 04:29
DOS does not occur with strings because strings can be garbage collected. Symbols are forever. Rails, for instance, allows more than n! method definitions for a model with n columns.
on 2013-02-08 07:10
Calling to_sym on user input was not, is not, and will not be a good idea, and what Rails is doing is maybe not a good example to copy.
on 2013-02-08 07:27
It's not just Rails. Rails happens to be a hotbed of bad programming style to be sure, but the utility of allowing users to specify symbols is substantial. Allowing them to create symbols is a memory leak and therefore a DOS vulnerability. Thus the idea.
on 2013-02-08 07:38
On Fri, Feb 8, 2013 at 4:29 AM, Student Jr <lists@ruby-forum.com> wrote:
> DOS does not occur with strings because strings can be garbage
> collected. Symbols are forever.

I am very well aware of that. Still the fact remains that you can create a DOS with *any* external data if the data set is large enough and the processing does not take that possibility into account. There is nothing really special about Symbols here, as I have pointed out earlier. (And, btw, you did not argue against that.) It is the way input from external sources is read. The choice to use Symbols for data with large variance is just one of many decisions that can do harm to an application.

Cheers

robert
on 2013-02-08 07:54
Well, if you want me to be explicit, I can.

Certainly if you accept arbitrary user input for parsing, you have an automatic DOS vector by dint of sending a very large packet. Fine.

But if someone can make a thousand connections, and over the course of the thousand connections PERMANENTLY chew up 100k of memory per connection, you start to have a problem of a very different sort.

It is in that sense--the sense of a memory leak--that symbols are different in this regard.

And before you come back with "don't do that", remember that the ability to create arbitrary objects is a prime feature of YAML. There needs to be a way to scope that feature, and this is one option.
on 2013-02-10 03:37
On Fri, Feb 8, 2013 at 12:54 AM, Student Jr <lists@ruby-forum.com> wrote:
> different in this regard.
>
> And before you come back with "don't do that", remember that the ability
> to create arbitrary objects is a prime feature of YAML. There needs to
> be a way to scope that feature, and this is one option.

I'm running into something now with an API that converts XML to a nested Hash with symbol keys via Savon. At some point, we're going to be getting near 5000 items in these XML responses. It's not direly problematic for this particular case, as this is something that gets called infrequently at that rate, the XML is a response to a request on our end (i.e. is not open to the wild wild internet), and is in a self-contained job so it never permanently eats up memory, but it does give me pause.
on 2013-02-10 20:50
On Fri, Feb 8, 2013 at 7:54 AM, Student Jr <lists@ruby-forum.com> wrote:
> Well, if you want me to be explicit, I can.

Good. :-)

> Certainly if you accept arbitrary user input for parsing, you have an
> automatic DOS vector by dint of sending a very large packet. Fine.
>
> But if someone can make a thousand connections, and over the course of
> the thousand connections PERMANENTLY chew up 100k of memory per
> connection, you start to have a problem of a very different sort.

That can be achieved with any bad coding.

> It is in that sense--the sense of a memory leak--that symbols are
> different in this regard.

Yes and no: yes, because Symbols accumulate in memory; no, because it is the programmer's choice which allows bad things to happen. You cannot simply force a YAML.load() on a program via a network connection.

> And before you come back with "don't do that", remember that the ability
> to create arbitrary objects is a prime feature of YAML. There needs to
> be a way to scope that feature, and this is one option.

OK, now we're cooking! I can see where YAML is an issue because I couldn't find a way to customize Symbol deserialization for YAML. If that way existed, fairly easy measures could be taken to prevent excessive Symbol creation. For the time being one would have to patch the library to prevent this DOS OR modify the input before throwing it at YAML.load(). OTOH I cannot remember having read of a DOS via YAML Symbol deserialization on this list.

Cheers

robert
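For readers on a newer Ruby: later versions of Psych added `YAML.safe_load`, which was not available to the posters in this thread and offers roughly the customization asked for here. The sketch below assumes a Psych version that accepts the `permitted_classes:` and `permitted_symbols:` keywords; the wrapper name `load_restricted` and the symbol list are hypothetical.

```ruby
require "yaml"

# Hypothetical list of symbols the application expects in its input.
permitted = [:admin, :guest]

# Loads YAML but accepts only the listed symbols.
# Returns nil if the document contains any other symbol or object type.
def load_restricted(yaml_text, permitted)
  YAML.safe_load(yaml_text,
                 permitted_classes: [Symbol],
                 permitted_symbols: permitted)
rescue Psych::DisallowedClass
  nil  # reject the whole document rather than accept an unknown symbol
end

accepted = load_restricted("role: :admin\n", permitted)
rejected = load_restricted("role: :surprise_symbol\n", permitted)

puts "accepted: #{accepted.inspect}"
puts "rejected: #{rejected.inspect}"
```

Rejecting the whole document is one possible policy; an application could instead report the error to the caller.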