Parsing JavaScript to prevent maliciousness?

Hello,

I’m working on a site that is implementing functionality similar to A
Certain Large Social Networking Site’s Apps feature.

Application developers will be able to write apps in a hybrid HTML /
“FooML” / JavaScript syntax.

This will get parsed by my servers (as the man in the middle) and then
shoved back to the user’s browser as HTML.

Now, my normal inclination is just to dive in and start coding away =)

But I figured one of the smart people here might have some good pointers
on where to start.

The tricky problems, as I see them:

  • Allowing access to some JavaScript functionality while stripping out
    malicious calls (document.cookie?)
  • Also: how to deal with Base64 / eval / other tomfoolery that attackers
    might attempt
  • Parsing custom tags like <foo:username />, <foo:friend_list count="4" />.

The last one seems similar enough to parsing HTML trees, so hopefully
there’s something in Ruby-land that can help with this.

Any suggestions / links / pointers would be greatly appreciated!!

  • Sean

ps. if anyone is interested in working with me on some kind of open
source library that could handle this kind of thing in a
website/domain-agnostic way, feel free to hit me up.

On Aug 15, 9:35 pm, Mongoose Sir wrote:

Does a Ruby JavaScript parser exist? A quick Google search brings up
http://idontsmoke.co.uk/2005/rbnarcissus/, dunno how well it actually
works though. Either way, “stripping out malicious calls” is the
opposite of the correct approach (attackers will outsmart a blacklist
100% of the time); rather, you create a whitelist of acceptable
JavaScript and nix everything that doesn’t match your criteria. Mayhaps
it would even be easier to create your own language that users can
use, and translate that into JS?
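
To make the whitelist idea concrete, here is a minimal sketch in plain Ruby. It is only an illustration: the allow list, the helper name whitelist_violations, and the token rules are assumptions made up for this example, and a real checker would work on a parse tree from a proper JavaScript parser rather than on a hand-rolled scanner. The point it shows is failing closed: anything not explicitly allowed, or not understood, is rejected.

```ruby
require "strscan"
require "set"

# Hypothetical allow list: the only identifiers an app script may mention.
ALLOWED_IDENTIFIERS = Set.new(
  %w[var function return if else true false null foo alert]
).freeze

# Returns the reasons a script is rejected; an empty list means it passed.
# Input the scanner does not recognise is rejected too (fail closed).
def whitelist_violations(source)
  scanner = StringScanner.new(source)
  violations = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif scanner.skip(/"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*'/)
      next # string literal: treated as data, never executed by this check
    elsif scanner.skip(/\d+(?:\.\d+)?/)
      next
    elsif (word = scanner.scan(/[A-Za-z_$][A-Za-z0-9_$]*/))
      unless ALLOWED_IDENTIFIERS.include?(word)
        violations << "identifier not allowed: #{word}"
      end
    elsif scanner.skip(/[(){};,.=+\-*\/<>!&|?:]/)
      next
    else
      # Square brackets land here on purpose, so window["ev" + "al"] is refused.
      violations << "unrecognised input: #{scanner.getch.inspect}"
    end
  end
  violations
end

puts whitelist_violations('foo.alert("You just won a prize!");').inspect
puts whitelist_violations('alert(document.cookie);').inspect
```

This sketch is far stricter than a usable app platform could be (it does not even allow new variable names), but it errs in the safe direction, which is the whole argument for a whitelist over a blacklist.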

  • Parsing custom tags like <foo:username />, <foo:friend_list count="4" />.

The last one seems similar enough to parsing HTML trees, so hopefully
there’s something in Ruby-land that can help with this.

This seems like the standard Hpricot/Nokogiri parsing affair; are
either of those not suiting your needs?
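
For a feel of what the tag expansion step involves, here is a rough sketch that uses only plain Ruby and a regular expression instead of Hpricot or Nokogiri, so it handles just self-closing tags with double-quoted attributes. The renderer table, the sample data, and the name expand_foo_tags are all hypothetical; a real implementation would walk the parsed document tree.

```ruby
# Hypothetical sample data and renderers: one lambda per supported tag name,
# each given the tag's attributes as a Hash of strings.
FRIENDS = %w[Alice Bob Carol Dave Erin].freeze

TAG_RENDERERS = {
  "username"    => ->(_attrs) { "Sean" },
  "friend_list" => ->(attrs) { FRIENDS.first(Integer(attrs.fetch("count", "3"))).join(", ") }
}.freeze

# Matches <foo:name attr="value" ... /> (self-closing form only).
SELF_CLOSING_FOO_TAG = /<foo:([a-z_]+)((?:\s+[a-z_]+="[^"]*")*)\s*\/>/

def expand_foo_tags(markup)
  markup.gsub(SELF_CLOSING_FOO_TAG) do
    match = Regexp.last_match
    name  = match[1]
    attrs = match[2].scan(/([a-z_]+)="([^"]*)"/).to_h
    renderer = TAG_RENDERERS[name]
    # Unknown tags are dropped rather than passed through to the browser.
    renderer ? renderer.call(attrs) : ""
  end
end

puts expand_foo_tags('<p>Hi <foo:username />, friends: <foo:friend_list count="4" /></p>')
```

Values pulled from real user data would also need HTML escaping before being substituted in; the sample names here happen to be safe.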

On Sun, Aug 16, 2009 at 03:00:12PM +0900, pharrington wrote:

Yes, there are a couple of JavaScript parsers out there:

RKelly (It’s pure ruby):
GitHub - tenderlove/recma: Pure ruby javascript parser and interpreter.

And Johnson (uses Spidermonkey’s parse tree):
GitHub - jbarnette/johnson: Johnson wraps JavaScript in a loving Ruby embrace.

Both support AST manipulation as well as turning the AST back into
JavaScript. Either of them should be easy enough to work with, but
properly sanitizing JavaScript sounds hard!

@pharrington - thanks for the pointer on Hpricot/Nokogiri. I’m familiar
with Hpricot but will have to take a look at Nokogiri.

Aaron - Thanks. I’ll take a look at those. Think I’m getting in over
my head here, but should be fun times.

Fabian -

The whole point of the website is to allow third-party developers to
display HTML inside of a little content area within the site. (Not
unlike certain large social networking site’s Apps feature)

One approach I’ve seen is namespacing all CSS IDs with some kind of
application ID.

So,

$('#foo-alert').html('You just won a prize!');
…would have to become
$('#app_1234567_foo-alert').html('You just won a prize!');
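
A rough server-side sketch of that rewrite in Ruby, assuming the selector is a plain quoted string passed straight to $(...). The helper name namespace_selectors is made up for this example, and a selector assembled at runtime (string concatenation, variables) would slip straight past a textual rewrite like this one.

```ruby
# Hypothetical helper: prefixes every "#id" inside a quoted jQuery selector
# with the app's namespace. Only literal selectors such as $('#foo') are seen.
def namespace_selectors(js_source, app_id)
  js_source.gsub(/\$\((['"])(.*?)\1\)/) do
    quote    = Regexp.last_match(1)
    selector = Regexp.last_match(2)
    namespaced = selector.gsub(/#([A-Za-z][\w\-]*)/) do
      "#app_#{app_id}_#{Regexp.last_match(1)}"
    end
    "$(#{quote}#{namespaced}#{quote})"
  end
end

puts namespace_selectors("$('#foo-alert').html('You just won a prize!');", 1234567)
```

The element IDs in the app's HTML would need the same prefix applied so the rewritten selectors still match.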

If they broke out of their content area and started manipulating the DOM
on other parts of the page, this wouldn’t even be the end of the world.
(they’d eventually get caught & banned)

I’m more concerned about malicious things they could do to the end-user,
e.g. cookie theft.

It sounds like a whitelist is the reasonable approach here.

Cheers,

  • Sean

Yep, sounds quite dangerous to me as well…

Another security problem might come from you allowing users to
manipulate the DOM (which I guess is one of the features you plan on
implementing, since without that, there isn’t really much you can do in
JS except some alerts maybe :-).

I’d definitely forbid that.

  1. they could inject arbitrary text into the website, including spam,
    links etc., and start phishing attacks, and
  2. since browsers execute every bit of JavaScript they find,
    they could just inject a string containing <script> tags, executing
    any JS they want to steal user sessions, forward private data etc.
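
To illustrate the second point, here is a small Ruby sketch of what happens when an app-supplied string is escaped before it reaches the page. It only applies to values meant to be plain text, not to markup the app is supposed to be allowed to emit, and render_app_text and the app-content class name are invented for the example. ERB::Util.html_escape is part of Ruby's standard erb library.

```ruby
require "erb"

# Hypothetical helper: wraps an untrusted string in the app's content area,
# escaping it so the browser shows it as text instead of parsing it as markup.
def render_app_text(untrusted)
  "<div class=\"app-content\">#{ERB::Util.html_escape(untrusted)}</div>"
end

puts render_app_text("<script>steal()</script>")
```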

IMHO it’s already hard enough protecting the webapp from attacks from
the outside, but you also introduce an attack vector from the inside.

Greetz!

2009/8/16 Aaron P. [email protected]

For what it’s worth, we’re using Johnson for something similar. The
intent isn’t so much to prevent maliciousness as to allow multiple
scripts from different 3rd party developers to run in the same
environment without worrying about clashing variable or function names.

We previously used RKelly but moved to Johnson because it facilitated us
actually testing the compiled scripts by executing them.
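
This is not the Johnson-based approach described above, just a much simpler stand-in for the same goal: a Ruby sketch that wraps each third-party script in its own JavaScript function scope so that names declared with var or function stay local. The helper name isolate_script is hypothetical, and assignments to undeclared variables would still leak into the global scope.

```ruby
# Hypothetical helper: gives each app's script its own function scope.
def isolate_script(js_source, app_id)
  <<~JS
    // app #{app_id}
    (function () {
    #{js_source}
    })();
  JS
end

puts isolate_script("var counter = 1;", 42)
```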

On Sun, Aug 16, 2009 at 3:06 PM, Mongoose Sir wrote:

If they broke out of their content area and started manipulating the DOM
on other parts of the page, this wouldn’t even be the end of the world.
(they’d eventually get caught & banned)

I’m more concerned about malicious things they could do to the end-user,
e.g. cookie theft.

But if you let them manipulate the DOM, how are you going to prevent
script injection? Because that’s all you need to steal cookies. And if
the attacker’s sly, he will conceal the injection, delaying getting
caught until he has collected enough valid sessions…

I don’t know what mysterious site you’re talking about, since I’m not
into social network stuff, but I’d sure like to know how they manage
that problem…

Greetz!