Forum: Ruby Strip out ALL javascript from HTML source.

Daniel ----- (liquid)
on 2007-04-02 06:31
(Received via mailing list)
Hi.

I've got a bit of an issue: I have an input of HTML source that anyone
can supply, and I need to strip out all JavaScript from it.  Attributes,
links, tags, etc.

At this stage I'm thinking Hpricot is the way to go.  I guess I'm hoping
there is someone out there who has done this and is willing to share.

Cheers
Daniel
CRAZ8 (Guest)
on 2007-04-02 07:46
(Received via mailing list)
One of Rick Olson's many plugins can do what you want:

http://agilewebdevelopment.com/plugins/whitelist

Which tags are handled is controlled by your code.
Daniel ----- (liquid)
on 2007-04-02 08:02
(Received via mailing list)
On 4/2/07, CRAZ8 <tomfakes@gmail.com> wrote:
>
>
> One of Rick Olson's many plugins can do what you want:
>
> http://agilewebdevelopment.com/plugins/whitelist
>
> Which tags are handled is controllable by your code


Thanx for the pointer, but I think I need a bit more than that.  I need
to be able to leave tags alone for the most part, except <script> tags,
but attributes need a little more control.  What I've come up with so
far:

   - all on*** attributes have to go
   - any attribute that has "javascript:" in it has to go
   - any attribute with "*.js*" has to go
   - also, according to the exploit on MySpace by sam
   <http://namb.la/popular/tech.html>, it seems that I need to remove
   "javascript:" in attributes even when there are newlines anywhere in
   the word


I hope I've got them all.  It doesn't seem that the whitelist plugin
will do this, although I will be very happy if it does.
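
The four rules above could be sketched as a single check on one
attribute.  This is only an illustration: suspicious_attribute? is a
hypothetical helper, not part of Hpricot or the whitelist plugin, and it
assumes the attribute name and value have already been parsed out and
entity-decoded (an encoded value such as "&#106;avascript:" would slip
past it as written).

```ruby
# Hypothetical helper: returns true when an attribute should be dropped
# under the four rules listed in this post.
def suspicious_attribute?(name, value)
  # Rule 1: event handlers such as onclick, onload, onmouseover.
  return true if name.downcase.start_with?("on")

  # Rule 4: remove whitespace and control characters (newlines, tabs)
  # before matching, so "java\nscript:" is still caught.
  squashed = value.gsub(/[\s\x00-\x1f]+/, "").downcase

  # Rule 2: a javascript: URL anywhere in the value.
  return true if squashed.include?("javascript:")

  # Rule 3: anything that mentions ".js". This is deliberately crude and
  # also rejects harmless values such as "data.json".
  return true if squashed.include?(".js")

  false
end
```

For example, suspicious_attribute?("href", "java\nscript:alert(1)") is
true, while an ordinary link to http://example.com/page.html passes.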

Cheers
Daniel
Daniel ----- (liquid)
on 2007-04-02 08:46
(Received via mailing list)
Sorry if this got through and is a double post.  It got sent back to me.
Rick Olson (Guest)
on 2007-04-02 09:03
(Received via mailing list)
> all on*** attributes have to go
> any attribute that has "javascript:" in it has to go
> any attribute with "*.js*" has to go
> Also according to the exploit on myspace by sam It seems that I need to
> remove javascript: in attributes with newlines anywhere in the word.
>
>  I hope I've got them all.  It doesn't seem that the whitelist plugin will
> do this, although I will be very happy if it does.

You can of course contribute back to the plugin.  However, I believe
it'll do everything you listed short of removing any attribute with
*.js*.  Not sure what the point of that is though.

Another option is to just yank the code and make it bend to your
specific whims.  It's not a very large one.

--
Rick Olson
http://lighthouseapp.com
http://weblog.techno-weenie.net
http://mephistoblog.com
Daniel ----- (liquid)
on 2007-04-02 13:46
(Received via mailing list)
On 4/2/07, Rick Olson <technoweenie@gmail.com> wrote:
> > do this, although I will be very happy if it does.
>
> You can of course contribute back to the plugin.  However, I believe
> it'll do everything you listed short of removing any attribute with
> *.js*.  Not sure what the point of that is though.


I'll certainly look at contributing if I can find a way to extend the
functionality.  Perhaps a strip_all_javascript method or something like
that.

I think the point is to make it as difficult as possible to upload
JavaScript.  I want to accept arbitrary HTML source and display it on my
page but, at the same time, minimise the risk of having my page hijacked.
The list above covers the ways I have thought of to include JavaScript
in a submission.  I don't think it's possible to completely remove the
risk of submitted JavaScript, since a URL like http://example.com/stuff
could set its headers to JavaScript and return whatever script it wants,
but I want to minimise that risk.

I hope I've considered most of the ways that people could hijack my
page.  I want to keep as many tags intact as possible.

Cheers
Daniel
Rob Sanheim (rsanheim)
on 2007-04-02 19:23
(Received via mailing list)
On 4/2/07, Daniel N <has.sox@gmail.com> wrote:
>  Thanx for the pointer.  But I think I need a bit more than that.  I need to
>  I hope I've got them all.  It doesn't seem that the whitelist plugin will
> do this, although I will be very happy if it does.
>
>  Cheers
>  Daniel
>

I dunno how secure you want this to be, but to be truly safe from XSS
you'll need to handle more cases than Rick's plugin does - here is one
stab at it:

http://golem.ph.utexas.edu/~distler/blog/archives/...

If you want to get even more depressed about securing a web app today,
go here to get an idea of the insane amount of XSS vectors.

http://ha.ckers.org/xss.html

- Rob

http://robsanheim.com
http://seekingalpha.com
Phlip (Guest)
on 2007-04-02 19:27
(Received via mailing list)
> >  Thanx for the pointer.  But I think I need a bit more than that.  I need to
> > be able to leave tags alone for the most part, except <script> tags, but
> > attributes need a little more control.  what I've come up with so far:

Pipe your HTML thru tidy -asxhtml. Then use REXML and XPath to strip
out anything you don't need (such as the header block that -asxhtml
will install). And strip out the <script> tags, and anything that
looks like a <script> tag, such as the <object> tags.

The absolute safest, of course, is to strip anything not appearing on
a whitelist, such as <i>, <b>, <em>, etc.

--
  Phlip
  http://c2.com/cgi/wiki?ZeekLand  <-- NOT a blog!!
Rick Olson (Guest)
on 2007-04-02 21:47
(Received via mailing list)
> I dunno how secure you want this to be, but to be truly safe from XSS
> you'll need to handle more cases then Rick's plugin does - here is one
> stab at it:
>
> http://golem.ph.utexas.edu/~distler/blog/archives/...

It uses most of the same tests I wrote, adds a lot more allowed
svg/mathml tags, and style attribute sanitizing.  I just prefer to
leave it out, but textile uses it.  Those tests were written from that
hackers article.  You could just port the style stuff to white_list,
and then you don't have to bother maintaining a plugin.

--
Rick Olson
http://lighthouseapp.com
http://weblog.techno-weenie.net
http://mephistoblog.com
Daniel ----- (liquid)
on 2007-04-03 01:00
(Received via mailing list)
On 4/3/07, Rob Sanheim <rsanheim@gmail.com> wrote:
> > >
> > any attribute with "*.js*" has to go
>
>
> - Rob
>
> http://robsanheim.com
> http://seekingalpha.com


That is a bit depressing.  It seems that no matter how hard I try I
won't be able to completely remove the JS in submitted source.
Daniel ----- (liquid)
on 2007-04-03 01:01
(Received via mailing list)
On 4/3/07, Phlip <phlip2005@gmail.com> wrote:
> will install). And strip out the <script> tags, and anything that
> looks like a <script> tag, such as the <object> tags.
>
> The absolute safest, of course, is to strip anything not appearing on
> a whitelist, such as <i>, <b>, <em>, etc.
>
> --
>   Phlip
>   http://c2.com/cgi/wiki?ZeekLand  <-- NOT a blog!!


Is the object tag really that bad?  I think I need to support it, since
YouTube widgets (and I guess others) are based on object tags, and I
need to support YouTube at least.
Rick Olson (Guest)
on 2007-04-03 04:37
(Received via mailing list)
>  Is the object tag really that bad?  I mean I think I need to support it
> since you tube widgets and I guess others are based on object tags and I
> need to support youtube at least.

One idea is to allow a custom format.  Perhaps just look for youtube
urls, and convert them to videos?  Obviously this should be done after
sanitizing...


--
Rick Olson
http://lighthouseapp.com
http://weblog.techno-weenie.net
http://mephistoblog.com
Daniel ----- (liquid)
on 2007-04-03 05:23
(Received via mailing list)
On 4/3/07, Rick Olson <technoweenie@gmail.com> wrote:
>
>
> >  Is the object tag really that bad?  I mean I think I need to support it
> > since you tube widgets and I guess others are based on object tags and I
> > need to support youtube at least.
>
> One idea is to allow a custom format.  Perhaps just look for youtube
> urls, and convert them to videos?  Obviously this should be done after
> sanitizing...


I'm not really sure what you mean by custom format.  Does that mean
something like DOM selection in the whitelist plugin?  E.g. allow tag x
if it's a child of tag y and has attribute z='value' or z!='javascript'.

I really want to be as broad-ranging as possible and include as many
tags as possible, in their original form.  It's important for this app
that the tags be left, as much as possible, as they were entered.  I
just don't want the result to hijack my page.
Rick Olson (Guest)
on 2007-04-03 05:47
(Received via mailing list)
>  I really want to be as broad ranging as possible and include as many tags
> as possible and also in their original form.  It's important for this app
> that the tags, as much as possible be left as they're inputted, I just don't
> want the result to hijack my page.

Well, I originally meant something very custom like
<video:http://youtubeurl....>.  Though since most normal folks can't
grok this, and web power users have enough formats to figure out,
perhaps you could just seek out youtube urls sitting on a single line
or something.

For instance, Tumblr lets me add the raw embed code or just a YouTube
video URL if I want to post a video.
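
A rough sketch of the "URL sitting on a single line" idea, to be run
only on text that has already been sanitized.  Everything here is an
assumption for illustration: the helper name, the watch?v= URL pattern,
and the generated embed markup are not taken from any plugin.  Only the
captured video id, restricted to a safe character set, is ever copied
into the output.

```ruby
# Assumed pattern: a line holding nothing but a YouTube watch URL.
YOUTUBE_LINE = %r{^[ \t]*https?://(?:www\.)?youtube\.com/watch\?v=([A-Za-z0-9_-]+)[ \t]*$}

# Hypothetical helper: replaces each such line with markup we generate
# ourselves, so no user-supplied tag or attribute reaches the page.
def expand_youtube_lines(sanitized_html)
  sanitized_html.gsub(YOUTUBE_LINE) do
    video_id = Regexp.last_match(1)
    %(<embed src="http://www.youtube.com/v/#{video_id}" ) +
      %(type="application/x-shockwave-flash"></embed>)
  end
end
```

A URL that appears in the middle of a sentence is left untouched, since
the pattern only matches a whole line.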


--
Rick Olson
http://lighthouseapp.com
http://weblog.techno-weenie.net
http://mephistoblog.com
Daniel ----- (liquid)
on 2007-04-03 05:57
(Received via mailing list)
On 4/3/07, Rick Olson <technoweenie@gmail.com> wrote:
> Well, I originally meant something very custom like
> <video:http://youtubeurl....>.  Though since most normal folks can't
> grok this, and web power users have enough formats to figure out,
> perhaps you could just seek out youtube urls sitting on a single line
> or something.
>
> For instance, Tumbler lets me add the raw embed code or just a youtube
> video URL if I want to post a video.


I could not change the input to that level (<video:...>), but I've had a
look at the YouTube and Odeo widgets, and they both boil down to an
embed tag with a type of Shockwave Flash.

Do you think it would be a bad idea to enable support for embed tags
with that type and a src from youtube.com or odeo.com (or a list of
known domains)?  If I did this I could remove the object tag from around
the embed tag, and I don't think it would have much of an impact.
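
The check being asked about might look roughly like this.  It is a
sketch under assumptions: trusted_embed? and both constants are
hypothetical names, and the host list only contains the two domains
mentioned above.  It says nothing about whether Flash served from those
hosts is itself safe.

```ruby
require "uri"

# Hypothetical allowlist; the two domains come from the post above.
TRUSTED_EMBED_HOSTS = %w[youtube.com odeo.com].freeze
FLASH_TYPE = "application/x-shockwave-flash"

# Returns true only for a Flash embed whose src is an http(s) URL on an
# allowed host or one of its subdomains.
def trusted_embed?(src, type)
  return false unless type.to_s.strip.downcase == FLASH_TYPE

  uri = URI.parse(src.to_s.strip)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass here;
  # javascript: and other schemes parse to a different class.
  return false unless uri.is_a?(URI::HTTP) && uri.host

  host = uri.host.downcase
  TRUSTED_EMBED_HOSTS.any? { |d| host == d || host.end_with?(".#{d}") }
rescue URI::InvalidURIError
  false
end
```

Comparing the parsed host, rather than searching the src string for
"youtube.com", is what keeps a URL such as
http://youtube.com.evil.example/ from passing.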
Rick Olson (Guest)
on 2007-04-03 06:05
(Received via mailing list)
On 4/2/07, Daniel N <has.sox@gmail.com> wrote:
> > > want the result to hijack my page.
>  I could not change the input to that level.  <video:...> but I've had a
> look at the youtube and also odeo widgets and they both boil down to an
> embed tag with a type of shockwave flash.

You're really not getting the point of what I'm trying to say.  I'm
saying, strip all object tags, and use something custom that gets
replaced w/ an object tag that you generate afterwards.  If you're
generating insecure JS, you have issues :)

>  Do you think it would be a bad idea to enable support for embed tags with
> that type with src from youtube.com or odeo.com / (a list of known)
> domains?  If I did this I could remove the object tag from around the embed
> tag and I don't think it would have much of an impact.

I don't really know, I haven't thought about this stuff much.  I just
strip all object/embed tags by default.  You may have to do some
digging for any attack vectors on object/embed tags.  I don't think
it'd be that different from image tags though.


--
Rick Olson
http://lighthouseapp.com
http://weblog.techno-weenie.net
http://mephistoblog.com
Daniel ----- (liquid)
on 2007-04-03 06:14
(Received via mailing list)
On 4/3/07, Rick Olson <technoweenie@gmail.com> wrote:
> > > > as possible and also in their original form.  It's important for
> > > perhaps you could just seek out youtube urls sitting on a single line
> saying, strip all object tags, and use something custom that gets
> replaced w/ an object tag that you generate afterwards.  If you're
> generating insecure JS, you have issues :)


Ok that makes more sense to me.


>  Do you think it would be a bad idea to enable support for embed tags with
> > that type with src from youtube.com or odeo.com / (a list of known)
> > domains?  If I did this I could remove the object tag from around the
> embed
> > tag and I don't think it would have much of an impact.
>
> I don't really know, I haven't thought about this stuff much.  I just
> strip all object/embed tags by default.  You may have to do some
> digging for any attack vectors on object/embed tags.  I don't think
> it'd be that different from image tags though.


K thanx for your help.   Looks like I've got some digging to do :)

Cheers
Daniel
Phlip (Guest)
on 2007-04-03 16:42
(Received via mailing list)
Daniel N wrote:

> Is the object tag really that bad?  I mean I think I need to support it
> since you tube widgets and I guess others are based on object tags and I
> need to support youtube at least.

I didn't read the original post. If the question is "how do I do safe
markup and transclusions, in a public blog?", then naturally get
either a Wiki markup (or YAML), or permit a subset of HTML. To
transclude Object tags, invent a new tag called <video>. That way you
prevent shenanigans, right?

--
  Phlip
Phlip (Guest)
on 2007-04-03 19:59
(Received via mailing list)
Daniel N wrote:

> Thats is a bit depressing.  It seems that no matter how hard I try I won't
> be able to completely remove the js in submitted source.

(Use the XPath system I suggested, then) remove all tags except those
on a short white-list, and then remove all their attributes.
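
As a rough illustration of "white-list the tags, then drop their
attributes", here is a minimal sketch.  Note the swap: it uses a single
regex pass instead of the tidy + REXML/XPath route suggested above, only
to keep the example short and dependency-free.  strip_to_whitelist and
ALLOWED_TAGS are hypothetical names, and the three tags are just the
ones mentioned earlier in the thread.

```ruby
# Assumed white-list; extend as needed.
ALLOWED_TAGS = %w[i b em].freeze

# Keeps white-listed tags (with every attribute removed), deletes all
# other tags, and escapes any stray angle bracket so that leftovers such
# as "<<script>script>" cannot reassemble into a tag.
def strip_to_whitelist(html)
  html.gsub(%r{<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)[^<>]*>|[<>]}) do |match|
    tag = Regexp.last_match(2)
    if tag.nil?
      match == "<" ? "&lt;" : "&gt;"            # stray bracket
    elsif ALLOWED_TAGS.include?(tag.downcase)
      "<#{Regexp.last_match(1)}#{tag.downcase}>" # bare tag, no attributes
    else
      ""                                         # tag not on the list
    end
  end
end
```

Text inside a removed tag is kept as plain text, so the body of a
stripped <script> element shows up as visible characters rather than
running.  A real parser handles malformed markup far better than this.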

--
  Phlip
Alex Soto (asoto)
on 2007-04-03 21:08
(Received via mailing list)
We <http://www.jobscore.com> use SafeHtml
<http://pixel-apes.com/safehtml/>.  It's really good about leaving the
tags alone but removing potentially dangerous XSS-type stuff.

It's PHP, but I wrapped it in a class that shells out to the php
interpreter.

Alex
Daniel ----- (liquid)
on 2007-04-04 01:18
(Received via mailing list)
On 4/4/07, Phlip <phlip2005@gmail.com> wrote:
> either a Wiki markup (or YAML), or permit a subset of HTML. To
> transclude Object tags, invent a new tag called <video>. That way you
> prevent shenanigans, right?
>
> --
>   Phlip


If it were a public blog then I would say you are right, and I wouldn't
be so worried, but this app is not for a blog.

I've been considering the suggestion of creating custom <video> tags to
replace the embed and object tags from YouTube and Odeo.  But as much as
possible I need to leave tags alone, and the tags could be anything.
Still, I think you're right: I need to have a subset of "safe" tags that
can get through.

Cheers
Daniel ----- (liquid)
on 2007-04-04 01:18
(Received via mailing list)
On 4/4/07, apsoto@gmail.com <apsoto@gmail.com> wrote:
>
>
> We <http://www.jobscore.com> use SafeHtml <http://pixel-apes.com/
> safehtml/>  it's really good about leaving the tags alone but removing
> potentially dangerous XSS type stuff.
>
> It's PHP, but I wrapped it in a class that shells out to the php
> interpreter.


Thanx for the link.  At my first quick glance it looks like the
WhiteList plugin will do these things as well with a few tweaks.

I will have a better look at it when I get a chance.