XSS

Hi
Google Code Archive - Long-term storage for Google Code Project Hosting.

I have found this Rails plugin, which automatically removes XSS from
models upon saving. This is great. My question is which is the better
choice: 1) use a plugin like this, or 2) allow the content to be entered
into the DB as it is and later escape it in the view using the h method
or sanitize. I am asking because the latest Railscast, 204, says Rails 3
automatically escapes HTML output. But why not use this type of plugin
so that such malicious user input never enters the database at all?
Please share your thoughts.

Thanks
Tom

Hi
No comments yet!

Tom

I agree; it has never made sense to me to have to sanitize the output.

Escaping everything as you display it does have the benefit of allowing
you to see what information is in the DB. Also, you can change which
tags are allowed after the fact by using sanitize() instead of h().

The downside is that you have to escape it every time you display the
page. Granted, this isn’t a heavy operation, but it does happen
repeatedly. It seems to me that if you are always going to have to use
h() anyway, things should just be sanitized before insertion into the DB
and forgo the h().

Just my opinion. I still use h() and sanitize()
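
For reference, escaping at display time can be sketched outside Rails with the standard library's ERB. This is only an illustration: it assumes `ERB::Util.h` is a fair stand-in for the Rails `h` helper, and `body` is a made-up value standing in for a column read from the DB.

```ruby
require "erb"

# Stored exactly as the user typed it (hypothetical example value).
body = "<script>alert(1)</script>"

# Escape at render time, every time the value is displayed.
template = ERB.new("<p><%= ERB::Util.h(body) %></p>")
html = template.result(binding)

puts html  # => <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The raw value stays untouched in `body`; only the rendered HTML carries the entities.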


Jeremy C.
http://twitter.com/jeremychase


Going through the plugin docs again, I found an example like this:

class Message < ActiveRecord::Base
  xss_terminate :except => [ :body ]
end

This means we can exempt some fields from sanitization. So isn't that
sufficient? Any other thoughts?

Tom

On 16 March 2010 11:41, Jeremy C. wrote:

> things should just be sanitized before insertion into the DB and
> forgo the h().
> Just my opinion. I still use h() and sanitize()

Two problems with that:

The first and smallest is an annoyance.
If I want to save my blog in a DB, and I write a post that has the
content:
“Never use ‘<’ in your HTML; use ‘&lt;’ instead”
…this will get written to the DB as:
“Never use ‘&lt;’ in your HTML; use ‘&lt;’ instead”
…which then gets encoded with h() in a view as:
“Never use ‘&amp;lt;’ in your HTML; use ‘&amp;lt;’ instead”
…or, if just output straight to the view “because it was sanitized
before putting it in the DB”, is displayed as:
“Never use ‘<’ in your HTML; use ‘<’ instead”

You’ll have seen this happen on loads of bulletin boards and
feedback comments all over the web.

Adjusting the user’s input before storing it in the db is “bad”,
because you can never reverse it without all sorts of unreliable
hoops. Just store what they typed, and whenever you deal with it
assume it’s highly-toxic.
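
To make the "can never reverse it" point concrete, here is a rough sketch in plain Ruby. The `sanitize_on_save` method is hypothetical: it is not the plugin's code, just an assumed save-time filter that escapes bare `&`, `<` and `>` and leaves existing entities alone.

```ruby
require "cgi"

# Hypothetical stand-in for a save-time sanitizer (NOT xss_terminate's code):
# escapes bare &, < and > but leaves existing entities untouched.
def sanitize_on_save(text)
  text.gsub(/&(?![a-z]+;|#\d+;)/i, "&amp;")
      .gsub("<", "&lt;")
      .gsub(">", "&gt;")
end

typed  = "Never use < in your HTML; use &lt; instead"
stored = sanitize_on_save(typed)

puts stored                             # => Never use &lt; in your HTML; use &lt; instead
puts CGI.unescapeHTML(stored)           # => Never use < in your HTML; use < instead
puts CGI.unescapeHTML(stored) == typed  # => false
```

Once both spellings collapse to the same stored text, no decoder can tell which one the user originally typed.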

The second problem is an arrogant presumption that the only place that
will ever use this user-supplied data is in the rendering of an HTML
page.
But what happens when you’re storing details, say of an order placed,
and the user enters their special delivery comments:
“Please knock & wait for >5mins”
You store this as:
“Please knock &amp; wait for &gt;5mins”
…because you know you’re going to have to display it in a
confirmation page on the web site and you don’t want to worry about
encoding it there every time, but you forget that you might want it
put into a PDF that’s generated for the delivery driver, or use it in
a JS function on the web page, or include it in a field of a CSV
export. In each of these instances, you’re going to have to decode it
back from the “safe” HTML-encoded version to the user input (I refer
you to my first point: that you cannot reliably do this :) ) before
encoding it however you need for your new use.

Life is much easier if you just store what they typed and deal with it
when you use it…
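
As a rough sketch of "deal with it when you use it", using only Ruby's standard library: the order number 42 and the variable names are made up for illustration, and in a real Rails view the helpers would differ.

```ruby
require "cgi"
require "csv"
require "json"

# Stored exactly as the user typed it.
comment = "Please knock & wait for >5mins"

# Encode separately for each destination, at the point of use.
html = "<p>#{CGI.escapeHTML(comment)}</p>"   # HTML page
csv  = CSV.generate_line([42, comment])      # CSV export row
js   = "var note = #{comment.to_json};"      # JS string literal

puts html  # => <p>Please knock &amp; wait for &gt;5mins</p>
puts csv
puts js
```

Each output gets the encoding it needs and the stored value never has to be decoded first. Embedding JSON inside an inline script tag needs extra care beyond this sketch (for example, a closing script tag inside the text).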

Michael,

Excellent points.


Jeremy C.
http://twitter.com/jeremychase

On 17 March 2010 03:58, Tom M. wrote:

> Means we can exempt some fields from sanitization. So isn't
> that sufficient? Any other thoughts?

So instead of messing with all of the user-supplied input, you only
mess with some of it? And that won’t end up in confusion for the
developers trying to re-render the DB content to PDF, etc., when some
of the data renders fine and some has to be “decoded” back to plain
text (but doesn’t go back to exactly what the user typed)?

I didn’t think I was ambiguous: fiddling with users’ data before you
store it is going to end up in confusion and pain somewhere [1]. It’s
perfectly easy to assume that all DB content is tainted, and treat it
appropriately for whatever purpose you want to put it to.

My 2p… YMMV :)

[1] Of course, you need to “fiddle” with it to prevent SQL injection,
but the end result should be that the content in the DB is exactly
what the user typed, even if they typed “Robert'); DROP TABLE
students;--”