Escaping/stripping all user HTML input


#1

(Some of you may be reading this twice as I accidentally posted this
to ruby talk)

I am writing an application where I know that I do not need to allow
any HTML input from a user.

I am considering using before_filter at the controller level to call a
method that essentially performs the following on the appropriate
members of the params hash:

  • call strip_tags()
  • escape any remaining characters with h()

The reason I am doing this is that it seems repetitive and error-prone
to have to call the above method in every view where user input is
displayed. Ultimately, I would prefer to store the data in as
“non-malicious” a format as possible and not have to worry about
escaping it later at the presentation level.

Is there a better way to do this? Is there existing code that does
this already? Some googling yielded nothing specific other than
postings to the effect of “in your view, make sure to use h()”.
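
A minimal sketch of the filter described above, in plain Ruby so it runs without Rails. `naive_strip_tags` is a hypothetical regex stand-in for Rails' strip_tags(), and `CGI.escapeHTML` plays the role of h(); `sanitize_value` is an illustrative name, not an existing helper.

```ruby
require 'cgi'

# Hypothetical stand-in for strip_tags(): a naive regex, not a real HTML parser.
def naive_strip_tags(text)
  text.gsub(/<[^>]*>/, '')
end

# Strip tags, then escape whatever remains, walking nested hashes and arrays
# the way a params hash is nested.
def sanitize_value(value)
  case value
  when String then CGI.escapeHTML(naive_strip_tags(value))
  when Hash   then value.each_with_object({}) { |(k, v), out| out[k] = sanitize_value(v) }
  when Array  then value.map { |v| sanitize_value(v) }
  else value
  end
end

# Sample input shaped like a params hash (made up for illustration).
params = { 'comment' => { 'title' => '<b>Hi</b> & bye',
                          'tags'  => ['<script>x</script>', 'a > b'] } }
clean = sanitize_value(params)
puts clean.inspect
```

In a controller, a before_filter method could presumably assign the result of `sanitize_value` back onto the relevant params keys. One consequence of storing escaped text is that a view which also calls h() on it would escape it a second time.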


#2

Luis wrote:

> I am writing an application where I know that I do not need to allow
> any HTML input from a user.
>
> I am considering using before_filter at the controller level to call a
> method that essentially performs the following on the appropriate
> members of the params hash:
>
>   • call strip_tags()
>   • escape any remaining characters with h()

I will answer, but I don’t understand the problem with storing raw HTML and
escaping it. If the user typed into a text area, they should see

in the view, with the angle brackets, and we should not strip the tags. You
can call the equivalent of h() when you save their change, for example.
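
As a small illustration of escaping without stripping, here is a sketch that uses `CGI.escapeHTML` from Ruby's standard library as the "equivalent of h()". The sample string is made up.

```ruby
require 'cgi'

# What the user typed into the text area (sample input).
typed = '<b>bold</b> & "quoted"'

# Escape once, e.g. at save time; the tags are kept as literal text.
escaped = CGI.escapeHTML(typed)
puts escaped

# Unescaping gives back exactly what the user typed.
puts CGI.unescapeHTML(escaped) == typed
```

Rendered in a browser, the escaped string displays the angle brackets as typed rather than being interpreted as markup.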

> The reason why I am doing this is it seems repetitive and error prone
> to have to call the above method every time in a view where user input
> is being displayed. Ultimately, I would prefer to store the data in as
> “non-malicious” format as possible and not have to worry at the
> presentation level of escaping that data at a later time.
>
> Is there a better way to do this? Is there existing code that does
> this already? Some googling yielded nothing specific other than
> postings to the effect of “in your view, make sure to use h()”.

I can think of a way. It’s sick, excessive, and bullet-proof. Here is
assert_tidy:

def assert_tidy(messy = @response.body, verbosity = :noisy)
  scratch_html = RAILS_ROOT + '/…/scratch.html' # TODO tune me!
  File.open(scratch_html, 'w'){|f| f.write(messy) }
  # -e: report errors and warnings only; -q: quiet
  gripes = `tidy -eq #{scratch_html} 2>&1`
  gripes = gripes.split("\n")
  # set aside the complaints we choose to ignore
  excluded, included = gripes.partition do |g|
    g =~ / - Info: / or
    g =~ /Warning: missing <!DOCTYPE> declaration/ or
    g =~ /proprietary attribute/ or
    g =~ /lacks "(summary|alt)" attribute/
  end
  puts included if verbosity == :noisy
  # included.map{|i| puts Regexp.escape(i) }
  assert_xml `tidy -wrap 1001 -asxhtml #{scratch_html} 2>/dev/null`
  # CONSIDER that should report serious HTML deformities
end

You can take out the assert_xml if you don’t have yar_wiki (whence that
comes). That code uses the command-line tidy. Don’t worry about the
Ruby-oriented tidy.so project.

Migrate that test-side code to production code, and you have a function that
turns sloppy HTML into pristine XHTML. Now that your input is XML, you can
strip out all the tags like this:

class REXML::Element
  def inner_text
    self.each_element( './/text()' ){}.join( '' )
  end
end
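
For anyone who wants to try the idea in isolation, here is a rough sketch of the same text extraction using `REXML::XPath.match` rather than the monkey patch above. `text_only` is a hypothetical helper name, and the input is assumed to be well-formed XML already (for example, what tidy emits with -asxhtml).

```ruby
require 'rexml/document'

# Hypothetical helper: return only the text content of a well-formed XML string.
def text_only(xml)
  doc = REXML::Document.new(xml)
  # './/text()' selects every text node beneath the root element.
  REXML::XPath.match(doc.root, './/text()').map(&:value).join
end

# Sample input, made up for illustration.
puts text_only('<p>Hello <b>bold</b> and <i>italic</i> world</p>')
```

`value` returns the unescaped text of each node, so the result would still need h() (or an equivalent) before being written into a page.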

This obscenely over-the-top solution will hopefully inspire someone to post
a solution in the usual 4 lines!


Phlip
http://www.oreilly.com/catalog/9780596510657/
“Test Driven Ajax (on Rails)”
assert_xpath, assert_javascript, & assert_ajax