Forum: Ruby on Rails escaping/stripping all user HTML input

Luis (Guest)
on 2007-06-28 21:18
(Received via mailing list)
(Some of you may be reading this twice as I accidentally posted this
to ruby talk)

I am writing an application where I know that I do not need to allow
any HTML input from a user.

I am considering using before_filter at the controller level to call a
method that essentially performs the following on the appropriate
members of the params hash:
- call strip_tags()
- escape any remaining characters with h()
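
A rough sketch of the two steps above, written as plain Ruby so it runs outside Rails. The names scrub_value and scrub_params are hypothetical, the regex is only a crude stand-in for strip_tags(), and ERB::Util.html_escape is the method that h() aliases. In a controller this would presumably be wired up with something like before_filter, which is an assumption and not shown here.

```ruby
require 'erb'

# Hypothetical helper: remove anything tag-shaped, then escape what is left.
# The regex is a crude stand-in for strip_tags(), not the Rails implementation.
def scrub_value(value)
  case value
  when String then ERB::Util.html_escape(value.gsub(/<[^>]*>/, ''))
  when Hash   then scrub_params(value)
  when Array  then value.map { |v| scrub_value(v) }
  else value
  end
end

# Hypothetical helper: return a scrubbed copy of a (possibly nested) params-like hash.
def scrub_params(params)
  params.each_with_object({}) { |(key, value), clean| clean[key] = scrub_value(value) }
end

params = { 'comment' => { 'body' => '<b>Tom</b> & Jerry <script>alert(1)</script>' } }
clean = scrub_params(params)
puts clean['comment']['body']
```

The copy is built without mutating the original hash, so the raw input stays available if it is needed for anything else.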

I am doing this because it seems repetitive and error-prone to call
the above methods in every view that displays user input. Ultimately,
I would prefer to store the data in as "non-malicious" a format as
possible and not have to worry about escaping it later at the
presentation level.

Is there a better way to do this? Is there existing code that does
this already? Some googling yielded nothing specific other than
postings to the effect of "in your view, make sure to use h()".
Phlip (Guest)
on 2007-07-04 16:27
(Received via mailing list)
Luis wrote:

> I am writing an application where I know that I do not need to allow
> any HTML input from a user.
>
> I am considering using before_filter at the controller level to call a
> method that essentially performs the following on the appropriate
> members of the params hash:
> - call strip_tags()
> - escape any remaining characters with h()

I will answer, but I don't understand the problem with storing raw HTML and
escaping it. If the user typed <yo> into a text area, they should see <yo>
in the view, with the angle brackets, and we should not strip the tags. You
can call the equivalent of h() when you save their change, for example.
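
To make the escape-without-stripping idea concrete, here is a minimal sketch. The sample string is made up, and ERB::Util.html_escape is used because it is the method that h() aliases.

```ruby
require 'erb'

raw = 'I typed <yo> & meant it'      # kept exactly as the user entered it
shown = ERB::Util.html_escape(raw)   # angle brackets and ampersand become entities
puts shown
```

The browser then renders the entities back as the literal characters, so the user sees the <yo> they typed and nothing is interpreted as markup.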

> I am doing this because it seems repetitive and error-prone to call
> the above methods in every view that displays user input. Ultimately,
> I would prefer to store the data in as "non-malicious" a format as
> possible and not have to worry about escaping it later at the
> presentation level.
>
> Is there a better way to do this? Is there existing code that does
> this already? Some googling yielded nothing specific other than
> postings to the effect of "in your view, make sure to use h()".

I can think of a way. It's sick, excessive, and bullet-proof. Here is
assert_tidy:

  def assert_tidy(messy = @response.body, verbosity = :noisy)
    scratch_html = RAILS_ROOT + '/../scratch.html'  #  TODO  tune me!
    File.open(scratch_html, 'w'){|f|  f.write(messy)  }
    # -e reports only errors and warnings; -q suppresses nonessential output
    gripes = `tidy -eq #{scratch_html} 2>&1`.split("\n")
    # set aside the complaints we choose to tolerate
    excluded, included = gripes.partition do |g|
      g =~ / - Info\: / or
      g =~ /Warning\: missing \<\!DOCTYPE\> declaration/ or
      g =~ /proprietary attribute/ or
      g =~ /lacks "(summary|alt)" attribute/
    end
    puts included if verbosity == :noisy
    # included.map{|i| puts Regexp.escape(i) }
    assert_xml `tidy -wrap 1001 -asxhtml #{scratch_html} 2>/dev/null`
      #  CONSIDER  that should report serious HTML deformities
  end

You can take out the assert_xml if you don't have yar_wiki (whence that
comes). That code uses the command-line tidy. Don't worry about the
Ruby-oriented tidy.so project.

Migrate that test-side code to production code, and you have a function that
turns sloppy HTML into pristine XHTML. Now that your input is XML, you can
strip out all the tags like this:

require 'rexml/document'

class REXML::Element
   # concatenate every descendant text node, dropping the tags
   def inner_text
     REXML::XPath.match( self, './/text()' ).join( '' )
   end
end
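
For anyone who wants to try the text-extraction step on its own, here is a small sketch. The helper name plain_text_from and the sample markup are made up for illustration, and the input is assumed to be well-formed XHTML already (for example, the output of tidy -asxhtml).

```ruby
require 'rexml/document'

# Hypothetical helper (not from the thread): parse well-formed XHTML and
# return only its text, in the spirit of inner_text above.
def plain_text_from(xhtml)
  doc = REXML::Document.new(xhtml)
  # Match every text node under the root, then join their string forms.
  REXML::XPath.match(doc.root, './/text()').map(&:to_s).join
end

puts plain_text_from('<div><p>Hello, <b>world</b></p></div>')
```

Note that REXML raises a parse error on markup that is not well formed, which is why the tidy pass has to come first.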

This obscenely over-the-top solution will hopefully inspire someone to post
a solution in the usual 4 lines!

--
  Phlip
  http://www.oreilly.com/catalog/9780596510657/
  "Test Driven Ajax (on Rails)"
  assert_xpath, assert_javascript, & assert_ajax