For user-submitted content: Textile or inspected HTML?

I know that the point of using another markup language, such as wiki
syntax or Textile, is to prevent JavaScript injection. But for users who
don’t know wiki syntax or Textile, I’m thinking about just allowing them
to enter plain HTML, parsing the content, and rejecting all questionable
tags and attributes, so that only predefined (safe) tags such as bold or
italic get through.
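
A minimal Ruby sketch of that whitelist idea, for illustration only. It
takes the conservative route of escaping everything first and then
restoring a handful of attribute-free tags, rather than parsing HTML.
The names ALLOWED_TAGS, TAG_PATTERN and allow_basic_tags are made up for
this example and the tag list is an assumption:

```ruby
require 'cgi'

# Hypothetical whitelist: tags restored only when they carry no attributes.
ALLOWED_TAGS = %w[b i em strong].freeze
TAG_PATTERN  = /&lt;(\/?)(#{ALLOWED_TAGS.join('|')})&gt;/i

def allow_basic_tags(input)
  # Step 1: escape everything, so no markup survives by default.
  escaped = CGI.escapeHTML(input)
  # Step 2: turn the escaped form of whitelisted tags back into real tags.
  escaped.gsub(TAG_PATTERN) { "<#{$1}#{$2.downcase}>" }
end

puts allow_basic_tags('<b>hi</b><script>alert(1)</script>')
# prints: <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Anything with attributes, such as a bold tag carrying an onclick handler,
stays escaped because it does not match the pattern. The sketch does not
check that tags are balanced.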

Is using HTML for markup less secure than using non-HTML markup?
What’s the main reason people use non-HTML markup?

Dorren wrote:

I know that the point of using another markup language, such as wiki
syntax or Textile, is to prevent JavaScript injection. But for users who
don’t know wiki syntax or Textile, I’m thinking about just allowing them
to enter plain HTML, parsing the content, and rejecting all questionable
tags and attributes, so that only predefined (safe) tags such as bold or
italic get through.

Wiki-like syntax is easy to learn (Textile is one such syntax: markup
that is not HTML), and it saves you the hassle of sanitizing HTML input.
With HTML you’ll have to handle a lot of special cases caused by browser
incompatibilities. IE6, for example, accepts javas\ncript (with an
embedded newline) as valid, which to a filter comparing strings
obviously isn’t the same as javascript.
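
To make the newline problem concrete, here is a small Ruby sketch. The
helper name script_url? is hypothetical, and stripping whitespace and
control characters before the comparison is one assumed mitigation, not
a complete defence:

```ruby
# Hypothetical helper: does a URL-like attribute value use the javascript scheme?
def script_url?(value)
  # Remove whitespace and ASCII control characters before comparing,
  # so that "javas\ncript:" is seen as "javascript:".
  normalized = value.gsub(/[\s\x00-\x1f]+/, '').downcase
  normalized.start_with?('javascript:')
end

tricky = "javas\ncript:alert(1)"

puts tricky.include?('javascript:')   # prints: false (naive check is fooled)
puts script_url?(tricky)              # prints: true
puts script_url?('http://example.com/') # prints: false
```

In practice it is usually safer to turn this around and accept only a
short list of known schemes (http, https, mailto) instead of trying to
recognise every dangerous one.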

Is using HTML for markup less secure than using non-HTML markup?
What’s the main reason people use non-HTML markup?

Yes, HTML is less secure, mainly because of JavaScript exploits, and it
is also harder for humans to read.
If you can avoid HTML input, do so.

Shameless plug:
ClothRed aims to convert HTML into Textile, and will be able to
serve as a sanitizer in the (hopefully) not too distant future:
http://clothred.rubyforge.org

(P.S.: I came up with this library out of a need similar to yours.)


Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/

Rule of Open-Source Programming #13:

Your first release can always be improved upon.

There are simple means to counter JavaScript injection.
You still need to sanitize and validate user input, though.
All unknown or unmonitored input must be considered untrusted.
Even with another markup language, there could be attacks lying in wait.
The fact is, you have to do this work anyway.
You should also limit user input: there must be some upper bound on
the size of the input.
You have to care about SQL injection, cross-site scripting (XSS)
attacks, all of it.
There is no shortcut on security.
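
As a rough Ruby sketch of the size limit and the "treat it as untrusted"
rule: the limit of 10,000 bytes and the names MAX_COMMENT_BYTES and
validate_comment are assumptions picked for illustration. SQL injection
is a separate concern, handled by parameterized queries in whatever
database layer you use, and is not shown here.

```ruby
require 'cgi'

# Assumed upper bound; choose whatever fits your application.
MAX_COMMENT_BYTES = 10_000

# Hypothetical validator: rejects empty or oversized input, returns it otherwise.
def validate_comment(raw)
  raise ArgumentError, 'comment is empty' if raw.nil? || raw.strip.empty?
  raise ArgumentError, 'comment is too long' if raw.bytesize > MAX_COMMENT_BYTES
  raw
end

# Escape at output time, whatever markup the text was written in.
comment = validate_comment('<i>nice post</i>')
puts CGI.escapeHTML(comment)
# prints: &lt;i&gt;nice post&lt;/i&gt;
```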