On Dec 13, 2007, at 9:34 AM, Mark W. wrote:
> My problem with sanitization is that it puts representational logic in
> the model.
And embedding HTML in the data doesn’t? If it’s representational to
remove it, then it’s representational to allow it, no?
> Should the model really care that its data might one day
> appear on an HTML page? Or should the HTML page take care of its own
> needs?
IMO, both.
At the goes-inta stage, sanitization is nothing more than a
particular type of validation. As a parallel, I don’t want phone
numbers to be formatted, yet I allow users to enter formatting, and
then strip it out before I store the data. I’m not going to keep the
myriad formats of entered values, and then deal with removing them over
& over again as the data is used or displayed. The model should take
care of itself here.
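As a rough sketch of the phone number case, assuming the stored form is digits only: normalize_phone is a made-up name for illustration, and in a Rails model something like it would presumably be called from a before_validation callback.

```ruby
# Hypothetical helper (not from any library): drop everything that is
# not a digit, so "(555) 123-4567" and "555.123.4567" store identically.
def normalize_phone(raw)
  raw.to_s.gsub(/\D/, "")
end

puts normalize_phone("(555) 123-4567")  # prints 5551234567
puts normalize_phone("555.123.4567")    # prints 5551234567
```

The user types whatever format they like; the model only ever sees the one stored form.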
I don’t buy the argument at all that users “may” need a certain HTML
tag. Setting aside the simplistic and narrow view of the world from
the perspective of the ubiquitous blog, HTML has no business in the
fields that make up a real, data-centric application. Removing all
traces of it from such fields is an input validation issue that the
model should be taking care of before the data even get into the
model IMO (not after it is loaded into the model the way Rails
currently works).
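One way to treat this as plain input validation, sketched under the assumption that the field should never contain markup at all: plain_text? is a hypothetical name, and rejecting any angle bracket is deliberately stricter than most fields need.

```ruby
# Hypothetical check: true when the value contains no angle brackets,
# so it cannot carry an HTML tag. Stricter than a real tag parser.
def plain_text?(value)
  !value.to_s.match?(/[<>]/)
end

puts plain_text?("Acme Widgets Inc.")          # prints true
puts plain_text?("<script>alert(1)</script>")  # prints false
```

A value that fails the check gets rejected like any other invalid input, before it is ever assigned to the model.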
Not stripping code before it reaches the model, based on an academic
or philosophical point, ignores the real-world danger of that stuff,
and the greater responsibility to protect those who use my application
by taking every chance I can get to ensure the data is not infected.
Personally, for fields that may require stylizing, I prefer an
alternative form of markup that provides greater control over what is
allowed. If there simply is no other option but raw HTML, then such
cases can be handled as exceptions, not as rules which endanger the
greater balance of the data.
Having done all that, one can end up with a false sense of, err,
security, by assuming that data coming from the database can be
trusted. Remember your X-Files lessons, and Trust No One. You never
know who, or what, might have direct access to the database. Data
imports, mergers, restored backups from before data was cleaned. Even
that trusted admin with direct db access that you’ve paid to go to
all those security conferences may decide there really is easy money
to be made by sneaking some code into the data.
So, for these and similar reasons (alternative uncontrolled data
sources like RSS vs DB for news stories), of course the HTML page
should take care of itself too with proper filters applied at the
goes-outta stage.
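For the goes-outta side, a minimal sketch using ERB::Util.html_escape from Ruby's standard library. render_headline is a made-up helper, and the hostile string stands in for whatever an import or an RSS feed might have slipped into the data.

```ruby
require "erb"

# Hypothetical view helper: escape the value at output time, no matter
# where it came from (database, import, RSS feed).
def render_headline(text)
  "<h1>#{ERB::Util.html_escape(text)}</h1>"
end

puts render_headline("<script>alert(1)</script>")
# prints <h1>&lt;script&gt;alert(1)&lt;/script&gt;</h1>
```

Even if the input checks were bypassed entirely, the page still renders the payload as inert text.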
So, yeah, I say sanitization needs to be done at both ends, and
debating whether it should be done at one end or the other is like
debating whether we should vaccinate for tuberculosis and ignore the
disease if it shows up vs. ignoring the opportunity to vaccinate
because we can just treat someone if they get it. We need to do both:
vaccinate for prevention, and treat for containment.
--
def gw
acts_as_n00b
writes_at(www.railsdev.ws)
end