Sanitizing and stripping some HTML?

I have an application that manages a list of feeds. In a scheduled
BackgrounDRb worker, I parse each of these feeds and post the content
to the same site. Some of these feeds contain HTML in the description
of each item in the feed. I would like to first sanitize the HTML to
remove anything particularly harmful, then I would like to strip
certain tags, leaving the content.

I first tested Rick O.'s white_list plugin. It seems to strip tags
together with their content. For example, if I say p is a bad tag,
a p element and everything inside it gets completely stripped. I
would actually like to keep the content and simply remove the HTML.
Certain tags are alright, such as b, em, and strong, but most I
would like stripped out.
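
To make the goal concrete, here is a rough sketch of the behavior I'm after, in plain Ruby with no plugin. The names `strip_tags_keep_content` and `ALLOWED_TAGS` are hypothetical and not part of white_list or any other library. The regex approach is only illustrative; a parser-based sanitizer would likely cope better with malformed markup.

```ruby
# Hypothetical helper for illustration only; not taken from any plugin.
# Tags in this list are kept (without attributes); all others are removed
# while the text between them is preserved.
ALLOWED_TAGS = %w[b em strong].freeze

def strip_tags_keep_content(html, allowed = ALLOWED_TAGS)
  # Remove script and style elements together with their contents,
  # since their bodies are never meant to be displayed as text.
  text = html.gsub(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, "")

  # Remove HTML comments.
  text = text.gsub(/<!--.*?-->/m, "")

  # For each remaining tag: re-emit it bare if allowed, otherwise
  # drop the tag itself and leave the surrounding text in place.
  text.gsub(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>}) do
    closing = $1
    name = $2.downcase
    allowed.include?(name) ? "<#{closing}#{name}>" : ""
  end
end

puts strip_tags_keep_content('<p>Hello <b class="x">world</b></p>')
```

Allowed tags are re-emitted without attributes so that things like onclick handlers cannot come along with them.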

I then tested and it
seems to do the trick. I was just wondering if anyone else had been
interested in stripping HTML but leaving the content and how they went
about doing so. Thanks for your input.
