Regexp strip html

unknown · March 26, 2006, 4:34pm

i have a regexp able to strip html :

/<[^>]*>/

however, between <script ...> and </script> all the text is preserved, then
i’ve tried :

def stripHTML
  self.gsub(/<\S[^><]*>/, '')
  self.gsub(/\A.*<body [^>]*>(.*)<\/body>\s*\Z/, '\1').gsub(/<[^>]*>/, '')
end

without success : the various javascript functions are kept ?

what’s my error here ?

unknown · March 27, 2006, 2:43pm

On 26/03/06, Une bévue wrote:

i have a regexp able to strip html :

/<[^>]*>/

however, between <script ...> and </script> all the text is preserved, then
…
what’s my error here ?

Look at it this way: you have ‘<script>Javascript</script>’. You
remove everything between angle brackets. You still have ‘Javascript’,
because that’s not actually inside <…>.
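A quick way to see this in irb, as a minimal sketch (the sample string is made up for illustration, it is not from the original page):

```ruby
# Hypothetical sample input, not taken from the original post.
html = "<p>Hello</p><script>alert('hi');</script>"

# Removing only the tags leaves the script body behind, because the
# JavaScript sits between the two tags, not inside either of them.
tags_only = html.gsub(/<[^>]*>/, '')
puts tags_only
```

The script's opening and closing tags are gone, but the `alert(...)` call is still in the output.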

The simplest solution is probably to do something like this before
stripping out the remaining tags:

gsub(/<script.*?<\/script>/im, '')
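Putting the two steps together might look like the sketch below. The method name strip_html and the sample string are hypothetical, and it assumes every <script> has a matching </script>; for messy real-world pages a proper HTML parser is likely to be more robust than a regexp.

```ruby
# Hypothetical helper: drop whole script blocks first, then the remaining tags.
def strip_html(html)
  # /m lets "." match newlines; the lazy ".*?" stops at the first </script>.
  without_scripts = html.gsub(/<script.*?<\/script>/im, '')
  # gsub returns a new string, so the result has to be chained or assigned.
  without_scripts.gsub(/<[^>]*>/, '')
end

# Made-up input for illustration only.
sample = "<body><p>Hello</p>\n<script type=\"text/javascript\">\nfunction f() { return 1; }\n</script><b>world</b></body>"
puts strip_html(sample)
```

Note that in the original stripHTML the result of the first gsub is thrown away, since the non-bang gsub does not modify the string in place.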

Paul.

unknown · March 27, 2006, 3:29pm

The simplest solution is probably to do something like this before
stripping out the remaining tags:

gsub(/<script.*?<\/script>/im, '')

yes, sounds clever ))