Regexp strip html


#1

I have a regexp able to strip HTML:

/<[^>]*>/

However, all the text between <script> and </script> is preserved, so I tried:

def stripHTML
  self.gsub(/<\S[^><]*>/, '')
  self.gsub(/\A.*<body [^>]*>(.*)<\/body>\s*\Z/, '\1').gsub(/<[^>]*>/, '')
end

Without success: the various JavaScript functions are still kept.

What's my error here?


#2

On 26/03/06, Une bévue removed_email_address@domain.invalid wrote:

> I have a regexp able to strip HTML:
>
> /<[^>]*>/
>
> However, all the text between <script> and </script> is preserved, then
>
> What's my error here?

Look at it this way: you have '<script>Javascript</script>'. You
remove everything between angle brackets. You still have 'Javascript',
because that's not actually inside <…>.
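A minimal sketch of the point above, using a made-up input string (not from the thread): the tag-only pattern removes the `<script>` and `</script>` tags themselves, but the script body between them is plain text and survives.

```ruby
# Hypothetical sample input: one paragraph followed by an inline script.
html = "<p>Hello</p><script type=\"text/javascript\">alert('hi');</script>"

# Only text enclosed in <...> is removed, so the script body is left behind.
stripped = html.gsub(/<[^>]*>/, '')

puts stripped
```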

The simplest solution is probably to do something like this before
stripping out the remaining tags:

gsub(/<script.*?<\/script>/im, '')

Paul.
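Combining that suggestion with the original tag-stripping regexp could look like the sketch below. The `String#strip_html` name is hypothetical (modelled on the original `stripHTML`). Note that the `gsub` calls are chained: in the original method, the result of the first `self.gsub(...)` line is discarded, because `gsub` returns a new string and a Ruby method returns only the value of its last expression. The `m` flag lets `.` match newlines, so multi-line scripts are removed, and `i` also matches `<SCRIPT>`. Like any regexp approach, this is a heuristic rather than an HTML parser.

```ruby
class String
  # Hypothetical helper: remove <script>...</script> blocks including their
  # contents, then remove any remaining tags.
  def strip_html
    # Non-greedy .*? stops at the first closing tag, so text between two
    # separate scripts is kept; /m lets . span newlines, /i ignores case.
    without_scripts = gsub(/<script.*?<\/script>/im, '')
    without_scripts.gsub(/<[^>]*>/, '')
  end
end

# Made-up sample page with a multi-line, upper-case script block.
html = <<~HTML
  <html>
  <body>
  <p>Hello</p>
  <SCRIPT type="text/javascript">
    function greet() { alert('hi'); }
  </SCRIPT>
  <p>World</p>
  </body>
  </html>
HTML

result = html.strip_html
puts result
```

Only the paragraph text (plus the surrounding newlines) remains; the `greet` function is gone.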


#3

Paul B. removed_email_address@domain.invalid wrote:

> The simplest solution is probably to do something like this before
> stripping out the remaining tags:
>
> gsub(/<script.*?<\/script>/im, '')

Yes, sounds clever ))