Forum: Ruby regexp strip html

unknown (Guest)
on 2006-03-26 18:34
(Received via mailing list)
I have a regexp that strips HTML tags:

/<[^>]*>/

However, all the text between <script> and </script> is preserved, so
I've tried:

  def stripHTML
#    self.gsub(/<\S[^><]*>/, '')
    self.gsub(/\A.*<body [^>]*>(.*)<\/body>\s*\Z/, '\1').gsub(/<[^>]*>/, '')
  end

Without success: the various JavaScript functions are still kept.

What's my error here?
Paul B. (Guest)
on 2006-03-27 16:43
(Received via mailing list)
On 26/03/06, Une bévue <removed_email_address@domain.invalid> wrote:
> I have a regexp that strips HTML tags:
>
> /<[^>]*>/
>
> However, all the text between <script> and </script> is preserved, so
...
> what's my error here ?

Look at it this way: you have '<script>Javascript</script>'. You
remove everything between angle brackets. You still have 'Javascript',
because that's not actually inside <...>.
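
A quick check illustrates this. The sample string below is made up for illustration and is not from the original post:

```ruby
# Hypothetical sample input, chosen only to show the effect.
html = '<p>Hello</p><script>alert("hi")</script>'

# Removing only the tags leaves the script body behind as plain text.
tags_only = html.gsub(/<[^>]*>/, '')
puts tags_only  # prints: Helloalert("hi")
```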

The simplest solution is probably to do something like this before
stripping out the remaining tags:

gsub(/<script.*?<\/script>/im, '')
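
Put together, a minimal sketch of the two-step approach might look like the following. The method name strip_html and the sample input are assumptions for illustration only, and regexp-based stripping is a rough approach that will not cope with every HTML document:

```ruby
# Hypothetical helper; the name strip_html is not from the original post.
def strip_html(html)
  html
    .gsub(/<script.*?<\/script>/im, '') # drop script blocks, contents included
    .gsub(/<[^>]*>/, '')                # then drop the remaining tags
end

# Made-up sample with a multi-line script block.
sample = "<body><p>Hello</p>\n<script type=\"text/javascript\">\nfunction f() { return 1; }\n</script></body>"
puts strip_html(sample)  # prints: Hello
```

The /m flag lets . match newlines, so a script block spanning several lines is removed as a whole, and /i also catches <SCRIPT> written in upper case.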

Paul.
unknown (Guest)
on 2006-03-27 17:29
(Received via mailing list)
Paul B. <removed_email_address@domain.invalid> wrote:

> The simplest solution is probably to do something like this before
> stripping out the remaining tags:
>
> gsub(/<script.*?<\/script>/im, '')

yes, sounds clever ))