Forum: Ruby regexp strip html

unknown (Guest)
on 2006-03-26 16:34
(Received via mailing list)
I have a regexp that strips HTML tags:

/<[^>]*>/

however, all the text between <script> and </script> is preserved, so
I've tried:

  def stripHTML
#    self.gsub(/<\S[^><]*>/, '')
    self.gsub(/\A.*<body [^>]*>(.*)<\/body>\s*\Z/, '\1').gsub(/<[^>]*>/, '')
  end

without success: the various JavaScript functions are still kept.

What's my error here?
Paul Battley (Guest)
on 2006-03-27 14:43
(Received via mailing list)
On 26/03/06, Une bévue <pere.noel@laponie.com.invalid> wrote:
> i have a regexp able to strip html :
>
> /<[^>]*>/
>
> however, between <script> and </script> all the text is preserved, then
...
> what's my error here ?

Look at it this way: you have '<script>Javascript</script>'. You
remove everything between angle brackets. You still have 'Javascript',
because that's not actually inside <...>.
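
A minimal sketch of that behaviour, assuming a made-up sample string (it is
not taken from the original post):

```ruby
# Hypothetical sample input, used only for illustration.
html = '<p>Hello</p><script>alert("hi");</script>'

# The tag-only pattern removes <p>, </p>, <script> and </script>,
# but the text between the script tags is not inside angle brackets,
# so it survives the substitution.
stripped = html.gsub(/<[^>]*>/, '')
puts stripped  # prints: Helloalert("hi");
```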

The simplest solution is probably to do something like this before
stripping out the remaining tags:

gsub(/<script.*?<\/script>/im, '')
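
Put together, the two steps might look roughly like the sketch below. The
helper name strip_html and the sample string are assumptions made for
illustration, not something from the original code. Note that the slash in
</script> has to be escaped inside a /.../ regexp literal, and that the m
flag is what lets . match the newlines inside a script block.

```ruby
# strip_html is a hypothetical helper name chosen for this sketch.
def strip_html(html)
  html
    .gsub(/<script.*?<\/script>/im, '') # drop script elements together with their contents
    .gsub(/<[^>]*>/, '')                # then drop whatever tags remain
end

# Made-up sample with a multi-line script block.
sample = "<body><p>Hello</p>\n<script type=\"text/javascript\">\n" \
         "function f() { return 1; }\n</script><b>world</b></body>"

puts strip_html(sample)
```

This is still a regexp-based approach, so it will probably misbehave on
unusual markup (for example a ">" inside an attribute value); for anything
beyond simple pages a real HTML parser is likely the safer choice.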

Paul.
unknown (Guest)
on 2006-03-27 15:29
(Received via mailing list)
Paul Battley <pbattley@gmail.com> wrote:

> The simplest solution is probably to do something like this before
> stripping out the remaining tags:
>
> gsub(/<script.*?<\/script>/im, '')

yes, sounds clever ))