Hpricot - parse html

flnative · January 2, 2008, 10:13pm

hi @all

I would like to parse HTML code and remove all tags that start with

How can I remove these tags with a regex? I used the gsub! function to
manipulate the string.

Thanks for helping…

flnative · January 3, 2008, 4:38am

Try this…

C:\temp>irb
irb(main):001:0> mystring = "xxx yy zz"
=> "xxx yy zz"
irb(main):002:0> mystring.gsub(//,'')
=> "xxx yy zz"

Regards,
Jim
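
For readers following along, here is a minimal sketch of the gsub approach. The tag name from the original question did not survive in this thread, so the `font` tag, the sample string, and the variable names below are purely illustrative assumptions. The sketch also shows how gsub differs from the gsub! mentioned in the question: gsub returns a new string, while gsub! modifies the receiver in place and returns nil when nothing was replaced.

```ruby
# Hypothetical input; the <font> tag is an assumption for illustration only.
mystring = 'xxx <font color="red">yy</font> zz'
unchanged = mystring.dup

# Match an opening or closing tag whose name starts with "font".
# [^>]* consumes everything up to the closing angle bracket.
tag_pattern = /<\/?font[^>]*>/i

# gsub returns a new string and leaves mystring untouched.
stripped = mystring.gsub(tag_pattern, '')

# gsub! modifies the receiver in place and returns it when a match was found.
in_place = mystring.dup
result = in_place.gsub!(tag_pattern, '')

# gsub! returns nil when no substitution was made.
no_match = 'plain text'.dup.gsub!(tag_pattern, '')
```

Because gsub! can return nil, chaining further calls onto its result is a common source of errors; gsub is usually the safer choice in a chain.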

flnative · January 3, 2008, 11:37am

You should also handle the \n and \r characters.

So I think the regex should be “”.

flnative · January 3, 2008, 6:52pm

On Jan 3, 2008 4:37 AM, sishen wrote:

> You should also process the \n, \r char.
>
> So I think the regex should be “”.

Don’t forget about the multiline option. It’s easy: just stick an ‘m’
after the regexp, so that ‘.’ also matches newlines.

Daniel Brumbaugh K.
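
A minimal sketch of what the ‘m’ option changes, assuming a hypothetical `font` tag whose attributes wrap onto a second line (the actual pattern discussed in this thread was lost, so the tag and the sample string are illustrative only):

```ruby
# Hypothetical input: a tag whose attributes span two lines.
html = "xxx <font\n color=\"red\">yy</font> zz"

# Without /m, "." does not match "\n", so the multi-line opening tag
# is missed and only the closing tag is removed.
without_m = html.gsub(/<\/?font.*?>/, '')

# With /m, "." matches newlines too, so both tags are removed.
with_m = html.gsub(/<\/?font.*?>/m, '')

puts without_m.inspect
puts with_m.inspect
```

A negated character class such as [^>]* matches newlines even without the ‘m’ option, so it is an alternative when only the inside of a tag needs to span lines.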