Removing extraneous HTML

I can’t seem to find a way to do this… I have a bunch of HTML files
where I just need to remove everything from the <!DOCTYPE to the tag at the
top, and then remove from to at the bottom.

I looked at gsub, and I’m learning regular expressions, but I can’t seem
to figure out how they work. So far I’ve been able to figure out how to
kill single words and single letters, but not whole blocks of letters and
words.
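Single words are easy but multi-line blocks are hard most likely because of line breaks: in Ruby, `.` does not match a newline unless the regex has the `m` modifier. Below is a minimal sketch of the removal, assuming (hypothetically, because the actual tag names were lost from the post) that the top block ends at an opening `<body>` tag and the bottom block starts at `</body>`. It uses `sub` rather than `gsub` because each block occurs only once per file.

```ruby
# Hypothetical helper: the <body> and </body> tag names are assumptions,
# since the exact tags were lost from the original post.
def strip_wrapper(html)
  # /m lets "." match newlines, so one pattern can span many lines;
  # /i ignores case; ".*?" is non-greedy, so it stops at the first <body ...> tag.
  without_top = html.sub(/<!DOCTYPE.*?<body[^>]*>/mi, "")
  # Drop everything from the closing </body> tag to the end of the string (\z).
  without_top.sub(%r{</body>.*\z}mi, "")
end

sample = <<~HTML
  <!DOCTYPE html>
  <html>
  <head><title>Example</title></head>
  <body class="main">
  <p>Keep me</p>
  </body>
  </html>
HTML

puts strip_wrapper(sample)
```

To run it over a directory, something like `Dir.glob("*.html").each { |f| File.write(f, strip_wrapper(File.read(f))) }` would work; it overwrites the files in place, so try it on copies first.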

It’s mildly frustrating.

Well, if anyone can help, it would be greatly appreciated. I’m off to my
regex book.

Thanks in advance.

Your regex book will be the best help, but here’s a clue: I think you’re
going about it inside-out. It would probably be easiest to extract the
entire element. It’s relatively simple to write a regex that
will cover most cases, but if you have to cover absolutely every valid
case, you may want to use Nokogiri, Hpricot, or JavaScript DOM
manipulation instead.
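The inside-out approach (keep what you want instead of deleting what you don’t) can be sketched with a single capture group. This assumes, hypothetically, that the element worth keeping is `<body>`; the original element name did not survive in the post.

```ruby
# Hypothetical helper: assumes the element worth keeping is <body>.
def body_contents(html)
  # String#[] with a regex and a capture index returns that capture,
  # or nil if the pattern does not match at all.
  # /m lets ".*?" span lines; /i ignores tag case.
  html[%r{<body[^>]*>(.*?)</body>}mi, 1]
end

page = "<!DOCTYPE html>\n<html><body>\n<p>Keep me</p>\n</body></html>\n"
puts body_contents(page).inspect
```

This covers the common cases but not every valid document: a `<body>` inside an HTML comment before the real one, for instance, would make the capture start in the wrong place. That is where a real parser such as Nokogiri is the safer choice.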

Best,

Marnen Laibow-Koser
http://www.marnen.org

Hrmm. I was using gsub to blank all the stuff I didn’t want… maybe I’ll
just pull out the stuff that I do want. The marvels of reversing your logic.
Thanks.
