Forum: Ruby Using Scrubyt on bad markup pages

Rolin N. (Guest)
on 2009-04-28 10:39
I am having trouble scraping a page that has bad markup. After
fetching the page, Scrubyt::Extractor exits while parsing the
document. The Safari Web Inspector reports numerous errors for the
page:

<meta> is not allowed inside <td>. Moving <meta> into the <head>.
Unmatched </embed> encountered.  Ignoring tag.
Unmatched </span> encountered.  Ignoring tag.
Unmatched </a> encountered.  Ignoring tag.

Is there any way to scrape a poorly formatted page with scrubyt? I
am using the latest version (0.4.1) of scrubyt.

Thanks,
Rolin
Ryan D. (Guest)
on 2009-04-28 12:16
(Received via mailing list)
On Apr 27, 2009, at 23:39 , Rolin Nelson wrote:

>
> Is there any way to scrape a poorly formatted page with scrubyt? I
> am using the latest version (0.4.1) of scrubyt.

Switch to Mechanize and update your gems. scrubyt depends on Hpricot
and a very old version of Mechanize. Mechanize now uses Nokogiri
instead of Hpricot, and Nokogiri is much more tolerant of malformed markup.
Rolin N. (Guest)
on 2009-04-28 17:48
Ryan D. wrote:
> On Apr 27, 2009, at 23:39 , Rolin Nelson wrote:
>
>>
>> Is there any way to scrape a poorly formatted page with scrubyt? I
>> am using the latest version (0.4.1) of scrubyt.
>
> Switch to Mechanize and update your gems. scrubyt depends on Hpricot
> and a very old version of Mechanize. Mechanize now uses Nokogiri
> instead of Hpricot, and Nokogiri is much more tolerant of malformed markup.

Thank you. I will try using Mechanize directly. However, when I
installed scrubyt 0.4.1, it did appear to have a dependency on Nokogiri.
I've pasted the standard output below.

$ sudo gem install scrubyt-0.4.11.gem
Password:
Building native extensions.  This could take a while...
Successfully installed scrubyt-0.4.1
Successfully installed nokogiri-1.2.3
2 gems installed
Installing ri documentation for scrubyt-0.4.1...
Installing ri documentation for nokogiri-1.2.3...
Installing RDoc documentation for scrubyt-0.4.1...
Installing RDoc documentation for nokogiri-1.2.3...
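One way to settle which parser a given scrubyt install actually depends on is to ask RubyGems for the installed gemspec's declared runtime dependencies. This is a minimal sketch, assuming a reasonably current RubyGems; `runtime_dependency_names` is a hypothetical helper, not part of scrubyt or RubyGems.

```ruby
require 'rubygems'

# Hypothetical helper: list the runtime dependencies declared by an
# installed gem, e.g. to see whether scrubyt pulls in hpricot or nokogiri.
# Returns nil when no matching version of the gem is installed.
def runtime_dependency_names(gem_name, *requirements)
  spec = Gem::Specification.find_by_name(gem_name, *requirements)
  spec.runtime_dependencies.map { |dep| "#{dep.name} (#{dep.requirement})" }
rescue Gem::LoadError
  # Gem::MissingSpecError is a subclass of Gem::LoadError.
  nil
end

p runtime_dependency_names('scrubyt', '0.4.1')
```

This reports only what the gemspec declares; it does not show which library the gem's code actually requires at runtime.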