Forum: Ruby / Ruby Mechanize issues

Timothy Brown (Guest)
on 2007-07-22 02:43
(Received via mailing list)
I'm having a devil of a time trying to scrape an onerous website with
Scrubyt/Mechanize.

Changing the site's policies is out of scope, so I need to try to find
some way to work around them.

I think the problem is centered on the site's use of URL schemes:
HTTPS is often used in place of https, for instance.  I'm not a Ruby
wizard, but I've patched Mechanize at line 429, line 439, and line
466.  On 466 I've put

    if uri.scheme == 'https|HTTPS' && ! http_obj.started?

I assume this is right, but like I said, I'm a Nuby. :)
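For what it's worth, I now suspect that comparison can never match: 'https|HTTPS' is just a literal string, not a pattern, so uri.scheme == 'https|HTTPS' is false for both 'https' and 'HTTPS'.  A case-insensitive check seems closer to what I want.  A sketch, outside of Mechanize (https_scheme? is just a helper name I made up, not anything from the library):

```ruby
require 'uri'

# Case-insensitive check for an https scheme. A plain string comparison
# against 'https' would miss redirects whose Location uses 'HTTPS'.
def https_scheme?(uri)
  uri.scheme.to_s.downcase == 'https'
end

https_scheme?(URI.parse('https://example.org/'))  # => true
https_scheme?(URI.parse('HTTPS://example.org/'))  # => true
https_scheme?(URI.parse('http://example.org/'))   # => false
```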

Anyway, the site does the following.  After the output below, Ruby
times out in rbuf_fill.  I assume it isn't really following the
redirect and my patching is wrong, but I can't be certain.  The
symptoms are the same regardless of whether I use Scrubyt or
Mechanize directly.  My script looks like:

page = agent.get('http://accounts.xxxx.org')
form = page.forms.find { |f| f.name == "frmInput" }
form.fields.find { |f| f.name == 'Name' }.value = 'xxxx'
form.fields.find { |f| f.name == 'Password' }.value = 'xxxx'
puts page.forms
page = agent.submit(form, form.buttons.first)

after the obvious initialization.
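One workaround I've been considering, instead of patching Mechanize internals, is to normalize the scheme on a URL before handing it to the agent.  Something along these lines, where normalize_scheme is a hypothetical helper of my own, not part of Mechanize's API:

```ruby
# Downcase only the scheme portion of a URL
# (e.g. 'HTTPS://...' -> 'https://...'), leaving the rest untouched.
# Hypothetical helper, not part of Mechanize.
def normalize_scheme(url)
  url.sub(/\A([A-Za-z][A-Za-z0-9+.\-]*):/) { "#{$1.downcase}:" }
end

normalize_scheme('HTTPS://accounts.example.org/login')
# => "https://accounts.example.org/login"
normalize_scheme('http://accounts.example.org/')
# => "http://accounts.example.org/"
```

The idea would be to call agent.get(normalize_scheme(url)) whenever I follow a Location header by hand, rather than trusting the site's casing.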

Does anyone have any clues?  I'm at my wits' end here.

http://pastie.caboo.se/81051 is the output.