Mechanize question

Hello,

I am using mechanize 0.6.3. On Aaron’s blog I found this example:

form.selectlist.options[2].select

However, for me, ‘puts form.methods.sort’ revealed that form does not
have a ‘selectlist’ method. What’s up? Am I doing something wrong?

Here is the code I am using:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
# Fetch the page and pick the form whose name is 'formname'
page = agent.get 'http://www.some-page.com'
form = page.forms.with.name('formname').first

And this form does not have a selectlist method. (Just in case: form
really is a WWW::Mechanize::Form, not nil or some other kind of
nonsense.) :)

Thanks,
Peter

__
http://www.rubyrailways.com

Meanwhile, after browsing the RDoc of WWW::Mechanize::GlobalForm, I
found that Form indeed does not have a selectlist method.

OK, but then how am I supposed to get the select list of a form?

The $1.000.000 question: in the same RDoc I read this:

‘Class Form does not work in the case there is some invalid (unbalanced)
html involved…’

Well, on my page, the tag is not even closed. Can this be fixed
somehow?

Thanks,
Peter

Hey, I don’t know the answer to this question specifically, but I did
some work with Mechanize recently and found that it did pretty much
everything we needed; it just sometimes returned things in forms we
didn’t expect. Almost everything we stumbled on, we solved by taking
the return value and calling .class on it to find out what we were
actually getting back.
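A minimal plain-Ruby sketch of that debugging habit (Ruby 1.9 or later, since it relies on BasicObject). The helper name describe is hypothetical and not part of Mechanize; it walks from value.class up through superclass and joins the names, so calling it on, say, the result of page.forms.with.name('formname') would show whether you got back a list of forms or a single form.

```ruby
# Hypothetical debugging helper (not part of Mechanize): report the class
# of whatever a call returned, followed by its superclass chain.
def describe(value)
  chain = []
  klass = value.class
  while klass
    chain << klass.name
    klass = klass.superclass
  end
  chain.join(' < ')
end

puts describe([1, 2])  # prints "Array < Object < BasicObject"
puts describe(nil)     # prints "NilClass < Object < BasicObject"
```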

As for the million-dollar question, which I only just noticed: I think
the HTML we were working against with Mechanize was pretty bad, totally
noncompliant and non-validating. (This was a consulting thing, so I
don’t have the code in front of me; it’s on somebody else’s laptop.) We
used Hpricot a lot, which is pretty great, and we might actually have
given up on Mechanize for the HTML parsing and just used it for setting
and getting cookies, things like that. I don’t quite recall, but
definitely have a look at Hpricot; I think it was written by why the
lucky stiff.

Mechanize now supports Hpricot directly (and is implemented on top of
it), IIRC.

Aaron is likely to have more info on this, of course.

On Fri, Nov 24, 2006 at 04:16:48AM +0900, Aaron P. wrote:

Mechanize uses HPricot for its HTML parsing. If Hpricot can handle the
form tag that isn’t closed, then you should be fine. If HPricot cannot
handle the unbalanced form tag, you can write a pluggable parser to fix
up your HTML before it is run through HPricot.

If Hpricot cannot handle the tag, please open a ticket[1] or mail me, so I can
fix it and add to my tests. This way we all get to benefit from these wild tags
you’ve captured.

_why

[1] https://code.whytheluckystiff.net/hpricot/

On Tue, Nov 21, 2006 at 10:46:26PM +0900, Peter S. wrote:

OK, but then how am I supposed to get the select list of a form?

The select list is treated like a regular field. Say you have a select
list named ‘foo’; you could find it like this:

form.fields.name('foo')

or, with method_missing magic:

form.foo
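Aaron’s "method missing magic" can be illustrated with a small plain-Ruby sketch. SketchForm and Field are hypothetical stand-ins, not Mechanize’s actual classes, and this assumes Mechanize works roughly along these lines: a select list is just another field looked up by name, and a call such as form.foo falls through to method_missing. It also suggests why ‘puts form.methods.sort’ showed nothing useful: names resolved this way are not listed by methods.

```ruby
# Hypothetical stand-ins, NOT Mechanize's real classes. A select list is
# modelled as an ordinary field with a name and a value.
Field = Struct.new(:name, :value)

class SketchForm
  attr_reader :fields

  def initialize(fields)
    @fields = fields
  end

  # Explicit lookup by name, in the spirit of form.fields.name('foo').
  def field(name)
    fields.find { |f| f.name == name }
  end

  # "Method missing magic": form.foo returns the field named 'foo',
  # and unknown names still raise NoMethodError via super.
  def method_missing(meth, *args, &block)
    f = field(meth.to_s)
    f ? f : super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(meth, include_private = false)
    !field(meth.to_s).nil? || super
  end
end

form = SketchForm.new([Field.new('foo', 'b'), Field.new('q', '')])
puts form.foo.value                 # prints "b"
puts form.methods.include?(:foo)    # prints "false": not listed by #methods
puts form.respond_to?(:foo)         # prints "true"
```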

Well, on my page, the tag is not even closed. Can this be fixed
somehow?

Mechanize uses HPricot for its HTML parsing. If Hpricot can handle the
form tag that isn’t closed, then you should be fine. If HPricot cannot
handle the unbalanced form tag, you can write a pluggable parser to fix
up your HTML before it is run through HPricot.
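The "fix up your HTML before it is run through Hpricot" idea can be sketched as a plain string transformation. close_unbalanced_forms is a hypothetical helper, and the approach is deliberately naive: it counts form openings and closings and, if closings are missing, inserts them just before the first closing body tag (or appends them at the end). It ignores comments, scripts and nested forms. Hooking it into Mechanize would go through the pluggable parser Aaron mentions, roughly a Page subclass that cleans the body before calling super, registered with agent.pluggable_parser.html; treat that wiring as an assumption and check the RDoc for your Mechanize version.

```ruby
# Hypothetical helper, not part of Mechanize or Hpricot: naively balance
# <form> tags by adding any missing </form> closers.
def close_unbalanced_forms(html)
  opened  = html.scan(/<form\b/i).size        # "<form ...>" openings
  closed  = html.scan(%r{</form\s*>}i).size   # "</form>" closings
  missing = opened - closed
  return html if missing <= 0

  closers = '</form>' * missing
  if html =~ %r{</body\s*>}i
    # Put the closers just before the first </body> tag.
    html.sub(%r{</body\s*>}i) { |body_close| closers + body_close }
  else
    # No </body> at all: append the closers at the end of the document.
    html + closers
  end
end

broken = "<html><body><form name='formname'><select name='foo'></select></body></html>"
puts close_unbalanced_forms(broken)
# prints <html><body><form name='formname'><select name='foo'></select></form></body></html>
```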

Hope that helps.

–Aaron

_why wrote:

If Hpricot cannot handle the tag, please open a ticket[1] or mail me, so I can
fix it and add to my tests. This way we all get to benefit from these wild tags
you’ve captured.

Sure :) I am just beginning with Mechanize, so I don’t have a real-life
test case yet, but since I am going to scrape tens or maybe hundreds of
pages with Hpricot + Mechanize in the near future, I guess something
will pop up sooner or later…

Peter

Peter S. wrote:

Meanwhile, after browsing the RDoc of WWW::Mechanize::GlobalForm I have […] somehow?
You mean, like, closing the … tag pair? Well, yes, closing it is always
an option. What is the issue with editing the page and making sure it is
valid HTML?
