How to programmatically submit a form that uses document.form


#1

Hi there:

I have been trying to use Mechanize to submit a form and scrape the
resulting page, but realized after looking at the form page that it
uses JavaScript to perform the form submission :-/

The page with the form is built such that

The JavaScript function checkSelections in turn calls
document.forms[0].submit();

I understand Mechanize doesn’t include a JavaScript engine and I don’t
want to choose a platform-specific route, i.e. control Windows IE.

Has anyone experienced similar difficulties?

Thanks, Harry.


#2

Hello.

I’m guessing checkSelections() merely verifies that the form parameters
are in order before calling the submit action on the form. The submit
action merely visits somepage.asp (specified in the form action
attribute), and posts form parameters to it.

So I think visiting “somepage.asp?param1=value1”, and perhaps tacking
on the other parameters from the form will work.
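
A minimal sketch of that idea, using only Ruby's standard library to build such a URL. The host and the parameter names are placeholders, not taken from the actual page, and build_form_url is a hypothetical helper. If the form's method is POST, the server may not accept the parameters in a query string.

```ruby
require 'uri'

# Hypothetical helper: append form parameters to the form's action URL
# as a URL-encoded query string.
def build_form_url(action_url, params)
  uri = URI.parse(action_url)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Placeholder URL and parameter names, for illustration only.
puts build_form_url('http://example.com/somepage.asp',
                    { 'param1' => 'value1', 'param2' => 'value 2' })
```

The resulting string can then be passed to whatever fetches the page, for example Mechanize's get.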


#3

Depending on what the JavaScript is doing, you may or may not need to
concern yourself with it. If it simply does validation, you can ignore
it, but just make sure your script submits valid data, since the
server probably assumes the data is valid if there is JavaScript
validation.

If the JavaScript does some processing and adds more information to
the form submission, then you will need to mimic this in your
Mechanize script.

But really, in the end, the form submission will always result in a
POST to the server, so overall you don’t need to worry about the
JavaScript, just what kind of data the server expects to be posted.
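
As a rough sketch of skipping the JavaScript and submitting the form directly: the helper below is hypothetical and only assumes an object that behaves like a Mechanize form (it responds to []= and submit). The URL and field names in the usage comment are placeholders, not the real page.

```ruby
# Hypothetical helper: fill in a form and submit it directly, without
# running the page's JavaScript. `form` is expected to behave like a
# Mechanize::Form (it must respond to []= and submit).
def fill_and_submit(form, values)
  values.each do |name, value|
    # Form data travels as text, so coerce every name and value to a String.
    form[name.to_s] = value.to_s
  end
  form.submit
end

# Usage with Mechanize (needs the mechanize gem and network access;
# the URL and field name are placeholders):
#
#   require 'mechanize'
#   agent  = Mechanize.new
#   page   = agent.get('http://example.com/formpage.asp')
#   form   = page.forms.first   # the form document.forms[0] refers to
#   result = fill_and_submit(form, { 'param1' => 'value1' })
#   puts result.body
```

This only covers the case where the JavaScript validates and then submits. If the script also adds or rewrites fields, those values would have to be set by hand in the same way.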

If worst comes to worst you can always connect a transparent proxy to
see exactly what the web page is posting, and then mimic that. A
transparent proxy can be built in Ruby using WEBrick in a few lines.

Ryan


#4

Ryan L. wrote:

Depending on what the JavaScript is doing, you may or may not need to
concern yourself with it. If it simply does validation, you can ignore
it, but just make sure your script submits valid data, since the
server probably assumes the data is valid if there is JavaScript
validation.

Which is so wrong on so many levels, and calls for hammering the server
with the most invalid data you can imagine.

Another possible quick hack if the JavaScript doesn’t do any processing
is mangling the HTML of the form and adding an <input type="submit"> to
the text before Mechanize gets it. This might end up as less typing than
faking the POST request if we’re at the quick’n’dirty level.
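
A rough sketch of the mangling step, assuming the element to add is a plain submit button and that the page contains a literal closing form tag. add_submit_button is a hypothetical helper, and how the modified HTML is handed back to Mechanize is left out here.

```ruby
# Hypothetical helper: insert a submit button just before the first
# closing </form> tag. The HTML is returned unchanged if no such tag exists.
def add_submit_button(html, button = '<input type="submit" value="Submit">')
  html.sub(%r{</form>}i) { |closing| button + closing }
end

# Placeholder form markup, for illustration only.
html = '<form action="somepage.asp" method="post">' \
       '<input type="text" name="param1"></form>'
puts add_submit_button(html)
```

A regex substitution like this is in keeping with the quick’n’dirty level. It does not parse the HTML, so it only touches the first form on the page.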

David V.
Who should spam the list less in the wee hours


#5

You guys were right - I went down the wrong path in analyzing what the
problem really was. My problem turned out to be me :wink: submitting the
form with a numeric value while it expected a string. Once that was
taken care of a simple submit worked just fine :wink:

Thanks again, Harry.