Best way to automate web browser tasks?

Hal_F · October 25, 2006, 3:01am

I know there’s Watir or something… but I’m not using
IE (rather Firefox) and I don’t want to “control the
browser” per se…

Rather I want to automate some repetitive tasks… like
go to a form, fill in a couple of fields, click a checkbox,
pick from a dropdown, and click on Save.

How would you do this?

Hal

Hal_F · October 25, 2006, 3:25am

On 10/24/06, Hal F. [email protected] wrote:

Rather I want to automate some repetitive tasks… like
go to a form, fill in a couple of fields, click a checkbox,
pick from a dropdown, and click on Save.

How would you do this?

If it doesn’t involve javascript, mechanize should do just fine.

Hal_F · October 25, 2006, 3:27am

On 2006.10.25 10:00, Hal F. wrote:

I know there’s Watir or something… but I’m not using
IE (rather Firefox) and I don’t want to “control the
browser” per se…

Rather I want to automate some repetitive tasks… like
go to a form, fill in a couple of fields, click a checkbox,
pick from a dropdown, and click on Save.

How would you do this?

WWW::Mechanize should be able to do it.

Hal_F · October 25, 2006, 3:43am

On Wed, 25 Oct 2006, Hal F. wrote:

Hal
curl?

seriously - do you have to do it via firefox?

-a

Hal_F · October 25, 2006, 3:31am

On Oct 24, 2006, at 6:00 PM, Hal F. wrote:

Hal

WWW::Mechanize[1] is pretty cool for stuff like this. There also
exist SafariWatir and FireWatir for Safari and Firefox respectively.
But FireWatir is 20 times slower than normal Watir at the moment.

[1] http://rubyforge.org/forum/forum.php?forum_id=9457

– Ezra Z.
– Lead Rails Evangelist
– [email protected]
– Engine Y., Serious Rails Hosting
– (866) 518-YARD (9273)

Hal_F · October 25, 2006, 5:53am

[email protected] wrote:

curl?

seriously - do you have to do it via firefox?

No, I don’t have to do it via FF or any other browser.

Can curl do that sort of thing? I’ve never used it
except for simple sucking-down of pages.

Hal

Hal_F · October 25, 2006, 4:05am

On 10/24/06, Hal F. [email protected] wrote:

I know there’s Watir or something… but I’m not using
IE (rather Firefox) and I don’t want to “control the
browser” per se…

Rather I want to automate some repetitive tasks… like
go to a form, fill in a couple of fields, click a checkbox,
pick from a dropdown, and click on Save.

How would you do this?

If I were going to do it in a browser, I’d use Selenium RC [1] along
with the included selenium.rb to do things like this:

| selenium = Selenium::SeleneseInterpreter.new("localhost", 4444, "*firefox", "http://www.google.com/", 10000)
| selenium.start
| selenium.open "http://www.google.com/"
| selenium.type "name=q", "ruby"
| selenium.click_and_wait "name=btnG"

Cheers,
/Nick

[1] http://www.openqa.org/selenium-rc/

Hal_F · October 25, 2006, 6:03am

curl?

seriously - do you have to do it via firefox?

No, I don’t have to do it via FF or any other browser.

Can curl do that sort of thing? I’ve never used it
except for simple sucking-down of pages.

Yes.

-F/--form <name=content>

(HTTP) This lets curl emulate a filled in form in which a user has
pressed the submit button. This causes curl to POST data using the
content-type multipart/form-data according to RFC1867. This enables
uploading of binary files etc. To force the content part to be a
file, prefix the file name with an @ sign. To just get the content part
from a file, prefix the file name with the letter <. The difference
between @ and < is then that @ makes a file get attached in the post as
a file upload, while the < makes a text field and just get the contents
for that text field from a file.

Example, to send your password file to the server, where password is
the name of the form-field to which /etc/passwd will be the input:

curl -F password=@/etc/passwd www.mypasswords.com

To read the file's content from stdin instead of a file, use -
where the file name should've been. This goes for both @ and <
constructs.

You can also tell curl what Content-Type to use for the file upload
part, by using type=, in a manner similar to:

curl -F "web=@index.html;type=text/html" url.com

See further examples and details in the MANUAL.

This option can be used multiple times.
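
For comparison, here is a rough Ruby sketch of the same idea using only the standard library. It builds (but does not send) a POST request whose body carries the form fields. Note that it uses URL-encoded form data, which is what curl's -d option sends, not the multipart encoding that -F produces. The URL and field names are made up for illustration. A checked checkbox and a dropdown both end up as plain name=value pairs: the checkbox sends its value attribute, and the dropdown sends the value of the selected option.

```ruby
require "net/http"
require "uri"

# Hypothetical endpoint, for illustration only.
FORM_URI = URI("http://example.com/items/save")

# Build (but do not send) a POST request equivalent to filling in the form.
def build_form_request(uri, fields)
  request = Net::HTTP::Post.new(uri)
  # set_form_data encodes the hash as application/x-www-form-urlencoded.
  request.set_form_data(fields)
  request
end

# Hypothetical field names; use the name attributes from the real form.
fields = {
  "title"    => "Weekly report", # text field
  "notify"   => "1",             # checked checkbox: sends its value attribute
  "category" => "3",             # dropdown: sends the selected option's value
}

request = build_form_request(FORM_URI, fields)
puts request.body

# To actually submit the form:
# response = Net::HTTP.start(FORM_URI.host, FORM_URI.port) do |http|
#   http.request(request)
# end
```

An unchecked checkbox is simply left out of the hash, since browsers do not send it at all.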

Hal_F · October 25, 2006, 6:49am

On Wed, 25 Oct 2006, Hal F. wrote:

except for simple sucking-down of pages.

Hal

that, and then some.

here’s a script i use to post to sciruby:

 #! /usr/bin/env ruby

 $VERBOSE = nil

 #
 # built-in
 #
   require "getoptlong"
 #
 # setup
 #
   uri = "http://sciruby.codeforpeople.com/sr.cgi"
   moin_id = ENV['SCIRUBY_MOIN_ID']
 #
 # options
 #
   opts = {}

   GetoptLong::new(
     [ "--moin_id",    "-m", GetoptLong::REQUIRED_ARGUMENT ]
   ).each{|opt, arg| opts[opt.delete("-")] = arg}

   moin_id = opts["moin_id"] || ENV["MOIN_ID"] || moin_id
 #
 # argv
 #
   page, infile = ARGV.shift, ARGV.shift
 #
 # run
 #
   abort "#{ $0 } page [infile or stdin] [--moin_id=moin_id]" unless page

   page = "http://sciruby.codeforpeople.com/sr.cgi/#{ page }" unless
     page =~ %r/^http/

   data = (infile.nil? or infile == "-") ? STDIN.read : open(infile){|f| f.read}

   command = <<-sh
   curl "#{ page }" \
          -s -S --stderr - \
          -bMOIN_ID=#{ moin_id } -A=Mozilla/4.0 \
          -F action=savepage -F comment=curl -F "savetext=<-"
   sh
   command = command.strip.split(%r/\s+/).join(" ")

   STDERR.puts command
   IO::popen("#{ command }", "r+") do |pipe|
     pipe.puts data
     pipe.close_write
     while((line = pipe.gets))
       print line
     end
   end

   abort "command <#{ command }> failed with <#{ $?.inspect }>" unless $? == 0
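
The script above builds one shell string, so anything interpolated into it (the page URL, the MOIN_ID) is interpreted by the shell. A possible variation, sketched here under the assumption of Ruby 1.9 or later, is to build the curl arguments as an array and hand that to IO.popen, which then runs curl without a shell. The page name and ID in the example call are hypothetical, and the user agent is passed as a separate argument rather than as -A=Mozilla/4.0.

```ruby
# Build the curl argument list for saving a wiki page.
def curl_save_args(page_url, moin_id)
  [
    "curl", page_url,
    "-s", "-S", "--stderr", "-",
    "-b", "MOIN_ID=#{moin_id}",  # cookie carrying the login ID
    "-A", "Mozilla/4.0",         # user agent
    "-F", "action=savepage",
    "-F", "comment=curl",
    "-F", "savetext=<-",         # read the savetext field from stdin
  ]
end

# Hypothetical page name and ID, for illustration only.
args = curl_save_args("http://sciruby.codeforpeople.com/sr.cgi/SandBox", "hypothetical id")
p args

# To run it (requires curl on the PATH), pass the array to IO.popen:
# IO.popen(args, "r+") do |pipe|
#   pipe.write(data)   # data is the page text, as in the script above
#   pipe.close_write
#   print pipe.read
# end
```

Because each argument is its own array element, a space in the ID stays inside a single argument instead of splitting the command.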

you might also consider http-access2, here’s an example

http://codeforpeople.com/lib/ruby/rubyforge/rubyforge-0.1.1/bin/rubyforge

nearly all of what you need to know is at the beginning or the very end.

cheers.

-a

Hal_F · October 25, 2006, 6:21am

Philip H. wrote:

Yes.

-F/--form <name=content>

[snip snip]

Doing a ‘man curl’ I see now that it has a plethora of options –
someone once said, a metric sh*tload.

It looks a little painful, though. I suppose for a dropdown you’d
have to type the full value of the selected option? Or am I thinking
of a checkbox?

I see now that this thing has a lot of Javascript in it. When I hover
over the New button, it says:

javascript:hideMainMenu();submitbutton('new');

which just complicates things that much more.

What about Mechanize? Better/worse/different?

Thanks,
Hal

Hal_F · October 25, 2006, 1:41pm

Don’t forget about DHTML. Without a browser you will have to write your
own browser engine to process JavaScript. For simple forms without
DHTML you can use plain TCP to send and receive HTTP requests.
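
As a rough illustration of the plain-TCP route, the sketch below builds the raw text of an HTTP/1.1 form POST by hand. The host, path, and field names are hypothetical, and the actual socket write is left commented out so the example runs without a network.

```ruby
require "socket"
require "uri"

# Build the raw text of an HTTP/1.1 form POST request.
def raw_form_post(host, path, fields)
  body = URI.encode_www_form(fields)
  [
    "POST #{path} HTTP/1.1",
    "Host: #{host}",
    "Content-Type: application/x-www-form-urlencoded",
    "Content-Length: #{body.bytesize}",
    "Connection: close",
    "",   # blank line separates headers from body
    body,
  ].join("\r\n")
end

# Hypothetical host, path, and fields, for illustration only.
request_text = raw_form_post(
  "example.com",
  "/items/save",
  { "title" => "Weekly report", "notify" => "1" }
)
puts request_text

# To send it over a plain TCP socket:
# TCPSocket.open("example.com", 80) do |sock|
#   sock.write(request_text)
#   puts sock.read
# end
```

This only covers forms that work without JavaScript; anything the page's scripts do before submitting has to be reproduced by hand.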

Alex
http://webunittesting.com

Hal_F · October 25, 2006, 12:49pm

On Wed, 25 Oct 2006, Hal F. wrote:

I know there’s Watir or something… but I’m not using
IE (rather Firefox) and I don’t want to “control the
browser” per se…

Rather I want to automate some repetitive tasks… like
go to a form, fill in a couple of fields, click a checkbox,
pick from a dropdown, and click on Save.

How would you do this?

One way is by e-mail:

http://www.faqs.org/faqs/internet-services/access-via-email/

http://www.expita.com/index.html

Some of this info may be obsolete now. Forms are more difficult
and depend on the server you use. Agora cannot do this, unless
someone’s version has been improved. Getweb could do forms when I
last looked some years back. I don’t know which getweb servers are
still running.

Hal

    Hugh

Hal_F · October 25, 2006, 10:06pm

I worked on something very similar recently. We looked at Beautiful
Soup, Rubyful Soup, and Mechanize, and went with Mechanize in the end.
I think Beautiful Soup is actually better, but it’s in Python. For me
that was a minus, and for the other programmer on the project, it was
a deal-breaker. Rubyful Soup is a direct port of Beautiful Soup, but
it is literally ten times slower. Mechanize has performance equivalent to
Beautiful Soup and is pretty easy to use as well. I think some of the
underlying code comes from why the lucky stiff. That’s generally
considered a good thing.

Hal_F · October 26, 2006, 7:02pm

On Wed, Oct 25, 2006 at 11:02:37AM +0900, Nick S. wrote:

How would you do this?
| browser.click_and_wait "name=btnG"
+1

It executes inside the browser, so it executes Javascript and such.