Help re recording/replaying (i.e. automating) HTTP interactions to a web-site?

Hi,

Actually can anyone recommend a good technique / software / plugin
that would assist if I wanted to effectively (a) record my interaction
with my bank at the HTTP level, then (b) use this to automate that
behavior in my RoR application to automate pulling down daily account
details?

The best I can think of at the moment is: (a) Firefox Live HTTP
Headers plugin then (b) manually write Ruby code that sends these out
and waits for the response & check it before proceeding to the next
http request. I’m thinking someone probably has a better way, or
plugin, to handle at least part (b)?
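For part (b), something like this rough Net::HTTP sketch is what I have in mind - the URL, form field names and headers below are made-up placeholders for whatever Live HTTP Headers records:

```ruby
require 'net/http'
require 'uri'

# Build one "replayed" request from recorded data. The field names
# ('username', 'password') and the URL are hypothetical placeholders.
def build_login_request(uri, user, password)
  req = Net::HTTP::Post.new(uri.path)
  # Replay the headers you recorded; User-Agent often matters to banks.
  req['User-Agent'] = 'Mozilla/5.0'
  req.set_form_data('username' => user, 'password' => password)
  req
end

# Send the request and check the response before moving on to the
# next recorded step.
def replay(uri, req)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    res = http.request(req)
    unless res.is_a?(Net::HTTPSuccess) || res.is_a?(Net::HTTPRedirection)
      raise "unexpected status #{res.code}"
    end
    res
  end
end

uri = URI.parse('https://bank.example.com/login') # hypothetical
req = build_login_request(uri, 'greg', 'secret')
# replay(uri, req)  # uncomment only against a real endpoint
```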

Tks

Greg,

Have you looked into (Fire)Watir?

I am just releasing a new version of scRUBYt! (a web scraping
framework) where it will be possible to use FireWatir as the agent for
navigation/scraping, so you can write a simple but powerful DSL (stuff
like ‘click_link’, ‘fill_textfield’ etc) which is executed through
Firefox and is very well suited for scenarios you just described. Drop
me a line if you are interested.

But of course plain (Fire)Watir would do, too.

Cheers,
Peter


http://www.rubyrailways.com
http://scrubyt.org

Sounds good. Does it support:
• https?
• cookies?
• building in some intelligence? (say, when the link for step N
changes over time but you can write an algorithm for it)

thanks

PS. A 4th question, Peter, that I forgot:
• does it support downloading a file (e.g. a CSV of account transactions)?
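For what it's worth, once a file like that is downloaded, the parsing side is easy with Ruby's stdlib CSV - a sketch with made-up sample data standing in for the bank's export:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample of what a bank's transaction export might look like.
body = <<~DATA
  Date,Description,Amount
  2008-04-28,ATM WITHDRAWAL,-100.00
  2008-04-28,SALARY,2500.00
DATA

# In practice `body` would come from the HTTP response (res.body);
# saving it to disk is just a File.write.
file = Tempfile.new(['transactions', '.csv'])
file.write(body)
file.rewind

rows  = CSV.read(file.path, headers: true)
total = rows.sum { |r| r['Amount'].to_f }
puts "#{rows.size} transactions, net #{format('%.2f', total)}"
```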

I see Watir requires/drives a browser… I’m after something
browser-independent… any other library/plugin suggestions?

Greg H. wrote:
| PS. 4th question Peter I forgot:
| • does it support downloading a file (eg csv file, account transactions)

http://wtr.rubyforge.org/

Find out for yourself?


Phillip G.
Twitter: twitter.com/cynicalryan

A born loser:
~ Somebody who calls the number that’s scrawled in lipstick on the phone
~ booth wall-- and his wife answers.

Greg H. wrote:

| Hi,
|
| Actually can anyone recommend a good technique / software / plugin
| that would assist if I wanted to effectively (a) record my interaction
| with my bank at the HTTP level, then (b) use this to automate that
| behavior in my RoR application to automate pulling down daily account
| details?
|
| The best I can think of at the moment is: (a) Firefox Live HTTP
| Headers plugin then (b) manually write Ruby code that sends these out
| and waits for the response & check it before proceeding to the next
| http request. I’m thinking someone probably has a better way, or
| plugin, to handle at least part (b)?

I’m not sure what the Firefox Live HTTP Headers plugin will do for you.
If you write a Ruby program to send requests to a URL, then you already
know what headers you are sending, and when you get the response you
can read the headers it returns.

A: Because it makes it hard to follow the discussion.
Q: Why is top posting bad?

Greg H. wrote:
| I see Watir requires/drives a browser… I’m after something
| browser-independent… any other library/plugin suggestions?

WWW::Mechanize is quite popular, from what I’ve seen so far.


Phillip G.
Twitter: twitter.com/cynicalryan

~ - You know you’ve been hacking too long when…
…you discover that you’re balancing your checkbook in octal.

Yes, scRUBYt! supports all these things… In the current
implementation WWW::Mechanize is used as the agent, but it doesn’t
support JavaScript, and (more often than not) e-banking sites have
some JS… so that’s why I suggested the FireWatir-based solution.

A browser-agnostic solution doesn’t exist (Mechanize is a browser
too): the nature of the task requires a browser. Call it what you
like, but if something is able to GET/POST requests, store cookies,
use https, sessions, … then it is a browser in my vocabulary.

Besides, FireWatir is platform-independent (unlike Watir, which is
win32-only).
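To illustrate that point - a toy cookie-keeping client over plain Net::HTTP (deliberately naive: no expiry, path or domain handling, so don't use it as-is):

```ruby
require 'net/http'

# Anything that can GET/POST, keep cookies and speak HTTPS is already
# most of a "browser". This toy keeps a naive cookie jar across requests.
class TinyBrowser
  def initialize
    @cookies = {}
  end

  # Store cookies from one or more Set-Cookie header lines,
  # ignoring attributes like Path or HttpOnly.
  def remember(set_cookie_headers)
    Array(set_cookie_headers).each do |line|
      name, value = line.split(';').first.split('=', 2)
      @cookies[name.strip] = value
    end
  end

  # Render the stored cookies back into a Cookie request header.
  def cookie_header
    @cookies.map { |k, v| "#{k}=#{v}" }.join('; ')
  end

  def get(uri)
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      req = Net::HTTP::Get.new(uri.request_uri)
      req['Cookie'] = cookie_header unless @cookies.empty?
      res = http.request(req)
      remember(res.get_fields('Set-Cookie'))
      res
    end
  end
end

b = TinyBrowser.new
b.remember(['session=abc123; Path=/; HttpOnly', 'lang=en'])
puts b.cookie_header
```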

Cheers,
Peter

thanks Peter - I was starting to look at Mechanize but will focus on
scRUBYt! now…
2008/4/28 Peter S. [email protected]:

I must admit you’re managing to overwhelm me slightly with the number
of libraries/packages here :)
So what does the stack look like that you’re talking about? Will it be
fronted by scRUBYt! then, like:

  • Scrubyt
    • FireWatir
    • Mechanize
      • hpricot

Is this correct? If it is, could you put in brackets the key thing
each layer does/focuses on?

Cheers
Greg

On Mon, Apr 28, 2008 at 9:06 PM, Peter S. [email protected]

On Apr 28, 2008, at 12:50 PM, Greg H. wrote:

thanks Peter - I was starting to look at Mechanize but will focus on
scRUBYt! now…
2008/4/28 Peter S. [email protected]:

OK, cool. If you don’t have JS/AJAX or other trick that Mechanize
can’t handle, you should be OK.

On the other hand, if you do have JS/AJAX on the page, you will need
FireWatir, whether you like it or not :) The FireWatir-enabled
version of scRUBYt! is not yet officially released - if you want to
try it, you need to d/l it from http://scrubyt.org/scrubyt-0.4.03.gem
and install.

Let me know if you encounter any problems!

Cheers,
Peter


http://www.rubyrailways.com
http://scrubyt.org

In the current official release (0.3.4), FireWatir is not yet added.
So you have just Mechanize + Hpricot.

Mechanize does the navigational part - fill this textfield, then
click that button, and once you arrive at the result page, crawl to
all the detail pages, etc.
Once you arrive at the final page from where you don’t want to go on
further, you start the actual scraping, and in this case that’s done
through Hpricot. You take the page where you arrived, parse it with
Hpricot and collect the results from it.
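That scraping step boils down to "parse the final page, collect the matching nodes". Hpricot is what scRUBYt! actually uses; the same idea is sketched below with stdlib REXML (which, unlike Hpricot, needs well-formed markup), over a made-up transactions table:

```ruby
require 'rexml/document'

# Hypothetical final page: a transactions table we want to scrape.
page = <<~MARKUP
  <table>
    <tr><td class="desc">ATM WITHDRAWAL</td><td class="amount">-100.00</td></tr>
    <tr><td class="desc">SALARY</td><td class="amount">2500.00</td></tr>
  </table>
MARKUP

doc = REXML::Document.new(page)
# Collect the text of every amount cell via an XPath query -
# the same kind of expression FireBug or DOM Inspector hands you.
amounts = REXML::XPath.match(doc, "//td[@class='amount']").map(&:text)
puts amounts.inspect
```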

In the development release, it is possible to plug in agents other
than Mechanize - theoretically anything; currently FireWatir is
implemented. But if you want to use Mechanize as the agent for
crawling, you don’t need to install FireWatir at all.

FireWatir-based scraping has other benefits beyond JS/AJAX - for
example more robust HTML parsing (which is done by Firefox in this
case). Hpricot is a great parser, but it can’t beat Firefox (yet).
Firefox-parsed HTML also means you can use XPaths straight from
FireBug or the DOM Inspector (which is not the case with Mechanize).

On the downside, FireWatir-based navigation/scraping is slower than
with Mechanize (you have to wait until the page renders, which is a
prerequisite for FireWatir-based navigation, etc.).

Does this answer your question? If not, be sure to keep asking ;)

Cheers,
Peter


http://www.rubyrailways.com
http://scrubyt.org