Forum: Ruby Emulating a web browser

Adam B. (Guest)
on 2009-04-30 11:55
(Received via mailing list)
I am looking for a library to help me emulate a web browser, at least at
the network level.  By this I mean I would like to run a program that,
from the point of view of a web server, behaves just like, say, Firefox,
but I don't care about actually displaying text or images or anything
like that.  What I would like it to do is speak HTTP, store and send
cookies, automatically fetch embedded content like images and style
sheets, and so forth.  I thought Mechanize was what I wanted, but it
doesn't fetch embedded content.  It doesn't even recognize it.  I could
perhaps tell Nokogiri to find all the images and have Mechanize fetch
them, but I've never used Nokogiri before, I don't know an exhaustive
list of types of embedded content Firefox loads automatically (images,
JavaScript, Flash, anything else?), and it seems like getting Mechanize
to emulate FF's HTTP request for these objects is difficult.

Are there libraries that are meant for this type of interaction with
websites?  Perhaps I'm better off abandoning Ruby and making a Firefox
extension.

Thanks,

Adam
Vikhyat K. (Guest)
on 2009-04-30 12:15
(Received via mailing list)
On Thu, 2009-04-30 at 16:54 +0900, Adam B. wrote:
> automatically (images, JavaScript, Flash, anything else?), and it seems like
> getting Mechanize to emulate FF's HTTP request for these objects is
> difficult.
>
> Are there libraries that are meant for this type of interaction with
> websites?  Perhaps I'm better off abandoning Ruby and making a Firefox
> extension.

I'm not sure what you want to do, but have you looked at Watir?
http://wtr.rubyforge.org/
7stud -. (Guest)
on 2009-04-30 13:56
Adam B. wrote:
> I am looking for a library to help me emulate a web browser, at least at
> the network level.  By this I mean I would like to run a program that,
> from the point of view of a web server, behaves just like, say, Firefox,
> but I don't care about actually displaying text or images or anything
> like that.  What I would like it to do is speak HTTP, store and send
> cookies, automatically fetch embedded content like images and style
> sheets, and so forth.  I thought Mechanize was what I wanted, but it
> doesn't fetch embedded content.  It doesn't even recognize it.  I could
> perhaps tell Nokogiri to find all the images and have Mechanize fetch
> them, but I've never used Nokogiri before, I don't know an exhaustive
> list of types of embedded content Firefox loads automatically (images,
> JavaScript, Flash, anything else?), and it seems like getting Mechanize
> to emulate FF's HTTP request for these objects is difficult.
>
> Are there libraries that are meant for this type of interaction with
> websites?  Perhaps I'm better off abandoning Ruby and making a Firefox
> extension.
>
> Thanks,
>
> Adam


For static content like images, stylesheets, and JS files, all you
need to find them is an HTML parser.  Hpricot is an HTML parser with good
docs (I can't find many examples for Nokogiri, but it uses the same
syntax as Hpricot for searching a document):


require 'rubygems'
require 'hpricot'
require 'open-uri'

doc = Hpricot(open("http://blog.segment7.net/"))

#images:
imgs = doc.search("img")
puts imgs[0][:src]

#stylesheets:
css = doc.search('//link[@type="text/css"]')
puts css[0][:href]

#javascript:
js = doc.search('//script[@type="text/javascript"]')
puts js[0][:src]

--output:--
/images/spinner-blue.gif?1140249801
http://segment7.net/styles/s7.css
/javascripts/cookies.js?1142467953

http://wiki.github.com/why/hpricot

Check out both Hpricot Basics and Hpricot Challenge for lots of
examples.
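
The src and href values printed above are a mix of site-relative paths
and absolute URLs, so they would need to be resolved against the page
URL before anything could fetch them.  A minimal sketch of that step
using only the standard library; resolve_all is a hypothetical helper
name, and the input values are simply the ones from the output above:

```ruby
require 'uri'

# URL of the page that was parsed in the example above.
page_url = "http://blog.segment7.net/"

# Attribute values as printed by the example above.
found = [
  "/images/spinner-blue.gif?1140249801",
  "http://segment7.net/styles/s7.css",
  "/javascripts/cookies.js?1142467953"
]

# Hypothetical helper (not part of any library): turns each reference
# into an absolute URL.  URI.join leaves already-absolute URLs alone and
# resolves relative ones against the base.
def resolve_all(base, refs)
  refs.map { |ref| URI.join(base, ref).to_s }
end

puts resolve_all(page_url, found)
```

The resulting absolute URLs are what an HTTP client (Mechanize, for
instance) would then be asked to fetch.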

I don't think there are programs yet that can produce the page that the
user sees after JavaScript executes in a browser and does dynamic HTML
replacements.  I know they are trying to write them.

As for cookies, dealing with them usually goes hand in hand with filling
out forms, so you could use Mechanize for that.  Also, Mechanize
incorporates Nokogiri, so you can use Mechanize as an HTML parser to
search for the same things I did with Hpricot:


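require 'rubygems'
require 'mechanize'

# Older Mechanize releases namespaced this class as WWW::Mechanize.
agent = Mechanize.new
page = agent.get("http://blog.segment7.net/")
# page.search runs the query against the parsed document
css = page.search('//link[@type="text/css"]')
puts css[0][:href]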

--output:--
http://segment7.net/styles/s7.css
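
On making the request for an embedded object look like the browser's
own: much of what a server sees is just request headers.  A rough
sketch with the standard library's Net::HTTP that builds such a request
without sending it.  browser_like_request is a hypothetical helper, the
User-Agent string is only a placeholder to be replaced with the one
from the browser being imitated, and the cookie value is made up:

```ruby
require 'net/http'
require 'uri'

# Placeholder value (an assumption); copy the real string from the
# browser you want to imitate.
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

# Hypothetical helper: builds, but does not send, a GET request for an
# embedded resource.  It sets a browser-style User-Agent, the Referer of
# the page that embeds the resource, and any cookies passed in.
def browser_like_request(resource_url, referer:, cookies: {})
  uri = URI.parse(resource_url)
  req = Net::HTTP::Get.new(uri.request_uri)
  req['User-Agent'] = FIREFOX_UA
  req['Accept'] = '*/*'
  req['Referer'] = referer
  unless cookies.empty?
    req['Cookie'] = cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
  req
end

req = browser_like_request(
  "http://blog.segment7.net/images/spinner-blue.gif?1140249801",
  referer: "http://blog.segment7.net/",
  cookies: { "session" => "abc123" }  # made-up cookie for illustration
)

# Show the headers that would go out with the request.
req.each_header { |name, value| puts "#{name}: #{value}" }
```

This only covers the headers.  The order and timing in which a real
browser requests embedded content is a separate matter that the sketch
does not attempt to reproduce.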
Srijayanth S. (Guest)
on 2009-04-30 17:46
(Received via mailing list)
Also, I've found Nokogiri to fail on a few things, most notably
maps.google.com. If Nokogiri is able to work your site though, then it
is definitely a lot, lot faster than Hpricot.

Jayanth
Bret P. (Guest)
on 2009-04-30 18:11
(Received via mailing list)
On Thu, 2009-04-30 at 16:54 +0900, Adam B. wrote:

> I am looking for a library to help me emulate a web browser, at least at
> the network level.  By this I mean I would like to run a program that,
> from the point of view of a web server, behaves just like, say, Firefox,
> but I don't care about actually displaying text or images or anything
> like that.  What I would like it to do is speak HTTP, store and send
> cookies, automatically fetch embedded content like images and style
> sheets, and so forth.


Take a look at Celerity.
http://celerity.rubyforge.org/

Bret

--
Bret P.
CTO, WatirCraft LLC, www.watircraft.com
Lead Developer, Watir, www.watir.com

Blog, www.io.com/~wazmo/blog
Twitter, www.twitter.com/bpettichord