MozEmbed Archiving

redroofgreentree · October 23, 2007, 9:20pm

Hi all,

I want to write a web browser that archives everything as I view it.
(I.e., whenever it finishes loading a page, I want it to do a “save
everything as HTML” into a directory named $date-$time-$second-…)

So anyway, I’m looking at the Ruby MozEmbed functions, and I see stuff
like append_data and render_data … but I see no way to get the data
back. (I want the actual HTML, not just a screenshot.)

How can I do this? Is there an easy way to intercept everything at the
open_stream level and copy all the data then? Is there some trick with
MozEmbed that allows me to grab the HTML file that it is rendering?

Thanks!

redroofgreentree · October 23, 2007, 10:06pm

The Gtkmozembed widget does not provide this level of functionality.
However, I have devised a cunning way of building a bridge between Ruby
and the Mozilla JavaScript engine.

Basically, the nuts and bolts of it are a parser that transforms
JavaScript source code into a one-liner, which I then ask Mozilla to
load using the javascript: protocol handler. I get data back by
marshaling the return value through a JSON handler and assigning it to
document.title, which I can then pick up on the Ruby side using the
Gtkmozembed title callback.
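
A minimal sketch of that round trip, for illustration only (this is not
the actual source). The names javascript_url, decode_bridge_title and
BRIDGE_PREFIX are hypothetical, and the sketch assumes that the snippet
ends every statement with a semicolon, contains no // comments, and
that the page has a JSON.stringify function (or an equivalent JSON
encoder) available.

```ruby
require "json"
require "erb"

# Hypothetical marker so bridge replies can be told apart from ordinary titles.
BRIDGE_PREFIX = "__rb_bridge__:"

# Collapse a JavaScript snippet into a single javascript: URL.
# Assumption: statements end with semicolons and there are no // comments,
# because joining the lines would otherwise change what the script means.
def javascript_url(source)
  one_liner = source.lines.map(&:strip).reject(&:empty?).join(" ")
  # Run the snippet as a function body, then smuggle its return value
  # out through document.title as JSON.
  wrapped = "(function(){var r=(function(){#{one_liner}})();" \
            "document.title=#{BRIDGE_PREFIX.to_json}+JSON.stringify(r);})();"
  "javascript:" + ERB::Util.url_encode(wrapped)
end

# Decode a title reported by the widget.
# Returns nil when the title is just an ordinary page title.
def decode_bridge_title(title)
  return nil unless title.start_with?(BRIDGE_PREFIX)
  JSON.parse(title[BRIDGE_PREFIX.length..-1])
end
```

In a real application the URL would be handed to the widget to load,
and decode_bridge_title would be called from the title callback with
whatever title the widget reports.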

I use it to get the dimensions of a rendered page and then manipulate
the scrollbars so that I can stitch together a full-page screenshot
image. It works beautifully. I was also thinking about extending it
into a full-blown testing framework… not convinced this would be
entirely useful yet.

Concerning your problem: using the method I have described, I can
retrieve the HTML source from the DOM - either
document.documentElement.innerHTML, or a representation of the “real”
DOM by calling xml.serializeToString(document.documentElement), where
xml is an XMLSerializer instance.
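
As a rough illustration, the two options could be kept as JavaScript
snippets on the Ruby side, with a small helper to store whatever string
comes back. The constant names, write_snapshot and the index.html file
name are all hypothetical choices made for this sketch.

```ruby
require "fileutils"

# JavaScript to run inside the page; each snippet returns a string.
# innerHTML yields the markup inside <html>, without the <html> tag itself.
INNER_HTML_JS = "return document.documentElement.innerHTML;"

# XMLSerializer serializes the DOM tree, including the root element.
SERIALIZED_DOM_JS =
  "return new XMLSerializer().serializeToString(document.documentElement);"

# Write the markup returned by either snippet to dir/index.html.
# The directory name is left to the caller; "index.html" is arbitrary.
def write_snapshot(html, dir)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "index.html")
  File.write(path, html)
  path
end
```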

So, you could certainly retrieve the source and get a list of all of
the page assets, but you’d have to retrieve them again separately from
your Ruby app… or perhaps dig them out of the Mozilla disk cache.
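
One possible way to do the “retrieve them again” step from Ruby, using
only the standard library. This is a naive sketch: asset_urls and
fetch_asset are hypothetical helpers, the regular expressions only
catch quoted src attributes on img and script tags and quoted href
attributes on link tags, and a real HTML parser would be more robust.

```ruby
require "uri"
require "net/http"

# Collect absolute URLs for images, scripts and <link> targets.
# Relative references are resolved against the URL of the page itself.
def asset_urls(html, page_url)
  refs = html.scan(/<(?:img|script)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten
  refs += html.scan(/<link\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/i).flatten
  refs.map { |ref| URI.join(page_url, ref).to_s }.uniq
end

# Download one asset and return its body as a string.
# No redirect handling or error checking; this is only a starting point.
def fetch_asset(url)
  Net::HTTP.get(URI(url))
end
```

Each URL returned by asset_urls can then be passed to fetch_asset and
the result written next to the saved HTML.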

If you’d like a copy of my source then let me know and I’ll try and find
some time to clean it up.

Cheers,

Andy.