On Mon, Jul 19, 2010 at 12:53 AM, Albert S.
So, my question is:
Given a URL, how can I save this page as MHT?
(My program is in Ruby, but I don’t mind delegating this part to a
Although another post cites Wikipedia as implying that using the MHT
format seems like a lot of effort for not much gain, I have found it useful
to save web pages (including images) to MHT (using all of Opera, Firefox,
and Internet Explorer), and then extract what I want (including images) from
them.
That said, once a web page is saved (if necessary using plugins) as MHT,
as an HTML file with images etc. in a subdirectory, or as a zip archive, it
should be easy to take out what you want from whatever the save format is.
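
As a rough illustration of the "take out what you want" step: an MHT file is a MIME multipart/related message, so plain Ruby can split it into its parts. This is only a minimal sketch, not a full MIME parser. The names MhtPart, parse_headers, decode_body and parse_mht are made up for this example, and it assumes a single non-nested boundary and parts encoded as base64, quoted-printable or plain text.

```ruby
# Minimal MHT (MIME multipart/related) splitter using only core Ruby.
# All names here are hypothetical; nested multiparts are not handled.
MhtPart = Struct.new(:content_type, :location, :body)

# Turn a header block into a hash with lowercase header names.
def parse_headers(text)
  unfolded = text.gsub(/\r?\n[ \t]+/, " ") # join folded continuation lines
  unfolded.each_line.each_with_object({}) do |line, headers|
    name, value = line.chomp.split(":", 2)
    headers[name.strip.downcase] = value.strip if value
  end
end

# Decode a part body according to its Content-Transfer-Encoding.
def decode_body(body, encoding)
  case encoding.to_s.downcase
  when "base64"           then body.unpack1("m")
  when "quoted-printable" then body.unpack1("M")
  else body
  end
end

# Split raw MHT text into an array of MhtPart structs.
def parse_mht(raw)
  raw = raw.gsub("\r\n", "\n")
  head, rest = raw.split("\n\n", 2)
  content_type = parse_headers(head.to_s)["content-type"].to_s
  boundary = content_type[/boundary="?([^";]+)"?/i, 1]
  raise ArgumentError, "no MIME boundary found" unless boundary && rest

  # drop(1) discards the preamble that precedes the first boundary line.
  rest.split("--#{boundary}").drop(1).filter_map do |chunk|
    next if chunk.start_with?("--") # closing "--boundary--" marker
    part_head, part_body = chunk.sub(/\A\n/, "").split("\n\n", 2)
    next unless part_body
    h = parse_headers(part_head)
    MhtPart.new(h["content-type"].to_s.split(";").first,
                h["content-location"],
                decode_body(part_body.chomp, h["content-transfer-encoding"]))
  end
end
```

Usage would be along the lines of parse_mht(File.binread("page.mht")), then selecting the parts whose content_type starts with "image/" and writing each body out with File.binwrite (the file name here is only a placeholder).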
So: is the problem saving as MHT from the command line, or one of saving
anything - MHT or HTML+Images - from the command line?
Can you use Watir or http://watij.com + JRuby? From a quick look at
their websites these may work, but I haven’t tried them yet because the
learning curve looks a bit steep, and because at the moment (on
Windows) I can use AutoIt with Ruby to (programmatically) switch from a
DosBox to the browser, and send keystrokes to save the page as MHT or
HTML or whatever. It’s not exactly elegant, but it does (mostly!) work.
If all else fails, can you do something similar in Linux?
If you find a reasonably elegant solution, then I’d be very interested.