Saving HTML as MHT

Hello.

I want to download an HTML page together with the images it contains.
I was thinking about saving it as an MHT file, which would make my life
easier because I wouldn’t have to handle the separate files. I’ve
checked both my browsers (Firefox and Opera), but neither has a
command-line switch for saving a URL as an MHT file. I also searched the
net for a Ruby library, but the only one I found seems to work only on
Windows (it ships with a DLL), which is no good for me because I’m
using Ubuntu.

So, my question is:

Given a URL, how can I save this page as MHT?

(My program is in Ruby, but I don’t mind delegating this part to a
command-line utility.)

According to http://en.wikipedia.org/wiki/MHTML, pursuing the MHT file
format seems like a lot of effort for not much gain…

On Mon, Jul 19, 2010 at 12:53 AM, Albert S. wrote:

So, my question is:

Given a URL, how can I save this page as MHT?

(My program is in Ruby, but I don’t mind delegating this part to a
command-line utility.)

Although another post cites Wikipedia as implying that using the MHT
file format seems like a lot of effort for not much gain, I have found
it useful to save web pages (including images) to MHT (using all of
Opera, Firefox and Internet Explorer), and then extract what I want
(including images) from the MHT file.
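
For the "extract what I want" step, something along these lines might
do in Ruby. It is only a rough sketch, not a real MIME parser: it
assumes the MHT file is a single-level multipart/related message whose
parts are base64 or quoted-printable encoded, and the names `parse_mht`
and `MhtPart` are made up for illustration.

```ruby
# One decoded part of an MHT file (hypothetical struct for this sketch).
MhtPart = Struct.new(:content_type, :location, :body)

# Split an MHT document into its parts and decode each body.
# Assumes a single, non-nested multipart/related message.
def parse_mht(text)
  text = text.gsub("\r\n", "\n")
  head = text.split("\n\n", 2).first
  head = head.gsub(/\n[ \t]+/, " ") # unfold continued header lines
  boundary = head[/boundary="?([^";\n]+)"?/i, 1]
  raise ArgumentError, "no MIME boundary found" unless boundary

  # Everything before the first boundary line is the top-level header.
  chunks = text.split(/^--#{Regexp.escape(boundary)}(?:--)?[ \t]*\n?/)
  parts = chunks.drop(1).map do |chunk|
    headers, body = chunk.split("\n\n", 2)
    next if body.nil?

    headers = headers.gsub(/\n[ \t]+/, " ")
    encoding = headers[/^Content-Transfer-Encoding:\s*(\S+)/i, 1].to_s.downcase
    decoded = case encoding
              when "base64" then body.unpack1("m")
              when "quoted-printable" then body.unpack1("M")
              else body
              end
    MhtPart.new(headers[/^Content-Type:\s*([^;\s]+)/i, 1],
                headers[/^Content-Location:\s*(\S+)/i, 1],
                decoded)
  end
  parts.compact
end

# Possible usage ("page.mht" is a placeholder file name):
#   parse_mht(File.binread("page.mht")).each do |part|
#     puts "#{part.content_type}  #{part.location}  (#{part.body.bytesize} bytes)"
#   end
```

Images could then be picked out by checking whether `content_type`
starts with "image/" and writing `body` to a file in binary mode.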

That said, once a web page is saved (if necessary using plugins) as MHT,
as an HTML file with images etc. in a subdirectory, or as a zip archive,
it should be fairly easy to take out what you want from whatever the
save format is.

So: is the problem saving as MHT from the command line, or one of saving
anything - MHT or HTML+Images - from the command line?
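
If HTML+Images is acceptable, one option that should work on Ubuntu is
to delegate to wget from Ruby. Note that this does not produce MHT: it
saves the page plus its images, stylesheets and so on into a directory.
The method names below are hypothetical, and the sketch assumes wget is
installed and on the PATH.

```ruby
# Build the wget argument list. Kept separate from the call itself so
# it can be inspected without touching the network.
def wget_page_command(url, target_dir)
  [
    "wget",
    "--page-requisites",                 # also fetch images, CSS, etc.
    "--convert-links",                   # rewrite links to the local copies
    "--directory-prefix=#{target_dir}",  # save everything under target_dir
    url
  ]
end

# Run wget; returns true on success, false on a non-zero exit status,
# and nil if wget could not be started at all.
def save_page(url, target_dir)
  # The multi-argument form of system bypasses the shell, so the URL
  # needs no quoting.
  system(*wget_page_command(url, target_dir))
end

# Possible usage:
#   save_page("http://example.com/", "saved_pages")
```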

Can you use Watir or http://watij.com + JRuby? From a quick look at
their websites these may work, but I haven’t tried them yet because the
initial learning curve looks a bit steep, and because at the moment (on
Microsoft Windows) I can use AutoIt with Ruby to (programmatically)
switch from a Ruby DOS box to the browser, and send keystrokes to save
the page as MHT or plain HTML or whatever. It’s not exactly elegant, but
it does (mostly!) work. If all else fails, can you do something similar
in Linux?

If you find a reasonably elegant solution, then I’d be very interested.
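
One more possibility is to write the MHT file directly from Ruby, since
it is essentially a MIME multipart/related message. The following is a
minimal sketch under several assumptions: the page and its images have
already been downloaded (for example with net/http), every part is
base64 encoded for simplicity, and the names `mime_part` and `build_mht`
are invented here. Whether a particular browser opens the result has not
been verified.

```ruby
require "securerandom"

# Format one MIME part: headers, a blank line, then the base64 body.
def mime_part(content_type, location, bytes)
  # pack("m") ends lines with "\n"; MIME expects CRLF line endings.
  encoded = [bytes].pack("m").gsub("\n", "\r\n")
  [
    "Content-Type: #{content_type}",
    "Content-Transfer-Encoding: base64",
    "Content-Location: #{location}",
    "",
    ""
  ].join("\r\n") + encoded
end

# Assemble an MHT document.
#   html      - the page markup as a String
#   page_url  - URL the page came from (becomes its Content-Location)
#   resources - Hash of resource URL => [content_type, raw_bytes]
def build_mht(html, page_url, resources,
              boundary: "----=_Part_#{SecureRandom.hex(8)}")
  parts = [mime_part('text/html; charset="utf-8"', page_url, html)]
  resources.each { |url, (type, bytes)| parts << mime_part(type, url, bytes) }

  header = "MIME-Version: 1.0\r\n" \
           "Content-Type: multipart/related; type=\"text/html\"; " \
           "boundary=\"#{boundary}\"\r\n\r\n"
  header + parts.map { |part| "--#{boundary}\r\n#{part}" }.join +
    "--#{boundary}--\r\n"
end

# Possible usage (file names are placeholders):
#   images = { "http://example.com/logo.png" =>
#                ["image/png", File.binread("logo.png")] }
#   mht = build_mht(File.read("index.html"), "http://example.com/", images)
#   File.binwrite("page.mht", mht)
```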
