Against Everything Buckets
Posted by Cassius
Feb 11, 2009 at 09:13 PM
Pierre Paul Landry wrote:
>The MHT file embeds the images, javascript and css in the file. That is why it can be
>larger. If you save your web page in HTM with all referenced files, the two will be
>comparable in size.
I don’t think so. I saved a file in .htm format. Its properties are:
Size: 26 kb
Size on disk: 28 kb.
I then opened this saved .htm file and re-saved it as an .mht file. The .mht file properties are:
Size: 286 kb
Size on disk: 288 kb.
-c
Posted by Alexander Deliyannis
Feb 12, 2009 at 08:58 AM
Cassius, were you online when you opened and re-saved the file? If so, it’s quite possible that IE downloaded extra info from the web.
I just tried converting a .html file to .mht while offline. There was a difference, but not significant (8.44 Kb to 10.6 Kb) and this could be explained by the presence in the .mht file of empty ‘containers’ for the various outside files.
Alexander
Posted by Hugh
Feb 12, 2009 at 02:03 PM
More views on this topic, related to Tinderbox and DevonThink:
http://www.markbernstein.org/Feb09/EverythingBuckets.html
http://www.devon-technologies.com/scripts/userforum/viewtopic.php?f=7&t=7452
I’m sympathetic to the idea that the development of structure can sometimes only be incremental, or even emergent. (But of course that’s not in itself an argument for tagging as opposed to a pre-defined hierarchy; it is an argument for an inbox…)
Posted by Cassius
Feb 13, 2009 at 07:43 AM
Alexander Deliyannis wrote:
>Cassius, were you online when you opened and re-saved the file? If so, it’s quite possible that IE downloaded extra info from the web.
>
>I just tried converting a .html file to .mht while offline. There was a difference, but not significant (8.44 Kb to 10.6 Kb) and this could be explained by the presence in the .mht file of empty ‘containers’ for the various outside files.
>
>Alexander
============================================
Alexander, you appear to be correct. When I turned off my wireless connection, opened a saved .htm file, and then tried to save it in .mht format, one of two things happened:
1. As in your test, my .mht file was not much larger than the .htm file, or
2. The .htm file would not save as .mht. I got a message that the file (or something) could not be found. Perhaps some .htm files retain links/hooks to the content Pierre referred to and cannot be saved without fetching it from the net.
My apologies to Pierre and thanks for proving me wrong.
-c
Posted by Pierre Paul Landry
Feb 13, 2009 at 01:21 PM
Cassius wrote:
>My apologies to Pierre and thanks for proving me wrong.
No harm done. An MHT file is essentially the same as an EML file (i.e. an email message) without the email headers (To, From, etc.). It contains the HTML plus a base64-encoded section for each piece of dependent content (image, JS, CSS, embedded object). If you look at the HTML page source, you’ll see references to these near the top or embedded in the text. I know MHT quite well, as InfoQube can grab content in both HTML and MHT formats.
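To make that layout concrete, here is a minimal sketch in Python using only the standard `email` package. It is not how IE or InfoQube writes MHT files; it only assumes that a multipart/related MIME message with a `Content-Location` header per resource is close enough to illustrate the idea. The function names `build_mht` and `list_parts`, the `logo.png` location, and the placeholder image bytes are all made up for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_mht(html, resources):
    """Pack an HTML page and its binary resources into one MIME document.

    `resources` maps a (hypothetical) location string to a tuple of
    (maintype, subtype, raw_bytes).
    """
    msg = EmailMessage()
    msg.set_content(html, subtype="html")  # the page itself, as text/html
    for location, (maintype, subtype, data) in resources.items():
        # Bytes content is base64-encoded by default; the first call to
        # add_related converts the message to multipart/related.
        msg.add_related(
            data,
            maintype=maintype,
            subtype=subtype,
            headers=[f"Content-Location: {location}"],
        )
    return msg.as_bytes()


def _header_text(part, name):
    """Return a header value as a plain string, or None if it is absent."""
    value = part[name]
    return None if value is None else str(value)


def list_parts(raw):
    """Return (content type, location, transfer encoding, content) per leaf part."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        parts.append((
            part.get_content_type(),
            _header_text(part, "Content-Location"),
            _header_text(part, "Content-Transfer-Encoding"),
            part.get_content(),  # str for text parts, bytes for images
        ))
    return parts


if __name__ == "__main__":
    page = "<html><body><img src='logo.png'></body></html>"
    fake_image = b"\x89PNG not a real image"  # placeholder bytes, not a valid PNG
    raw = build_mht(page, {"logo.png": ("image", "png", fake_image)})
    for ctype, location, encoding, _content in list_parts(raw):
        print(ctype, location, encoding)
```

Base64 turns every 3 bytes of binary data into 4 characters, so each embedded image grows by roughly a third compared with the separate file on disk.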
MHT and RTF can both be used as standalone documents. RTF is of course widely used, mostly by word processors; MHT/EML has traditionally been limited to emails and web clipping. They are very different internally, and conversion between the two has always been problematic. IMO, the big plus of MHT is that it is embeddable, so you can easily combine multiple documents into a single one, something impossible with RTF (AFAIK). When editing MHT documents in MS Word, just about every feature is supported except versions.