Collecting web pages 2.0

Started by MadaboutDana on 9/2/2014
MadaboutDana 9/2/2014 8:01 am
Hi folks,

Since knowledge management is part of what this forum is all about, I thought I'd return to the interesting subject of web pages and how best to capture/archive them.

All the following thoughts relate to Mac only, I'm afraid. Macs have a plethora of gorgeous info management apps - their quality and sophistication are generally much more impressive than those of most apps I've encountered on PC. The sharing - not so much, although it's getting better.

So here, without more ado, are my thoughts on the apps I've been experimenting with:

▪ Growly Notes ([print to Growly Notes function] nice, but the search function is poor - slow - and moving notes around is difficult)
▪ OneNote ([copy to OneNote bookmark] also nice, and search function is good; moving notes is easy, too, but import is slow, format is restrictive – graphic only – and despite full-text indexing, you can’t export or copy the actual text. I’ve tried the copy-and-paste approach, too – as per Notebooks – and while it works okay, it’s not optimal because formatting isn’t preserved well, and URLs aren’t copied over, unlike the PC version)
▪ Yojimbo ([print to Yojimbo function] powerful, does the sensible thing and stores web pages as PDF files, but doesn’t communicate desperately well with its own iOS app and doesn’t store metadata or even website URLs, unlike OneNote. Search is good, though)
▪ Together ([print to Together function] more or less identical to Yojimbo, but with an even flakier iOS client and rather unstable behaviour; promising, however)
▪ Notebooks ([copy and paste] still one of my favourites, but you have to manually copy and paste web pages and their URLs. A bit tedious! Reproduction of web pages is astonishingly good, however)
▪ Stache ([copy to Stache button] very fast and very convenient, with powerful search function. But needs a lot of memory – can only really be consulted if it’s the only app running – and stores web pages as .webarchive files – awkward to open because of OS X’s restrictions – or screenshots, which are convenient but limited. Plus very limited output functions - no printing, for example, which means you can't output the web archives as PDFs - silly!)
▪ Curio ([print to Curio Scrapbook function] haven’t really experimented yet, but I think it’s limited to PDF – but copy and paste is almost certainly an option)
▪ DevonThink (nope, haven’t got it - I know, I should, but it really is expensive!)
▪ Thoughts ([copy and paste] quite nice, with a very good search function, but not quite interesting enough to compete. Although it might become so)

And my current favourite:
Scrivener: [Import as web page function] so what’s so good about Scrivener? Well, it stores pages as web archives, but keeps URLs, searches through them instantaneously (using full-text indexing) and is very convenient to use, being the ultimate writing platform, so you can open web archives in separate windows, or alongside note windows, or however you like. The only downside is having to copy and paste URLs into the web page import dialog box, although you could probably automate that using AppleScript. The other good thing is that you can print out web archives as PDF files that are identical to what Yojimbo or Together produce (something Stache doesn’t do), and yet the whole web page is also stored as a web archive. Now that’s flexible. And finally, documents collected by Scrivener are indexed by Spotlight and FoxTrot Pro. Because I use Scrivener more and more as a research and drafting platform, it's usually open on my desktop in any case. Downside: no iOS client. Upside: FoxTrot has a very good iOS client that allows you to search through the text you've indexed on your Mac (cunning!), so you can store text archived in Scrivener in your FoxTrot indices!
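
For anyone curious why a web archive can keep its URL alongside the page: as far as I understand it, a .webarchive file is a property list, so it can be read with Python's standard plistlib. What follows is only a rough sketch under that assumption. The key names (WebMainResource, WebResourceURL and so on) are my understanding of the format rather than anything confirmed in this thread, and the sample archive is a made-up stand-in built in memory, not a real capture from Scrivener or Stache.

```python
import plistlib


def read_webarchive(data: bytes) -> dict:
    """Return the URL, MIME type and decoded HTML of the archive's main resource."""
    archive = plistlib.loads(data)  # auto-detects binary or XML plists
    main = archive["WebMainResource"]  # assumed key name for the top-level page
    encoding = main.get("WebResourceTextEncodingName", "utf-8")
    return {
        "url": main["WebResourceURL"],
        "mime": main["WebResourceMIMEType"],
        "html": main["WebResourceData"].decode(encoding, errors="replace"),
        # Images, stylesheets etc. are assumed to live under WebSubresources
        "subresources": len(archive.get("WebSubresources", [])),
    }


# Hypothetical stand-in archive, built in memory so the sketch is self-contained
sample = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceURL": "http://example.com/",
            "WebResourceMIMEType": "text/html",
            "WebResourceTextEncodingName": "UTF-8",
            "WebResourceData": b"<html><body>Hello</body></html>",
        }
    },
    fmt=plistlib.FMT_BINARY,
)

info = read_webarchive(sample)
print(info["url"])
print(info["html"])
```

With a real file you would pass in the bytes read from disk instead of the sample. The point is simply that the source URL and the page text travel together in one file, which is what makes this format handy for archiving.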

I'll let you know what I find out about Curio, but I think it'll be essentially the same as Yojimbo/Together in terms of collection facilities, with the added flexibility of copy and paste, and Curio's very powerful search and tagging functions. We shall see!



MadaboutDana 9/2/2014 8:10 am
Actually, Thoughts is worth another look, because it preserves HTML formatting surprisingly well, and like Notebooks, also has a wide variety of output formats, including RTF, Word, PDF and Open Document (LibreOffice/OpenOffice format). It can import all these things, too, as well as web archives. It's also relatively lightweight compared to the heavies like Curio or Scrivener (although Scrivener makes impressively efficient use of memory resources). Hm. And it's cheap! A downloadable trial is available from the developer's website (http://thoughtsapp.com).
Paul Korm 9/3/2014 11:39 am
If your goal is to import web pages as webarchives, convert on the fly to PDF, organize your collection (perhaps with tags, labels, and so forth), search your collection, and take notes -- then perhaps you should revisit DEVONthink. The Personal and Pro level editions will do all of that -- IMO, better than the alternatives. The Pro Office edition adds email importing and PDF OCR, which not everyone needs. DEVONthink includes a Clipper plugin for Safari / Chrome / Firefox that enables capture of the current tab to webarchive, PDF, HTML, rich text, etc.

I use DEVONthink continually and would not consider the other apps on the list as remotely comparable for webarchival purposes. Grab a trial and test it hard.
Prion 9/3/2014 11:57 am
What Paul said.
Devonthink has been one of my most-used (and most continually used) apps for many years and does an admirable job of archiving webpages, amongst other things. On the subject of webpages: you can archive entire pages (PDF or webarchive), parts of a page, or a cleaned-up version of the webpage using an Instapaper tool. It does a very good job of getting rid of ads and other visual noise, which I am typically not interested in when archiving.
Prion
jamesofford 9/3/2014 1:01 pm
I am also a user of Devonthink Pro Office. It does a great job of grabbing web pages and storing them. It also has a variety of different ways that you can slice and dice them in order to organize them. I tend not to use it for web page clipping these days because it is kind of like using a cannon to swat a fly.

I've tried many of the programs in your list (I am also a Mac user) but have settled on Notesuite (http://www.notesuite.io/mac/ also in the App Store). Notesuite is pretty fast, and syncs easily with the iOS version. It can handle to-dos, though that capability is pretty basic. In general it is a pretty simple piece of software, which I prefer for this sort of thing.

When Onenote came out for the Mac and for iOS I tried using it. I had a good experience with it on Windows. Unfortunately, it is too slow for me. The ability to organize in Onenote is better than Notesuite, but you can put together folders in Notesuite, and that is sufficient for me.

Jim
MadaboutDana 9/3/2014 8:42 pm
Ah rats! I had the nasty feeling somebody - and in fact, here we have several somebodies - would tell me to revisit DevonThink. I've already collected so many interesting information management tools that I have, yes, I confess it! been turning a blind eye to DevonThink, which is not exactly cheap.

I do have NoteSuite, and have tried awfully hard to like it. But for some reason, it's really annoying. You're right, however: it is one of the few relatively powerful note managers that has a direct clipping service and an iOS client, plus a reasonably good (albeit not great) search function. Having said that, I'm quite pleased with the neat way Scrivener handles web capture. It's a two-step process, but that has the advantage of causing one to pause and ask 'do I really need this web page?'

Okay. I admit, DevonThink does have a demo/trial version. I'll have to play... C-R-I-M-P-!!!
MadaboutDana 9/3/2014 8:53 pm
Ah, I remember one good reason why I got so annoyed with NoteSuite. A couple, in fact.

a) the search function is weirdly lacking in MacOS: although NS will find web pages, it won't allow you to search for individual words *within* the web page. The iOS version does allow this. Try it – select a web page note, then open the 'Find' menu. You'll see available options only include 'Search Notebook...'; 'Search This Note' is greyed out!

b) NS has a very limited number of export options (plain text or PDF). That's okay, but other solutions have so many more.

c) NS generally does a very poor job of preserving web page formatting (with some surprising exceptions).

d) I have a general impression - perhaps quite unfairly - that the developer has lost interest. Having witnessed the recent resurrection of Notability, this may well be a very unfair perception. But still.
jarechu 9/3/2014 9:42 pm
Just a tip about Notebooks: You can add a bookmarklet to your internet browser to grab web pages directly into Notebooks, it works also on iOS versions.

http://www.notebooksapp.com/ios/import/

MadaboutDana 9/4/2014 7:42 am
Thanks, jarechu, that's a useful tip!

Cheers,
Bill
MadaboutDana 9/4/2014 12:45 pm
I have run across an annoying issue with Scrivener, when used heavily for web archiving: it crashes. Disconcerting, when it's normally so stable.

So: DevonThink testing time it is!
Franz Grieser 9/4/2014 1:19 pm
Hi there.

Is there a way to get Evernote notes to Devonthink (apart from copy&paste)?

Thanks, Franz
Paul Korm 9/4/2014 2:20 pm
DEVONthink has a built-in Evernote import command. The notes are imported as "formatted notes" -- which are basically HTML.

Alternative methods are available with third-party scripts -- for example, Justin Lancy's script imports Evernote notes as webarchive documents. Justin keeps his scripts here:

http://veritrope.com/code_type/devonthink/page/2/

Franz Grieser 9/4/2014 5:58 pm
That's great. Thanks, Paul.
MadaboutDana 9/4/2014 6:41 pm
Okay, so having tried DEVONthink Pro, I've reverted to DEVONthink Personal, which does most of what I want (for the time being!) and is a bit like Together/Yojimbo/Stache on steroids. It could be more elegant (in particular, the rich-text note facility is neither elegant nor well thought-out), but as an information-capturing app it's fairly awesome.

I'm torn: DEVONthink Pro has some nice extras. But having thought about it, I'm not convinced I need scripting. The RSS support would be nice, but I already use other RSS clients, and DEVONthink Personal can happily import from most web browsers (although not from Opera, unfortunately, which is one of my favourite Mac browsers). You'd also need Pro to handle OPML files, but can I live without this facility? Yes, I can. Well, I think I can.

So it looks like DEVONthink Personal might be enough. But I've still got another 147 hours of testing to go!
Paul Korm 9/4/2014 8:32 pm
DEVONthink does not have a clipper that works in Opera -- but there are bookmarklets available that do most of the heavy lifting and will work in most browsers:

http://www.devontechnologies.com/download/extras-and-manuals.html

I'm a fan of scripting (having written several hundred of these), but through the magic of upgrading you can always decide later if you need the feature. The main drawback to DEVONthink Personal, IMO, is that it is limited to a single database. The Pro and Pro Office versions allow you to create an unlimited number of databases. If you do heavy-duty research or want to segregate work and personal databases, you would want to consider that feature. OPML import to DEVONthink might not be what you expect -- DEVONthink assumes every tier in an OPML file is a discrete "group" (folder) and will create a hierarchy of groups when it imports OPML.
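
To make the OPML point concrete, here is a small Python sketch of the "every tier becomes a group" idea. It is only an illustration of the mapping described above, not DEVONthink's actual importer: the function name opml_to_group_paths and the sample outline are made up for the example, and it assumes each outline element carries its title in a "text" attribute.

```python
import xml.etree.ElementTree as ET


def opml_to_group_paths(opml_text: str) -> list[str]:
    """List one group (folder) path for every outline tier, parents first."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    paths: list[str] = []

    def walk(node: ET.Element, prefix: str) -> None:
        # Every <outline> element becomes a group, whether or not it has children
        for outline in node.findall("outline"):
            name = outline.get("text", "(untitled)")
            path = f"{prefix}/{name}" if prefix else name
            paths.append(path)
            walk(outline, path)

    if body is not None:
        walk(body, "")
    return paths


# Hypothetical outline, used only to show the resulting hierarchy
sample_opml = """<opml version="2.0">
  <head><title>Research</title></head>
  <body>
    <outline text="Web capture">
      <outline text="PDF"/>
      <outline text="Webarchive"/>
    </outline>
    <outline text="Notes"/>
  </body>
</opml>"""

group_paths = opml_to_group_paths(sample_opml)
for path in group_paths:
    print(path)
```

Note that even the leaf items ("PDF", "Webarchive", "Notes") come out as groups rather than documents, which is the behaviour that may surprise you if you expected an outline to arrive as a single note.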
MadaboutDana 9/4/2014 8:46 pm
You magic bunny! I knew I'd seen one somewhere, but couldn't for the life of me remember where.

The point about multiple databases is a good one. Oh dear...