How do you mark the internet as "finished"?
Posted by Gary Carson
Sep 29, 2013 at 03:31 PM
I’m not sure I understand what the problem is here.
What difference does it make if you capture the same article twice? You can sort out the duplicates when you start organizing all the material.
Posted by Wayne K
Sep 29, 2013 at 04:37 PM
Gary,
Nothing earth-shattering. It’s just a waste of time to download the article again, figure out that it’s a duplicate, and delete it. Often I change the name of the file when I download it from whatever default comes up. The next time I download the same article I might give it a slightly different name, so then I have to open both files to make sure they’re duplicates.
It’d also be nice to see immediately that I’ve visited a website and captured the relevant information, without having to re-read the site, re-read the articles, and re-evaluate which ones I want to download. That’s a significant amount of time, and I’ve gone through this kind of duplicate effort a number of times when I revisit a website I haven’t seen in years.
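Coming back to the duplicate check: in principle the exact-duplicate part could be scripted. Here’s a quick sketch in Python, assuming the duplicates are byte-identical copies of the same download (PDFs re-printed from the browser on a later visit may differ internally and would need a text-level comparison instead), and the folder name is just a placeholder.

import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicate_pdfs(folder):
    # Group PDFs by a hash of their contents; byte-identical files share a hash.
    groups = defaultdict(list)
    for pdf in Path(folder).rglob("*.pdf"):
        digest = hashlib.sha256(pdf.read_bytes()).hexdigest()
        groups[digest].append(pdf)
    return [paths for paths in groups.values() if len(paths) > 1]

# "downloads" is a placeholder for wherever the captured articles live.
for paths in find_duplicate_pdfs("downloads"):
    print("Possible duplicates:", ", ".join(str(p) for p in paths))

That only catches identical files, of course; it doesn’t solve the bigger problem of knowing I’ve already been through a website.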
Posted by Dr Andus
Sep 29, 2013 at 04:52 PM
Wayne K wrote:
> It’s just a waste of time to download the
>article again, figure out that it’s a duplicate, and delete it. Often I
>change the name of the file when I download it from whatever default
>comes up. The next time I download the same article I might give it a
>slightly different name, so then I have to open both files to make sure
>they’re duplicates.
How about using some web capture software instead, like Surfulater? Then you wouldn’t need to go through the PDF production process, and it’s very quick to capture a page (right-click, “add new article to Surfulater” or whatever the command is called, and that’s it).
You can organise the pages into folders, and then it’s a lot easier to see whether there is any duplication in a folder. Plus there is the Filter tool in Surfulater that filters the captured content by web-page title, which again helps to spot duplication.
Though I imagine you must have your reasons why you want them as PDFs.
Posted by Wayne K
Sep 29, 2013 at 05:54 PM
Dr Andus wrote:
Wayne K wrote:
>> It’s just a waste of time to download the
>>article again, figure out that it’s a duplicate, and delete it. Often
>I
>>change the name of the file when I download it from whatever default
>>comes up. The next time I download the same article I might give it a
>>slightly different name, so then I have to open both files to make sure
>>they’re duplicates.
>
>How about using some web capture software instead, like Surfulater? Then
>you wouldn’t need to go through the PDF production process, and it’s
>very quick to capture a page (right-click, “add new article to
>Surfulater” or whatever the command is called, and that’s it).
>
>You can organise the pages into folders, and then it’s a lot easier to
>see whether there is any duplication in a folder. Plus there is the
>Filter tool in Surfulater that filters the captured content by web-page
>title, which again helps to spot duplication.
>
>Though I imagine you must have your reasons why you want them as PDFs.
I tried web capture software a couple of years ago. The sticking point for me was the poor mark-up capabilities. Surfulater couldn’t even highlight text that you’ve captured (I confirmed that with their tech support - maybe it’s changed since then). By capturing pages as PDFs, I can take advantage of software that can do any kind of markup I can imagine. I use PDF Revu, but I know there are other excellent choices for PDF markup.
I also like the idea of staying relatively software-neutral with a file format that’s likely to be around longer than I am.
As for seeing duplicates in folders, I can do the same thing with PDF files. It’s just that it slows down the research and capture process. I like having some automatic organization on the front end to save time on the back end.
Posted by Alexander Deliyannis
Sep 29, 2013 at 06:46 PM
Wayne K wrote:
>I haven’t found a Chrome add-on that does a similar job with pdf’s.
>If I could find one, that would solve the problem
I don’t have anything to propose if you want the PDF page to be laid out as it is on the original website, but if content is more important than layout, you might want to try out Cleanprint: https://chrome.google.com/webstore/detail/print-or-pdf-with-cleanpr/fklmmmdcofimkjmfjdnobmmgmefbapkf
You can already try it out in Firefox, as it is included in the extensions available at http://www.formatdynamics.com/bookmarklets/
For precise capture of the layout I use the pro version of Fireshot, which also supports several browsers (I believe such tools are more likely to be around longer). However, there’s a catch: the capture is in image form even if you choose to save it as a PDF, so you can’t search or copy the textual content unless you OCR it first. Fireshot’s strong point is its ability to capture full web pages (including the parts not visible on screen).
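For what it’s worth, that OCR step can be scripted. Here is a minimal sketch in Python, assuming the open-source ocrmypdf command-line tool is installed (that’s my assumption - it isn’t something Fireshot bundles), and using made-up file names:

import subprocess

def make_searchable(input_pdf, output_pdf):
    # Run ocrmypdf to add a searchable text layer to an image-only PDF.
    subprocess.run(["ocrmypdf", input_pdf, output_pdf], check=True)

# Hypothetical file names, just for illustration.
make_searchable("fireshot_capture.pdf", "fireshot_capture_searchable.pdf")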