PDF cataloging
Posted by Graham Rhind
Aug 18, 2016 at 08:06 AM
Hi all,
We have a large number of (mainly) large PDF files containing a lot of disparate information. Currently nobody reads them, and nobody knows where to look to find the important information.
We’re looking for a system that lets us easily catalogue what is in each PDF, supports searching both the index and the PDFs themselves, and (ideally) lets people jump directly to the part of the relevant PDF that contains the information they found in the catalogue. So, a sort of library system that’s easy to use. It needs to be centrally accessible for a team, based on HTML (locally) or cross-platform, and trivially priced.
Have you good people any ideas on this one?
Thanks,
Graham
Posted by Slartibartfarst
Aug 18, 2016 at 09:35 AM
Well, quite some time ago, one of my clients had a similar problem and we addressed it satisfactorily with Qiqqa: http://www.outlinersoftware.com/topics/reply/6596/25920
Hard to beat.
Posted by Graham Rhind
Aug 18, 2016 at 12:18 PM
That link doesn’t take me to a topic thread, but I’ll check Qiqqa out - thanks.
Posted by MadaboutDana
Aug 18, 2016 at 04:33 PM
The cheapest option would be to:
- set up a network file share
- store your PDFs in a carefully structured folder hierarchy (the “index”)
- use Adobe Reader to carry out Advanced Searches either on the entire folder tree, or on individual folders (and subfolders if you wish)
The cost of that approach would be minimal, while giving you direct access to specific sentences containing the stuff you’re looking for (Adobe Reader’s Advanced Search is very powerful, providing filename, context, search term highlighting etc.).
If you wanted to speed it up, you could invest in a full version of Adobe Acrobat and generate proper indexes for the PDFs. Adobe Reader can (I believe; at least, it used to be able to) then use this index to run Advanced Searches.
An alternative would be to set up a Mac Mini with DEVONthink on it, and use the latter’s web server to publish your PDFs to your local network. But the web server’s search function is rudimentary compared to the desktop version of DEVONthink.
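As a rough illustration of the folder-hierarchy idea above (the folders acting as the “index”), here is a minimal Python sketch that walks a share and lists every PDF together with the folder names above it. The function name build_folder_index and the dictionary keys are made up for this example; it only reads file names and does not look inside the PDFs.

```python
from pathlib import Path


def build_folder_index(root):
    """Return one entry per PDF under root, using folder names as categories.

    Hypothetical helper: only file and folder names are read, not PDF contents.
    """
    root = Path(root)
    entries = []
    for pdf in sorted(root.rglob("*.pdf")):
        rel = pdf.relative_to(root)
        entries.append({
            "file": rel.as_posix(),              # path relative to the share
            "categories": list(rel.parts[:-1]),  # enclosing folders, outermost first
        })
    return entries
```

Pointing it at the root of the share (whatever path that turns out to be) would give a flat listing that could be pasted into a spreadsheet or an HTML page as a starting point for the index.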
Posted by Graham Rhind
Aug 18, 2016 at 05:01 PM
Thanks, MadaboutDana.
What I don’t see Reader helping with (I haven’t looked at Qiqqa yet) is the metadata issue.
If I have a dozen pdfs each with 1000 pages, results of searches for almost anything are going to be full of noise. I want a solution that starts off being useful enough for people to use. So I want the pdfs themselves to be accompanied by a searchable metadata catalogue.
Let’s say there’s an annual sales document, and between pages 230 and 240 there’s a section on widget sales in Antarctica in 1989. I would like that information stored in a catalogue, so that people could search, for example, for widget sales figures for Antarctica (but not South Africa) in 1989 and bring up the “index card” with a link to that section of the PDF, wherever it may be. If I were just to search all the PDFs for “Antarctica and 1989”, there would probably be too many results across the collection, each of which would need checking for relevance, and that would put people off using the system.
This is basic database stuff which I can do, but only if each pdf gets read (or at least skimmed) and the relevant information entered into a catalogue.
I wondered if there was an easier way.
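To make the “index card” idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout, the function names and the file name annual_sales.pdf are all assumptions for illustration only. The link relies on the #page=N fragment, which Adobe Reader and most browser PDF viewers should honour when a PDF is opened from a URL.

```python
import sqlite3


def open_catalogue(path=":memory:"):
    """Open (or create) the catalogue; the schema is an assumed example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cards (
               id INTEGER PRIMARY KEY,
               pdf_file TEXT NOT NULL,
               first_page INTEGER NOT NULL,
               last_page INTEGER NOT NULL,
               topic TEXT NOT NULL,
               region TEXT NOT NULL,
               year INTEGER NOT NULL
           )"""
    )
    return conn


def add_card(conn, pdf_file, first_page, last_page, topic, region, year):
    """Store one index card describing a page range inside a PDF."""
    conn.execute(
        "INSERT INTO cards (pdf_file, first_page, last_page, topic, region, year) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (pdf_file, first_page, last_page, topic, region, year),
    )
    conn.commit()


def find_cards(conn, topic, region, year):
    """Return matching cards, each with a link to the first page of its section."""
    rows = conn.execute(
        "SELECT pdf_file, first_page, last_page FROM cards "
        "WHERE topic = ? AND region = ? AND year = ? "
        "ORDER BY pdf_file, first_page",
        (topic, region, year),
    ).fetchall()
    return [
        {"pages": f"{first}-{last}", "link": f"{pdf}#page={first}"}
        for pdf, first, last in rows
    ]


if __name__ == "__main__":
    conn = open_catalogue()
    # Hypothetical cards for the example in the post above.
    add_card(conn, "annual_sales.pdf", 230, 240, "widget sales", "Antarctica", 1989)
    add_card(conn, "annual_sales.pdf", 310, 322, "widget sales", "South Africa", 1989)
    print(find_cards(conn, "widget sales", "Antarctica", 1989))
```

Because region is its own field rather than free text, a search for Antarctica simply never matches the South Africa card, which is where the noise reduction comes from. The cards still have to be entered by someone who has skimmed the PDF.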