PDF File Mgmt. Question

Started by Ken on 2/28/2021
Ken 2/28/2021 7:33 pm
I know this question is a bit OT, but this is the one group that I know of that has many members who work with large numbers of PDF files, so I thought I would ask the hive for some advice. I just bought a Brother document scanner so I can scan a number of old statements and get rid of the paper copies that have been taking up file cabinet space. Initially, I thought that flat scans (i.e. no OCR conversion) would suffice, and that may be the case, but the software that Brother includes with the scanner has left me a bit stumped on the best way to handle the scans since it will involve many unrelated accounts (e.g. banking, medical, utilities, etc.).

Brother provided me with copies of Kofax PaperPort and Power PDF along with a basic utility to handle settings and machine selection. Setting up a folder structure to hold the files is easy enough, but what I am trying to figure out is the best way to incorporate either or both of these pieces of software (or neither), or whether there are better solutions that I should consider at a reasonable price. PaperPort seems useful for assembling pages into files and possibly for file naming, but there are few reviews about it. Power PDF seems more like an Acrobat competitor, and I suspect I will not have as much use for it, but I am not clear which program I would use if I do decide that some documents need OCR conversion.

Any wisdom would be greatly appreciated. And does anybody have any thoughts about scanning these documents at 200PPI vs. 300PPI? I normally scan photos and that is a whole other ball of wax for scan settings (on a separate, dedicated scanner). I mainly am trying to balance file size/scan speed with readability and print quality (if printing is needed).

Thanks,

--Ken
Lb 2/28/2021 10:44 pm
From experimentation I found that the best scanning resolution for me is 300dpi, which gives a more accurate OCR reading. The file size is not overly large, and it makes a clean PDF or image file.

I can't comment on your software since I don't use the ones you're using. I've tried many and have found that the OCR interpretation varies a lot between them.

I make all my PDFs "Searchable" with the OCR. It takes a little longer to scan each one, but it is worth it if you need to search within the files.

Best of luck,
Larry
Ken 3/1/2021 4:37 am
LB wrote:
From experimentation I found that best scanning resolution for me is
300dpi for a more accurate OCR reading. File size is not overly large
and makes a clean PDF or image file.

I can't comment on your software since I don't use the ones you're
using. I've tried many and have found that the OCR interpretation
varies a lot between them.

I make all my PDF's "Searchable" with the OCR. It takes a little longer
to scan each one but worth it if you need to do a search within the
files.

Thank you for this information, Larry. I had not thought about the quality of the scan in terms of resolution for OCR conversion, so 300ppi makes sense.

--Ken

JDS 3/1/2021 5:24 pm
I am not sure if this meets your needs, but look at DropIt. It is a utility that can monitor folders and rename and move the files based on rules and actions you set. For example, you can scan in a tax document, and DropIt can rename it to "Tax Form" and move it to a folder "Taxes 2020". The PDF needs to have been through OCR, of course.
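
For anyone who would rather script this kind of rule-based filing, here is a minimal Python sketch of the idea. It is not DropIt and does not use DropIt's rule format; the patterns, names, and folders in RULES are made up for illustration, and it assumes you already have the OCR text of each scan as a string.

```python
import re
import shutil
from pathlib import Path

# Hypothetical rules: (pattern to find in the OCR text, new base name, destination folder).
RULES = [
    (re.compile(r"\b(1040|W-2|1099)\b", re.I), "Tax Form", "Taxes 2020"),
    (re.compile(r"\bstatement\b.*\baccount\b", re.I | re.S), "Bank Statement", "Banking"),
]

def plan_move(ocr_text, rules=RULES):
    """Return (new_base_name, folder) for the first matching rule, or None."""
    for pattern, name, folder in rules:
        if pattern.search(ocr_text):
            return name, folder
    return None

def file_scan(pdf_path, ocr_text, archive_root):
    """Rename and move pdf_path under archive_root using the first matching rule."""
    plan = plan_move(ocr_text)
    if plan is None:
        return None  # leave unmatched scans where they are
    name, folder = plan
    suffix = Path(pdf_path).suffix
    dest_dir = Path(archive_root) / folder
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{name}{suffix}"
    counter = 2
    while dest.exists():  # avoid overwriting an earlier scan with the same name
        dest = dest_dir / f"{name} ({counter}){suffix}"
        counter += 1
    shutil.move(str(pdf_path), str(dest))
    return dest
```

The first matching rule wins, so more specific patterns should be listed before general ones.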
Ken 3/1/2021 5:49 pm
JDS wrote:
I am not sure if this meets your needs, but look at DropIt. It is a
utility that can monitor folders and rename and move the files based on
rules and actions you set. For example, you can scan in a tax document,
and DropIt can rename it to ''Tax Form'' and move it to a folder ''Taxes
2020''. The pdf needs to have been through OCR, of course.

Thank you for this recommendation. I have never heard of this program before, but it looks quite interesting. Have you used it?

--Ken
MadaboutDana 3/1/2021 6:07 pm
Before switching to Mac, I used PaperPort in anger (while it still belonged to, I think, Xerox?) and found it pretty good.
JDS 3/1/2021 7:30 pm
I have. It takes a little time to set it up, but saves a lot of drudgery once you have. I have had no problems with it.
Jon Polish 3/1/2021 8:40 pm
Both are good products for the individual user. Kofax acquired Power PDF from Nuance and, with the exception of portfolios, I have always been very satisfied. Portfolios stopped working with the demise of Flash Player (a good thing?), but not too many people use that feature.

I see Kofax now has PaperPort. I have not used PaperPort for a while so I did not know that. I imagine that it, like Power PDF, is unchanged.

Jon
Ken 3/1/2021 9:13 pm
JDS wrote:
I have. It takes a little time to set it up, but saves a lot of drudgery
once you have. I have had no problems with it.

I need to dig a bit deeper into the program. It looks like it offers a lot of features, but I suspect one has to understand what they need accomplished, and how they want it done, before trying to set it up. There is not much info on the original developer, but that is okay. I am always curious about program developers when I see software from small shops or individuals.

--Ken
Ken 3/1/2021 9:17 pm
Jon Polish wrote:
Both are good products for the individual user. Kofax acquired PowerPDF
from Nuance and with the exception of portfolios, I have always been
very satisfied. Porfoilios stopped working with the demise of Flash
Payer (a good thing?) but not too many use that feature.

I see Kofax now has PaperPort. I have not used PaperPort for a while so
I did not know that. I imagine that it, like PowerPDF, is unchanged.

Jon

Yes, it seems that both were acquired by Kofax, and neither looks to have received a major update in some time. That is fine if they work as intended, and as neither seems to lock you into its system, I am feeling a bit more comfortable about starting out with them. I suspect that PaperPort will be more useful to me for the task at hand, but it is nice to know that I have access to Power PDF if needed. While I have an Adobe Creative Cloud subscription for Lightroom, and have Acrobat Pro at work, I only have Acrobat Reader at home.

--Ken
James Thompson 3/1/2021 9:37 pm
I concur with the general response you've received. PaperPort is the perennial standard; I haven't seen a new competitor of note in recent years, though I haven't been looking, but it's probably good enough. Especially if it comes with your OCR software.

300 DPI is the bare minimum scan density you should set for any kind of acceptable OCR read. 600 DPI is probably not necessary, and with some dirty scans 300 can actually clean things up a bit. If you're scanning a poor-quality print, the yield will be low anyhow, but extraneous smudges or unintentional "ink" on the page can confuse the OCR software, and I have sometimes found that 300 DPI will reject the flotsam.
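
As a rough check on the file size side of the 200 vs. 300 vs. 600 question, here is a small Python sketch of the arithmetic. It assumes a US Letter page (8.5 x 11 in) and 8-bit grayscale at one byte per pixel, and it reports uncompressed size only; real PDFs are compressed and will be considerably smaller, so treat the numbers as relative, not absolute.

```python
# Pixel dimensions and raw (uncompressed) size of one scanned page.
# Assumptions: US Letter page, 8-bit grayscale (1 byte per pixel).

def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Return (width_px, height_px) for a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)

def raw_megabytes(dpi, bytes_per_pixel=1):
    """Uncompressed size of one page in MB (10**6 bytes)."""
    width_px, height_px = page_pixels(dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000

if __name__ == "__main__":
    for dpi in (200, 300, 600):
        width_px, height_px = page_pixels(dpi)
        print(f"{dpi} dpi: {width_px} x {height_px} px, "
              f"~{raw_megabytes(dpi):.1f} MB raw grayscale")
```

Pixel count grows with the square of the density, so a 300 dpi page holds 2.25 times the pixels of a 200 dpi page, and a 600 dpi page holds four times the pixels of a 300 dpi page.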
Ken 3/1/2021 11:15 pm
James Thompson wrote:
I concur with the general response you've received. Paperport is the
perennial standard; haven't seen a new competitor of note in recent
years though I haven't been looking, but it's probably good enough.
Especially if it comes with your OCR software.

300 DPI is the bare minimum scan density you should set for any kind of
acceptable OCR read. 600 DPI probably not necessary, and with some dirty
scans 300 can actually clean things up a bit. If you're scanning a
poor-quality print yield will be low anyhow but extraneous smudges or
non-intentional "ink" on the page can confuse the OCR software and
sometimes I've found 300 DPI will reject the flotsam.

This sounds good. Thanks for confirming.

--Ken
Leib Moscovitz 3/2/2021 5:54 pm
With regard to scanning and manipulating PDFs in almost every way imaginable, I very highly recommend PDF-XChange (Tracker Software); they have multiple editions, offering a very wide variety of features at different prices, which address the needs of varied types of users. I might add that their service is splendid, and their software (in all editions) is very reasonably priced.

I don't know about their OCR, as the edition I use has limited OCR functionality, although they apparently improved this functionality significantly in the newest version. However, for really complex OCR, I very highly recommend ABBYY FineReader, which, judging from my experience and that of some of my colleagues who deal with such things extensively, leaves many if not all competitors in the dust.

Finally, with regard to organizing your PDF files, a trick which I found extremely helpful - assuming, of course, that you organize the files in regular directories and subdirectories rather than, say, importing them into a program like UltraRecall - is to use the free four-quadrant file manager Q-Dir. Each quadrant can contain multiple tabs for different directories, and you can, in effect, organize quadrants by "themes" - say, one quadrant for directories of personal files, another for academic files, another for financial files, etc. Thus, if you scan by default to your desktop, it is very easy to drag and drop the resulting PDFs into appropriate predefined locations by using Q-Dir's quadrants as described above.
Ken 3/2/2021 10:45 pm
Leib Moscovitz wrote:
With regard to scanning and manipulating PDFs in almost every way
imaginable, I very highly recommend PDF-XChange (Tracker Software); they
have multiple editions, offering a very wide variety of features at
different prices, which address the needs of varied types of users. I
might add that their service is splendid, and their software (in all
editions) is very reasonably priced.

I don't know about their OCR, as the edition I use has limited OCR
functionality, although they apparently improved this functionality
significantly in the newest version. However, for really complex OCR, I
recommend very highly ABBYY reader, which, judging from my experience
and that of some of my colleagues who deal with such things in
extension, leaves many if not all competitors in the dust.

Finally with regard to organizing your PDF files, a trick which I found
extremely helpful - assuming, of course, that you organize the files in
regular directories and subdirectories rather than, say, importing them
into a program like UltraRecall - is to use the free four-quadrant file
manage Qdir. Each quadrant can contain multiple tabs for difference
directories, and you can, in effect, organize quadrants by "themes" -
say, one quadrant for directories of personal files, another for
academic files, another for financial files, etc. Thus, if you scan by
default to your desktop, it is very easy to drag and drop the resulting
PDFs into appropriate predefined locations by using Qdir's quadrants as
described above.

Thanks for the additional recommendations. I am familiar with FreeCommander, but have not heard of Q-Dir. Sounds interesting.

--Ken
MadaboutDana 3/3/2021 8:57 am
Totally agree on (a) 300 dpi and (b) ABBYY (the Mac version of which is also awesome and embedded in high-quality apps like DEVONthink Office)
Ken 3/3/2021 3:32 pm
MadaboutDana wrote:
Totally agree on (a) 300 dpi and (b) ABBYY (the Mac version of which is
also awesome and embedded in high-quality apps like DEVONthink Office)

I believe that Epson may include a version of ABBYY with their scanners. If I end up doing a lot of OCR conversion, I will look into it.

Thanks,

--Ken
Nomatica 5/27/2021 4:22 pm
I am looking at scanners, specifically the Fujitsu, and I am looking for insight from people who may have used their systems or similar systems.

Fujitsu recently changed the software that ships with their consumer systems. It is now ScanSnap Home, where before it was ScanSnap Manager.

What I like about it is that it can use OCR to automatically name the files and pull contact information from business cards, receipts, etc. In fact, it even has the ability to automate expense reports. If I understand correctly, you have a choice of where to scan: either to the ScanSnap directory, where items are handled automatically and there is unified search and tagging, or to a user-defined folder. I saw one user describe the ScanSnap directory as DEVONthink-like.

I am curious how it compares with other scanners on the market and their bundled software. Here is the video: https://www.youtube.com/watch?v=tKukRMxPdmk


Ideally, I would like to be able to automatically name, file and tag a PDF using text in the PDF. Years ago there were ways to define fields based on zones of a PDF (zonal OCR), but I am not seeing that as an option. However, ScanSnap Home does something similar. It just does not look like I have much control over it.
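
To make the zonal idea concrete, here is a minimal Python sketch. It assumes some OCR step has already produced a list of words with page coordinates; the Word class, the ZONES rectangles, and the file name pattern are all hypothetical and would need to match your own statement layout. It is not how ScanSnap Home or any other product does it.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: float  # left edge, in points from the left of the page
    y: float  # top edge, in points from the top of the page

# Hypothetical zones for one document layout: field -> (left, top, right, bottom).
ZONES = {
    "vendor": (36, 36, 300, 80),
    "date": (400, 36, 576, 80),
}

def text_in_zone(words, zone):
    """Join the words whose top-left corner lies inside the zone, top to bottom, left to right."""
    left, top, right, bottom = zone
    hits = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    hits.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in hits)

def extract_fields(words, zones=ZONES):
    """Return a dict mapping each zone name to the text found in that zone."""
    return {name: text_in_zone(words, zone) for name, zone in zones.items()}

def suggested_name(fields):
    """Build a file name such as '2021-02-28 Acme Utilities.pdf' from the extracted fields."""
    return f"{fields['date']} {fields['vendor']}.pdf"
```

Each document layout needs its own set of zones, which is probably why this approach suits recurring statements from the same sender better than one-off documents.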

I am also interested in being able to pull table/spreadsheet data from PDFs. I have used OmniPage in the past for this... and while it can be faster than cleaning up a converted Excel spreadsheet, it does seem like there should be a better way.
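
One low-tech fallback, sketched here in Python, is to work from the text layer once the PDF has been through OCR. This assumes the table text has already been copied or exported as plain text with columns separated by two or more spaces; it does not read the PDF itself, and it will misplace cells if a column is empty or a value contains a double space.

```python
import csv
import io
import re

def rows_from_text(text):
    """Split each non-empty line into cells wherever two or more whitespace characters occur."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Single spaces inside a cell (for example "Electric bill") are preserved, because only runs of two or more spaces count as column breaks.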

I am on Windows and prefer local first...and being able to choose my own cloud service for sharing.

Is anyone using the ScanSnap software or other software?