Searching scanned PDFs

Posted by graham.smith on 8/10/2005
graham.smith 8/10/2005 1:03 pm
Elsewhere I have been discussing the problems of searching documents that have been scanned into PDFs. These are image-based PDFs rather than text-based ones, so they cannot be searched by text-based search engines like DTSearch, Copernic etc.

My solution is to rather tediously OCR them into txt files, and then link the txt file to the PDF file using Zoot.
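
For anyone wanting to try the same idea outside Zoot, here is a rough Python sketch of the search half. It is not how Zoot does its linking. It simply assumes each scanned PDF has an OCR'd txt file with the same base name in the same folder (a naming convention chosen for illustration), and the function name find_pdfs_matching is hypothetical. It does not do the OCR itself.

```python
from pathlib import Path


def find_pdfs_matching(folder, term):
    """Return scanned PDFs whose same-named .txt file contains `term`.

    Assumes report.pdf is paired with report.txt in the same folder.
    The match is a plain case-insensitive substring test.
    """
    needle = term.lower()
    hits = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        pdf_path = txt_path.with_suffix(".pdf")
        if not pdf_path.exists():
            continue  # txt file with no PDF beside it, so nothing to link to
        text = txt_path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(pdf_path)
    return hits
```

Calling find_pdfs_matching with a folder and a search word returns the paths of the PDFs to open. The quality of the results depends entirely on how accurate the OCR text is.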

However, at http://www.lvpaperless.com/i-pi.htm there is a program that claims to search on the basis of word "shapes", so it can search scanned PDFs that haven't been OCR'd.

At $1000, though, you would need to be doing a lot of this sort of thing to make it worth the money!

Graham
sub 8/10/2005 1:52 pm
My solution is to rather tediously OCR them into txt files, and then link the txt file to the PDF file using Zoot.

Graham,

What software do you use for OCR? Are you happy with it?

alx