Searching scanned PDFs
Note: This message is from the outliners.com archive kindly provided by Dave Winer.
Outliners.com Message ID: 3723
Posted by graham.smith
2005-08-10 13:03:10
Elsewhere I have been discussing the problems of searching documents that have been scanned into PDFs. These are image-based PDFs rather than text-based ones, so they cannot be searched by text-based search engines such as DTSearch or Copernic.
My solution is to OCR them, rather tediously, into txt files, and then link each txt file to its PDF using Zoot.
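For anyone who wants to try the same idea without Zoot, here is a rough sketch in Python. It is not what Zoot does. It assumes the OCR step has already been run and that each txt file sits in the same folder as its PDF under the same name (invoice.txt next to invoice.pdf). That naming convention and the function name search_sidecars are my own assumptions for illustration.

```python
from pathlib import Path


def search_sidecars(folder, term):
    """Return the PDFs whose same-named .txt file contains `term`.

    Assumption: OCR output for `name.pdf` is stored as `name.txt`
    in the same folder. Matching is a plain case-insensitive
    substring test, not a real search index.
    """
    needle = term.lower()
    hits = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        pdf_path = txt_path.with_suffix(".pdf")
        if not pdf_path.exists():
            continue  # txt file with no scanned PDF to point back to
        text = txt_path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(pdf_path)
    return hits
```

Calling search_sidecars("scans", "invoice") would list the scanned PDFs in a folder called scans whose OCR text mentions "invoice". The results are only as good as the OCR text, so misread words will be missed.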
However, at http://www.lvpaperless.com/i-pi.htm there is a program that claims to search on the basis of word “shapes”, and so can search scanned PDFs that haven’t been OCR’d.
At $1000, though, you would need to be doing a lot of this sort of thing to make it worth the money!
Graham