getting text from book into a notetaking app
Posted by jimspoon
Sep 12, 2013 at 09:06 PM
Hi all ...
I was very interested in a post that WSP made in the AskSam thread ... just will paste in what he said here ...
———-
Most importantly, to me, Evernote does a quick, competent OCR of any text in images. As a result, I do far less typing than I used to at the note-taking stage. When I am reading a printed book or periodical (I’m not talking about digital versions here) and I come upon a passage that interests me, I photograph the page with my iPhone and pop it into Evernote. Within a few minutes the text in that image becomes fully searchable. Much nicer than typing!
It’s possible, incidentally, to take the picture with the “Page” camera option in Evernote (iOS versions), but I prefer to use an iPhone app called CamScanner, because its excellent “Magic Color” enhances the image so nicely. CamScanner then allows me to send the photo (or photos) directly to Evernote if I wish. Earlier this evening, just as an experiment, I photographed a page of a book I was reading—in the living room, under not especially good light—that was published in 1722. I sent it to Evernote via CamScanner, all quite effortlessly, and the image was then easily searchable in EN. If I wished, I could crop the image further in Skitch (from within EN), highlight certain sentences, and put various marks in the margin.
———-
I just looked and I see that there is a CamScanner app for Android, so I will be sure to try this.
The trouble with taking photos of text in a book is that there is a lot of skew in the image. This comes from the angle at which you take the picture of the printed page, and also from the curvature of the page as it goes into the spine of the book.
So far I have been working with Omnipage OCR. It does have a “3D Deskew” function that does a pretty good job of de-skewing the image for OCR purposes.
When I scan a document into Paperport, I save it in the “Searchable Image” PDF format. I always want to be able to see the original image, because of the inevitable text recognition errors. It’s been a while since I’ve tried, but so far I haven’t found a way to get Omnipage to save a scanned document so that I can see the original image of the text, cropped (to show only the page image) and deskewed. Essentially, I would see the same kind of page image that I would get if I took the page out of the book, scanned it with a sheet-fed scanner, and saved it as a Searchable Image PDF. That would be my goal, anyway.
I would be interested in hearing about others’ experience. I am encouraged by Bill’s info.
jim
Posted by WSP
Sep 12, 2013 at 10:35 PM
Jim,
What I failed to mention is that CamScanner automatically compensates for skew. There are a number of apps similar to CamScanner (I used JotNot, for example, for several months last year), and they’re all reasonably good. The reason I finally settled on CamScanner is that its image enhancement seemed to me the best.
I should also emphasize that I use Evernote’s OCR for text recognition in images. Several of these phone scanner apps try to do OCR on their own, but they’re not very good at it. If I am producing a PDF file rather than a single image in CamScanner, I afterwards run that file through PDF-Xchange on my computer in order to do OCR on it.
If I were not sending images into Evernote, I would be tempted to use the app called TextGrabber (by ABBYY) on my phone. It does fairly good text recognition, though it seems to be fussy about light and camera angle.
Bill
Posted by dan7000
Sep 12, 2013 at 11:46 PM
I do quite a bit of this too. I previously used a 3rd party app to do the de-skewing, but now I find that Evernote’s page scanner does a decent enough job and it makes the process super fast. For instance, I occasionally have to use paper library books for research, and I just quickly photograph the pages I care about; when I get back to my desk they are sitting there in Evernote in Windows waiting for me. It’s awesome.
But one other method I use a lot is screenshots from Kindle or Google Reader. Generally if there’s a book I will want to copy a lot of stuff from, I get the Kindle / eBook version. Then when I find a page I want to copy, I do a screenshot on my iPad (on the Android tablet I used to have, I used a 3rd party app for this). Then I paste it into Evernote. You get a cleaner, easier-to-read version of the page than you do with a photo or scan. Unfortunately, it’s slightly slower than a photo because you have to copy, switch apps, create a new note, and insert the picture. And you don’t get an accurate page number with Kindle. Believe it or not, I often have both the paper book and the Kindle book, and I note the real page number in the Evernote note after inserting the Kindle screenshot.
Posted by WSP
Sep 13, 2013 at 01:21 AM
I just did a quick little experiment. I photographed the same page (an old book, poorly printed) in Evernote’s Page Camera and in CamScanner; then I allowed Evernote enough time to recognize the text in the two images. When I did a series of word searches, I found the recognition was somewhat more accurate in the CamScanner page; in other words, I searched for quite a few words, and in several instances I was able to find those particular words on the CamScanner page and not on the Page Camera page.
The differences were not dramatic, and I agree with you that Page Camera is really quick and easy to use, but this seems to confirm my earlier subjective impression that CamScanner produces somewhat better search results. Of course if I had been photographing a more crisply printed book in better light, the differences between the two images might have disappeared.
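For anyone who wants to repeat this kind of comparison less subjectively, here is a minimal Python sketch. It assumes you can get the recognized text of each image as a plain string, which is an assumption and not something described in the experiment above; the function names and the sample strings are made up for illustration.

```python
import re

def words_found(ocr_text, search_words):
    """Return the search words that occur as whole words in the OCR text."""
    # Lowercase and split into simple word tokens (letters, digits, apostrophes).
    tokens = set(re.findall(r"[a-z0-9']+", ocr_text.lower()))
    return {w for w in search_words if w.lower() in tokens}

def compare_recognition(text_a, text_b, search_words):
    """Report which search words were found in A only, B only, both, or neither."""
    found_a = words_found(text_a, search_words)
    found_b = words_found(text_b, search_words)
    return {
        "only_a": sorted(found_a - found_b),
        "only_b": sorted(found_b - found_a),
        "both": sorted(found_a & found_b),
        "neither": sorted(set(search_words) - found_a - found_b),
    }

# Hypothetical OCR output for the same page from two capture methods.
camscanner_text = "The quick brown fox jumps over the lazy dog"
page_camera_text = "The qu1ck brown fox jumps ovcr the lazy dog"

result = compare_recognition(
    camscanner_text, page_camera_text, ["quick", "over", "fox", "cat"]
)
print(result)
```

Words listed under "only_a" are the ones one image made searchable and the other did not, which is the same thing the manual word searches were checking.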
Bill
Posted by Alexander Deliyannis
Sep 13, 2013 at 08:08 PM
dan7000 wrote:
>Then when I find a page I want
>to copy, I do a screenshot on my iPad (or on my Android tablet I used to
>have a 3rd party app for this).
May I ask what that 3rd party Android app was?