On 03-10, Ray Olszewski wrote:
> At 08:59 AM 3/10/2004 -0500, Hal MacArgle wrote:
> [...]
> >> 1. Run a program that will scan to an image file, then a separate program
> >> that will do OCR on the scanned image.
> >
> > Done - scan to PBM (P4) then run Gocr to get a bit mapped
> >file.. Trouble is; even using resolution 360, slow and a big PBM
> >file, the final copy is about 90% accurate and the original format of
> >the letter is compromised, losing paragraphs, offsets, indents, etc...
> >Taking the time to do all this plus fix up the immediate above would
> >take almost as long as manually re-typing the page..
>
> Yes, this is always the problem with OCR. Back when I did a lot of it
> (about 12-15 years ago, on a Macintosh), OCR packages included the "brag"
> that they were 99% accurate. I, like any serious user, had no trouble
> translating "99% accurate" to "an error every 3 lines of text" and was
> unimpressed. For everyday use, OCR needs about "four 9s" of accuracy,
> translating to one error every few pages of text.
>
> If your 90% estimate is correct, it translates to (on average) several
> errors per line of text, making the process close to worthless for you.
>
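For anyone wanting to check those numbers, here is a rough sketch of the arithmetic in Python. It treats "accuracy" as a per-character rate, and the figures of 70 characters per line and 50 lines per page are my own assumptions, not anything measured from the scans; the function name is made up for illustration.

```python
def ocr_error_estimates(accuracy, chars_per_line=70, lines_per_page=50):
    """Return (errors_per_line, errors_per_page) for a per-character accuracy.

    chars_per_line and lines_per_page are assumed page-layout figures.
    """
    error_rate = 1.0 - accuracy              # fraction of characters misread
    errors_per_line = error_rate * chars_per_line
    errors_per_page = errors_per_line * lines_per_page
    return errors_per_line, errors_per_page


# Compare the three accuracy levels mentioned in the thread.
for accuracy in (0.90, 0.99, 0.9999):
    per_line, per_page = ocr_error_estimates(accuracy)
    print(f"{accuracy:.2%} accurate: "
          f"{per_line:.3f} errors/line, {per_page:.2f} errors/page")
```

Under those assumptions, 90% works out to roughly 7 errors per line, and "four 9s" to roughly one error every three pages. The exact figures shift with the assumed line length, so take them as ballpark only.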
> Of course, any OCR package is better on some images than others. I don't
> know what your source pages look like. For example, serif fonts (e.g.,
> Times Roman, Century Schoolbook, Palatino) are generally easier to OCR well
> than sans-serif fonts (e.g., Arial, anything with "sans" in the name). Fresh
> printouts are better than third-generation Xeroxes. And so on.
>
> I expect that you are at the point where you need help from someone with
> real and current expertise in OCR work, preferably on Linux. That's not me,
> and from the surrounding silence, I suspect it is not to be found on this
> list.
>
Greetings: And your detailed comments are most valuable, as
usual.. It stands to reason the fonts must "match" the design.. I
just looked at my cheque book and the "crazy" numeral "style." I read
further that OCR programs, even the pricey ones, have a real problem when
the text switches to italics after a long run of "normal." The banking
system had better "match" eh??
> > I fetched Clara but could only find a .rpm file, no tarball
> >for Slackware, etc.. Slack has an rpm program but the
> >dependencies needed to extract looked like they were mostly Red Hat's
> >filenames..
>
> Well ... a source .tgz can be downloaded from a link on this page --
> http://www.claraocr.org/
>
> (Thank you, Google).
>
I, of course, used Google/Linux but entered "ocr" and got the
"wrong" Clara site.. Thanks. I will try the tarball but am
pessimistic of course.. Methinks Patrick at Slackware doesn't include
OCR in his packages for good reason...
Appreciate!!
Hal - in Terra Alta, WV - Slackware GNU/Linux 9.0 (2.4.20)
Utrum Per Hebdomadem Perveniam