Bug#885185: Can't get gscan2pdf to work properly (i.e. give even half-decent output)

Jeff Thu, 04 Jan 2018 09:04:00 -0800

On 25/12/17 22:04, Leigh Collier wrote:
> Version: 1.2.3

You will find packages of the latest version in my PPA, with many
improvements. This might solve the lack of output from Tesseract or
Cuneiform.


> I opened a jpg file in gscan2pdf and selected the area I wanted OCR'd by
> using the rectangular selection tool (guessing how to use it, as there
> didn't seem to be any reference to this tool in the Help). This gave me
> (under the Image tab) the image shown in the attachment gscan2pdf 1 of 2.

gscan2pdf doesn't currently support running OCR on a portion of an
image; it always runs on the whole image.

> As you can see, the purported OCR version of the selected image is
> extremely disappointing and for all practical purposes useless. It has
> attempted (very badly) to OCR text in the image outside the area
> indicated by the rectangular selection tool (when I had used the tool to
> deliberately exclude that part of the original image) and completely
> failed to give any OCR output for the majority of the text in the image
> inside the area indicated by the rectangular selection tool.

The results from GOCR tend only to be reasonable for good scans, which
yours really isn't.

> I can't see what I have done wrong, nor can I see what adjustments I can
> make to what I asked it to do. It seems obvious to me that what I
> asked it to do is exactly the kind of thing you would expect an OCR
> program to do, so where have I gone wrong and how can I make any changes
> to achieve a better outcome? Isn't what I did pretty much what any user
> would do?

You might find that the "Clean up" (unpaper) tool helps. However, it is
aimed more at images that are skewed. Yours seems to be a photo from a
book, and the skew angle is different left and right. That might be
difficult to correct.

> When I told gscan2pdf to use Tesseract and Cuneiform instead of GOCR, I
> got no output at all - completely blank.

Please start gscan2pdf from the command line, import the problem image,
try to use Tesseract and Cuneiform, quit, and post the log file. Both
should work.
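
As a rough sketch of the steps above: assuming your version supports the
--log option listed in the gscan2pdf man page, a small Python wrapper
could start the program with logging switched on. The function names and
the log location are only illustrative choices, not part of gscan2pdf.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(log_path):
    """Return the argument list that starts gscan2pdf with logging to log_path."""
    # --log=<file> is the logging option listed in the gscan2pdf man page;
    # check `gscan2pdf --help` if your version differs (assumption).
    return ["gscan2pdf", f"--log={log_path}"]


def run_logged_session(log_path=Path.home() / "gscan2pdf.log"):
    """Start gscan2pdf, wait until it is closed, and return the log path.

    Returns None if gscan2pdf is not installed or not on the PATH.
    """
    cmd = build_command(log_path)
    if shutil.which(cmd[0]) is None:
        print("gscan2pdf was not found on the PATH")
        return None
    # Blocks until the GUI is closed: import the image and try the OCR
    # engines in the window that opens, then quit.
    subprocess.run(cmd, check=False)
    return Path(log_path)
```

Calling run_logged_session() opens gscan2pdf as usual. Once you have
imported the image, tried Tesseract and Cuneiform, and quit, the file it
returns is the log to attach to the bug report.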

> From what I have read on the internet, having done a bit of searching to
> see what I can find out about gscan2pdf, it seems that it and the search
> engines it uses are serious, genuine items of software that actually do
> a job of OCRing images of text into actual text. So I am puzzled why I
> have encountered such complete failure when I tried to use the software.

Absolutely. Both should just work.

> I found the Help documentation to be pretty skimpy, unhelpful to people
> who don't yet know how to use the program (i.e. unhelpful to the very
> people who need its help).

I am pretty much the only developer. Others give me patches to fix
problems when they find them. If you would like to help improve the
documentation, please do. My problem is time, and I tend to concentrate
on fixing bugs and introducing new features. My current problem is
migrating from gtk+-2 to gtk+-3.

> When I looked up reporting a bug, I got all this stuff about Debian
> apparently addressed to people who know all about using command lines.
> All of that I found completely incomprehensible. As a user I just want
> to use the program; I am using a GUI, I don't want to know anything
> about doing things the command line way. How is a normal user expected
> to report a bug or ask a question? All this stuff gives the appearance
> of insiders talking to themselves, and that outsiders are supposed to
> somehow magically know how things work before they've asked the question.

Sure. Perhaps I should point people at the reportbug utility, which
makes it easier to report bugs in Debian packages. I also agree that the
only way of creating a log file currently is via the command line. I
could probably improve that.
