Package: gscan2pdf
Version: 1.2.7-1
Severity: minor
Tags: upstream patch

Please have a look at the patch series against gscan2pdf's master branch in
  https://github.com/marschap/gscan2pdf/commits/canvas-improvements

It addresses the following issues I found with OCR'ed text:
* scale & shift the text elements so that they fill their bounding boxes better
* support for rotated text elements (again with proper scaling and translation)
* refactor Gscan2pdf::Page->boxes() to return a list of hashes whose keys
  are a subset of the properties of the hOCR elements
* add these additional properties to the Goo::Canvas::Group objects in
  boxed_text()
* use these additional properties in canvas2hocr() to provide more
  information and to be more "round-trip safe".
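To illustrate the first and third points, here is a minimal sketch in Python.
It is NOT the code from the patch series (gscan2pdf itself is written in
Perl), and the function names parse_hocr_title() and fit_to_bbox() are
hypothetical. It assumes the hOCR "title" attribute has the usual
"key value ...; key value ..." form (e.g. "bbox x0 y0 x1 y1; x_wconf 93"),
and that the rendered text's extent is known as (x, y, width, height) in
the same units as the bbox. Rotation (the hOCR "textangle" property) is
deliberately left out of the sketch.

```python
def parse_hocr_title(title):
    """Split an hOCR title attribute such as 'bbox 10 20 110 40; x_wconf 93'
    into a dict of properties (hypothetical helper, for illustration only)."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if not fields:
            continue  # skip empty segments, e.g. from a trailing ';'
        key, values = fields[0], fields[1:]
        if key == "bbox":
            # bbox is always four integer coordinates: x0 y0 x1 y1
            props["bbox"] = tuple(int(v) for v in values)
        else:
            # keep other properties as strings (or a list of strings)
            props[key] = values[0] if len(values) == 1 else values
    return props


def fit_to_bbox(text_extent, bbox):
    """Return (sx, sy, dx, dy) such that a point p of the rendered text
    maps to s * p + d, stretching the text extent (x, y, width, height)
    so that it exactly fills bbox (x0, y0, x1, y1)."""
    tx, ty, tw, th = text_extent
    x0, y0, x1, y1 = bbox
    # independent horizontal and vertical scale factors
    sx = (x1 - x0) / tw
    sy = (y1 - y0) / th
    # shift so that the scaled top-left corner of the text lands on (x0, y0)
    dx = x0 - tx * sx
    dy = y0 - ty * sy
    return sx, sy, dx, dy


props = parse_hocr_title("bbox 100 200 300 240; x_wconf 93")
print(props)
print(fit_to_bbox((0, -8, 50, 10), props["bbox"]))
```

Using separate horizontal and vertical factors makes the text fill its box
exactly; a renderer that wants to preserve the font's aspect ratio could use
min(sx, sy) for both instead.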

In my setup (KDE as desktop environment, tesseract as OCR engine) this makes
the OCR texts readable (instead of being illegible dark specks) and makes them
resemble the original text much more closely.

I tested it with gocr and tesseract on multiple local documents, and adapted
the respective tests in the test suite as well.

More detailed explanations are in the commit messages.

It would be cool if these changes made it into the next gscan2pdf release.

Thanks for this cool piece of software
Peter


-- System Information:
Debian Release: 8.0
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages gscan2pdf depends on:
ii  imagemagick                        8:6.8.9.9-3
ii  libconfig-general-perl             2.56-1
ii  libgoo-canvas-perl                 0.06-2+b1
ii  libgtk2-ex-simple-list-perl        0.50-2
ii  libgtk2-imageview-perl             0.05-2+b1
ii  libhtml-parser-perl                3.71-1+b3
ii  libimage-magick-perl [perlmagick]  8:6.8.9.9-3
ii  liblist-moreutils-perl             0.33-2+b1
ii  liblocale-gettext-perl             1.05-8+b1
ii  liblog-log4perl-perl               1.44-1
ii  libpdf-api2-perl                   2.023-1
ii  libproc-processtable-perl          0.51-1
ii  libreadonly-perl                   2.000-1
ii  librsvg2-common                    2.40.5-1
ii  libsane-perl                       0.05-2+b2
ii  libset-intspan-perl                1.19-1
ii  libtiff-tools                      4.0.3-10+b4
ii  libtry-tiny-perl                   0.22-1
ii  sane-utils                         1.0.24-7

Versions of packages gscan2pdf recommends:
ii  djvulibre-bin              3.5.25.4-4+b1
ii  gocr                       0.49-2
ii  libgtk2-ex-podviewer-perl  0.18-1
ii  sane                       1.0.14-9
ii  tesseract-ocr              3.03.03-1
ii  unpaper                    0.4.2-1
ii  xdg-utils                  1.1.0~rc1+git20111210-7.1

gscan2pdf suggests no packages.

-- no debconf information
