Package: gscan2pdf
Version: 1.2.7-1
Severity: minor
Tags: upstream patch

Please have a look at the patch series against gscan2pdf's master branch at
https://github.com/marschap/gscan2pdf/commits/canvas-improvements
It addresses the following issues I found with OCR'ed text:
* scale & shift the text elements so that they fill their bounding boxes
  better
* support rotated text elements (again with proper scaling and translation)
* refactor Gscan2pdf::Page->boxes() to return a list of hashes whose keys
  are a subset of the properties of the hOCR elements
* add the additional properties to the Goo::Canvas::Group objects in
  boxed_text()
* use these additional properties in canvas2hocr() to provide more
  information and to be more "round-trip safe"

In my setup (KDE as desktop environment, tesseract as OCR engine) this makes
the OCR text readable (instead of illegible dark specks) and makes it
resemble the original text much more closely.

I tested it with gocr and tesseract on multiple local documents, and adapted
the respective tests in the test suite too.

The commit messages contain more detailed explanations.

It would be cool if these changes made it into the next gscan2pdf release.

Thanks for this cool piece of software
Peter

-- System Information:
Debian Release: 8.0
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages gscan2pdf depends on:
ii  imagemagick                        8:6.8.9.9-3
ii  libconfig-general-perl             2.56-1
ii  libgoo-canvas-perl                 0.06-2+b1
ii  libgtk2-ex-simple-list-perl        0.50-2
ii  libgtk2-imageview-perl             0.05-2+b1
ii  libhtml-parser-perl                3.71-1+b3
ii  libimage-magick-perl [perlmagick]  8:6.8.9.9-3
ii  liblist-moreutils-perl             0.33-2+b1
ii  liblocale-gettext-perl             1.05-8+b1
ii  liblog-log4perl-perl               1.44-1
ii  libpdf-api2-perl                   2.023-1
ii  libproc-processtable-perl          0.51-1
ii  libreadonly-perl                   2.000-1
ii  librsvg2-common                    2.40.5-1
ii  libsane-perl                       0.05-2+b2
ii  libset-intspan-perl                1.19-1
ii  libtiff-tools                      4.0.3-10+b4
ii  libtry-tiny-perl                   0.22-1
ii  sane-utils                         1.0.24-7

Versions of packages gscan2pdf recommends:
ii  djvulibre-bin              3.5.25.4-4+b1
ii  gocr                       0.49-2
ii  libgtk2-ex-podviewer-perl  0.18-1
ii  sane                       1.0.14-9
ii  tesseract-ocr              3.03.03-1
ii  unpaper                    0.4.2-1
ii  xdg-utils                  1.1.0~rc1+git20111210-7.1

gscan2pdf suggests no packages.

-- no debconf information