Package: pdf2djvu
Version: 0.9.18.2-2+b2
Severity: wishlist
Tags: upstream
X-Debbugs-Cc: debbug.pdf2d...@sideload.33mail.com

If a document is scanned, unpaper is used to produce a bilevel PBM
file, and that file is converted to PNG and embedded as-is (without
further manipulation) into a PDF, the result is a relatively small
PDF. ocrmypdf is then used to add a text layer.

When the output PDF from the above operations is fed to pdf2djvu, the
resulting DjVu file is much _bigger_ than the PDF. This mostly defeats
the point of the DjVu format.

This points to a deficiency in pdf2djvu. When the PBM file is fed
directly to cjb2, the resulting DjVu file is, as expected, _smaller_
than the PDF was. One could use cjb2 as a workaround, but the
intermediate PDF step is needed for the OCR operations.
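
The comparison above could be reproduced with something like the
following sketch. The file names and the helper name compare_sizes are
placeholders for illustration, not the exact files from my test, and
both tools are run with default settings only.

```shell
#!/bin/sh
# Sketch only. All file names are placeholders for illustration.
# compare_sizes PBM PDF: encode both ways and print the byte counts.
compare_sizes() {
    pbm=$1
    pdf=$2
    # Direct route: bilevel PBM straight into the JB2 encoder.
    cjb2 "$pbm" lean.djvu || return 1
    # PDF route: the OCR'd PDF through pdf2djvu with default settings.
    pdf2djvu -o fat.djvu "$pdf" || return 1
    # Print sizes in bytes so the three files can be compared.
    wc -c "$pdf" lean.djvu fat.djvu
}
```

It would be called as "compare_sizes page.pbm ocr.pdf"; the point is
only to put the three byte counts side by side.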

It’s worth noting that this project may have fallen out of maintenance:

  https://github.com/jwilk-archive/pdf2djvu/issues/157

The man page still points to that bug tracker and the mailing list
(which also became inactive in 2022).

Workaround (untested and complex!):

A workaround that preserves the text layer would normally involve
ocrodjvu, but that package has been dropped. It’s unclear whether
“djvused --save-script” can be used for this. Perhaps it could be run
on the fat OCR’d DjVu file, and its output then injected into the lean
file from cjb2 using “djvused -f”.
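
A minimal sketch of that idea follows. It is untested against real
files. It uses the djvused “output-txt” command to dump the text layer
rather than the “--save-script” spelling mentioned above, and the
helper name transplant_text and all file names are placeholders. It
assumes both files have the same pages at the same pixel dimensions
(the text zones are stored in pixel coordinates) and that the “select”
lines in the dumped script name components that also exist in the lean
file; otherwise the script would need editing first.

```shell
#!/bin/sh
# Sketch only, untested against real files. File names are placeholders.
# transplant_text FAT LEAN [SCRIPT]: copy the hidden text from FAT to LEAN.
transplant_text() {
    fat=$1
    lean=$2
    script=${3:-text.dsed}
    # Dump the hidden text of every page as a djvused script.
    djvused -e 'output-txt' "$fat" > "$script" || return 1
    # Replay that script against the lean file and save it in place (-s).
    djvused -f "$script" -s "$lean"
}
```

It would be called as "transplant_text fat.djvu lean.djvu". Since -s
overwrites the lean file, working on a copy seems prudent.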

-- System Information:
Debian Release: 12.6
  APT prefers stable-updates
  APT policy: (990, 'stable-updates'), (990, 'stable-security'), (990, 'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-28-amd64 (SMP w/2 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages pdf2djvu depends on:
ii  djvulibre-bin               3.5.28-2+b1
ii  libc6                       2.36-9+deb12u7
ii  libdjvulibre21              3.5.28-2+b1
ii  libexiv2-27                 0.27.6-1
ii  libgcc-s1                   12.2.0-14
ii  libgomp1                    12.2.0-14
ii  libgraphicsmagick++-q16-12  1.4+really1.3.40-4
ii  libgraphicsmagick-q16-3     1.4+really1.3.40-4
ii  libpoppler126               22.12.0-2+b1
ii  libstdc++6                  12.2.0-14
ii  libuuid1                    2.38.1-5+deb12u1

pdf2djvu recommends no packages.

Versions of packages pdf2djvu suggests:
ii  poppler-data  0.4.12-1

-- no debconf information
