Package: pdf2djvu
Version: 0.9.18.2-2+b2
Severity: wishlist
Tags: upstream
X-Debbugs-Cc: debbug.pdf2d...@sideload.33mail.com
Suppose a document is scanned, unpaper produces a bilevel PBM file, and that file is converted to PNG and embedded as-is (without any manipulation) into a PDF. The result is a relatively small PDF. ocrmypdf is then used to embed text.

When the output PDF from the above operations is fed to pdf2djvu, the resulting DjVu file is much _bigger_ than the PDF. This mostly defeats the point of the DjVu format, and there must be a deficiency in pdf2djvu that causes it. When the PBM file is fed directly to cjb2, the resulting DjVu file is rightfully _smaller_ than the PDF.

One could use cjb2 as a workaround, but the intermediate PDF step is useful for OCR.

It’s worth noting that this project may have fallen out of maintenance:
https://github.com/jwilk-archive/pdf2djvu/issues/157
The man page still points to that bug tracker and to the mailing list (which also became inactive in 2022).

Workaround (untested and complex!):
Getting a text layer would normally involve ocrodjvu, but that package has been dropped. It’s unclear whether “djvused --save-script” can somehow be used. Perhaps it could be run on the fat OCRed DjVu file, and its output then injected into the lean file from cjb2 using “djvused -f”.
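The idea above might look roughly like the following untested sketch. Instead of “--save-script”, it uses djvused's `output-txt` command, which prints a djvused script that recreates the hidden-text layer. It then applies that script to a copy of the cjb2 output with `-f` and saves the copy with `-s`. The helper name `graft_ocr_text` and the file names are hypothetical. The sketch assumes a single-page document whose pages have identical pixel dimensions in both files. If the `select` lines in the generated script name components that the lean file lacks, they would need editing by hand.

```shell
#!/bin/sh
# Untested sketch: copy the hidden-text (OCR) layer from the fat
# pdf2djvu output onto the lean cjb2 output.
# graft_ocr_text is a hypothetical helper, not part of djvulibre.
graft_ocr_text() {
    if [ "$#" -ne 3 ]; then
        echo "usage: graft_ocr_text FAT.djvu LEAN.djvu OUT.djvu" >&2
        return 2
    fi
    g_fat=$1 g_lean=$2 g_out=$3

    # Text coordinates are in page pixels, so warn if the first pages differ
    # in size (e.g. pdf2djvu --dpi not matching the scan resolution).
    if [ "$(djvused -e 'select 1; size' "$g_fat")" != \
         "$(djvused -e 'select 1; size' "$g_lean")" ]; then
        echo "warning: page sizes differ; text may be misplaced" >&2
    fi

    g_script=$(mktemp) || return 1
    # output-txt prints a djvused script that recreates the text layer.
    if ! djvused -e 'output-txt' "$g_fat" > "$g_script"; then
        rm -f "$g_script"
        return 1
    fi
    # Run that script against a copy of the lean file and save it (-s).
    cp "$g_lean" "$g_out" && djvused -f "$g_script" -s "$g_out"
    g_rc=$?
    rm -f "$g_script"
    return "$g_rc"
}

# Example (hypothetical file names):
#   cjb2 scan.pbm lean.djvu
#   pdf2djvu -o fat.djvu ocred.pdf
#   graft_ocr_text fat.djvu lean.djvu out.djvu
```

Working on a copy leaves the cjb2 output untouched if the generated script fails partway through.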
-- System Information:
Debian Release: 12.6
  APT prefers stable-updates
  APT policy: (990, 'stable-updates'), (990, 'stable-security'), (990, 'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-28-amd64 (SMP w/2 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages pdf2djvu depends on:
ii  djvulibre-bin               3.5.28-2+b1
ii  libc6                       2.36-9+deb12u7
ii  libdjvulibre21              3.5.28-2+b1
ii  libexiv2-27                 0.27.6-1
ii  libgcc-s1                   12.2.0-14
ii  libgomp1                    12.2.0-14
ii  libgraphicsmagick++-q16-12  1.4+really1.3.40-4
ii  libgraphicsmagick-q16-3     1.4+really1.3.40-4
ii  libpoppler126               22.12.0-2+b1
ii  libstdc++6                  12.2.0-14
ii  libuuid1                    2.38.1-5+deb12u1

pdf2djvu recommends no packages.

Versions of packages pdf2djvu suggests:
ii  poppler-data  0.4.12-1

-- no debconf information