On 22/02/2025 05:02, David Wright wrote:
On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote:
P.S. "pdftotext -layout" in some cases is better than without
"-layout".
I think the results are roughly comparable with my scrapings,
for this document at least. Perhaps both pdftotext and xpdf
rely on poppler to do the work.
The poppler library is a fork of xpdf. It is used by evince, okular, and
some other PDF viewers. Since the fork, the upstream xpdf and poppler
codebases have diverged significantly. Upstream xpdf now has a GUI
based on Qt. Debian still packages the old variant of xpdf because the
security team is against having two similar libraries in the
repositories (which is reasonable). There are likely enough
incompatibilities to make porting the current Qt-based xpdf to poppler
infeasible. Some distributions, such as Arch Linux, ship the current
upstream xpdf release.
While evince, xpdf, and pdftotext use the same poppler library, their
text selection behavior differs. For tables, aligned text is certainly
preferable, while in other cases wrapped text mode is better.
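
To compare the two modes on a given file, something like the following
rough sketch might help. It only assumes that the pdftotext binary from
poppler-utils is on PATH; the helper names pdftotext_cmd and
extract_text are made up for illustration, and "-" as the output file
sends the text to stdout.

```python
import subprocess


def pdftotext_cmd(pdf_path, layout=False):
    """Build the pdftotext command line (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        # -layout tries to keep the physical layout, so table columns
        # stay aligned instead of being reflowed.
        cmd.append("-layout")
    # "-" as the output file name makes pdftotext write to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Calling extract_text("some.pdf", layout=True) and
extract_text("some.pdf") on the same document (the file name is a
placeholder) and diffing the results should show quickly which mode
suits it better.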
P.S. I have not tried elpa-pdf-tools, so I cannot say anything about
its text selection features.