On 22/02/2025 05:02, David Wright wrote:
On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote:

P.S. "pdftotext -layout" in some cases is better than without
"-layout".

I think the results are roughly comparable with my scrapings,
for this document at least. Perhaps both pdftotext and xpdf
rely on poppler to do the work.

The poppler library is a fork of xpdf. It is used by evince, okular, and some other PDF viewers. Since the fork, the upstream xpdf and poppler codebases have diverged significantly. Upstream xpdf now has a Qt-based GUI. Debian still packages the old variant of xpdf, since the security team is against having two similar libraries in the repositories (which is reasonable). Most likely there are enough incompatibilities to make porting the current Qt-based xpdf to poppler infeasible. Some distributions, such as Arch Linux, ship the current upstream xpdf release.

Although evince, xpdf, and pdftotext all use the same poppler library, their text selection behavior differs. For tables, aligned text is certainly preferable, while in other cases wrapped text mode is better.
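
To compare the two modes on a given file, something like the following rough sketch may help. It assumes the pdftotext binary from poppler-utils is on PATH; the helper names pdftotext_cmd and extract_text and the file name in the usage note are made up for illustration. The "-layout" option and "-" as the output name (meaning stdout) are regular pdftotext usage.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, layout=False):
    """Build the pdftotext command line (hypothetical helper).

    layout=True adds "-layout", which tries to keep the physical
    alignment of the page; the default is reading-order text.
    """
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")
    # "-" as the output file makes pdftotext write to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; is poppler-utils installed?")
    result = subprocess.run(
        pdftotext_cmd(pdf_path, layout),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling extract_text("some.pdf", layout=True) and extract_text("some.pdf") on the same document (the file name is only a placeholder) gives both variants, so it is easy to see which one suits tables and which one suits running text.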

P.S. I have not tried elpa-pdf-tools, so I cannot say anything about its text selection features.
