Package: poppler-utils
Version: 0.26.5-2
Severity: normal

pdftotext has improved: it now generally outputs plain ASCII letters
instead of ligatures (such as the ff ligature), but it still generates
ligatures in some cases. For instance, if I run pdflatex on the
following file:

------------------------------------------------------------
\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}

\begin{document}
\thispagestyle{empty}

chiffre

ê

\end{document}
------------------------------------------------------------

I get a PDF file, ff-lig1.pdf (attached), and running pdftotext on
it gives:

chiffre
ê

This is OK. But if I run ps2pdf on this PDF file, I get another PDF
file, ff-lig1-gs.pdf (attached), and running pdftotext on it gives:

chiﬀre
ê

i.e. with the ff ligature (U+FB00) in place of the two letters "ff",
which makes searching for text such as "chiffre" unpredictable (the
output is also less readable in a terminal with a monospace font).
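
As a possible workaround until this is fixed, the pdftotext output can
be post-processed to detect and expand such ligatures. The following is
only a minimal sketch, not anything provided by poppler: the function
names find_ligatures and fold_ligatures are hypothetical, and it assumes
that only the Latin ligature code points U+FB00 to U+FB06 need to be
handled. It relies on the compatibility decomposition from Python's
standard unicodedata module and leaves all other characters (such as
"ê") untouched.

```python
import unicodedata

# Assumption: only the Latin ligatures U+FB00..U+FB06 are of interest
# (ff, fi, fl, ffi, ffl, long-s-t, st).
LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def find_ligatures(text):
    """Return (offset, character) pairs for each Latin ligature in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in LIGATURES]


def fold_ligatures(text):
    """Expand ligature code points into plain letters.

    Only characters in LIGATURES are normalized (NFKC), so accented
    letters and other non-ASCII text are passed through unchanged.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if ch in LIGATURES else ch
        for ch in text
    )
```

For example, the text extracted from ff-lig1-gs.pdf could be passed
through fold_ligatures() before searching it for "chiffre".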

This problem doesn't occur if I replace "ê" by "é" in the LaTeX file!

Note that xpdf finds "chiffre" in both cases, so the bug seems to be
in pdftotext. Moreover, pdftotext from poppler-utils 0.18.4-6 (wheezy)
gives the ligature on all 4 attached PDF files, so it seems that
pdftotext has been improved except in the particular case mentioned
above.

-- System Information:
Debian Release: 8.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages poppler-utils depends on:
ii  libc6         2.19-13
ii  libcairo2     1.14.0-2.1
ii  libfreetype6  2.5.2-2
ii  liblcms2-2    2.6-3+b3
ii  libpoppler46  0.26.5-2
ii  libstdc++6    4.9.2-5
ii  zlib1g        1:1.2.8.dfsg-2+b1

poppler-utils recommends no packages.

poppler-utils suggests no packages.

-- no debconf information

Attachment: ff-lig1.pdf
Description: Adobe PDF document

Attachment: ff-lig1-gs.pdf
Description: Adobe PDF document

Attachment: ff-lig2.pdf
Description: Adobe PDF document

Attachment: ff-lig2-gs.pdf
Description: Adobe PDF document
