On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> But the real problem is that it is impossible to recognize :
> - "ﬁ" (U+FB01) as "fi" too
> - "ﬀ" (U+FB00) as "ff" too
> Would it be possible to add a new parameter to pdftotext to make it
> ignore ligatures but still export in UTF-8?

It's quite preferable to have the ligatures in your PDF file.

The bug to fix is that poppler should expand the ligatures to their
normalized forms when extracting the text.

That bug was first reported here:

        Text extraction should expand ligatures to their normal form
        https://bugs.freedesktop.org/show_bug.cgi?id=7002
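
Until that is fixed in poppler, the expansion can be approximated by
post-processing the extracted UTF-8 text. What follows is only a minimal
sketch of such a workaround, not poppler's implementation: the function
name expand_ligatures is hypothetical, and it assumes the ligatures arrive
as the Unicode presentation forms U+FB00..U+FB06, which have compatibility
decompositions to plain letters.

```python
import unicodedata

# Latin ligature presentation forms: ff, fi, fl, ffi, ffl, long-s t, st.
FIRST_LIGATURE = "\ufb00"
LAST_LIGATURE = "\ufb06"


def expand_ligatures(text: str) -> str:
    """Expand Latin ligature code points to their plain-letter forms.

    Hypothetical helper: only U+FB00..U+FB06 are decomposed, so other
    compatibility characters (superscripts, full-width forms, ...) are
    left exactly as extracted.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch)
        if FIRST_LIGATURE <= ch <= LAST_LIGATURE
        else ch
        for ch in text
    )


if __name__ == "__main__":
    import sys

    # Read extracted text on stdin, write the expanded text to stdout.
    sys.stdout.write(expand_ligatures(sys.stdin.read()))
```

Running unicodedata.normalize("NFKC", ...) over the whole text would also
expand the ligatures, but it rewrites every other compatibility character
as well, which is why the sketch restricts itself to the ligature range.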

-Carl

