On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> But the real problem is that it is impossible to recognize :
> - "ﬁ" as "fi" too
> - "ﬀ" as "ff" too
> Would it be possible to add a new parameter to pdftotext to make it
> ignore ligatures but still export in UTF-8?

It's quite preferable to have the ligatures in your PDF file.
The bug to fix is that poppler should expand the ligatures to their
normalized forms when extracting the text.
That bug was first reported here:
Text extraction should expand ligatures to their normal form
https://bugs.freedesktop.org/show_bug.cgi?id=7002
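
Until that is fixed in poppler itself, the expansion can be approximated by post-processing the UTF-8 text that pdftotext already produces. The following is only a rough sketch of that workaround, not poppler's implementation; the function name expand_ligatures is made up for illustration, and it assumes the ligatures in question are the Latin ones at U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, long s-t, st).

```python
import unicodedata

# Map each Latin ligature in U+FB00..U+FB06 to its compatibility
# decomposition (e.g. U+FB01 -> "fi"). Only these code points are
# touched; the rest of the text is left exactly as extracted.
LIGATURES = {
    cp: unicodedata.normalize("NFKD", chr(cp))
    for cp in range(0xFB00, 0xFB07)
}

def expand_ligatures(text):
    """Replace Latin ligature characters with their plain letters."""
    return text.translate(LIGATURES)

if __name__ == "__main__":
    # "e" + U+FB03 (ffi) + "cient", then U+FB01 (fi) + "le"
    print(expand_ligatures("e\ufb03cient \ufb01le"))
```

Restricting the table to those seven code points, rather than running NFKC over the whole text, avoids rewriting unrelated characters such as superscripts or full-width forms.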
-Carl
