On Tuesday 21 August 2007 at 22:30 +0200, Albert Astals Cid wrote:
> On Monday 20 August 2007, Carl Worth wrote:
> > On Sun, 19 Aug 2007 22:46:16 +0200, Laurent Aguerreche wrote:
> > > But the real problem is that it is impossible to recognize:
> > > - "ﬁ" as "fi" too
> > > - "ﬀ" as "ff" too
> > > Would it be possible to add a new parameter to pdftotext to make it
> > > ignore ligatures but still export in UTF-8?
> >
> > It's quite preferable to have the ligatures in your PDF file.
> >
> > The bug to fix is that poppler should expand the ligatures to their
> > normalized forms when extracting the text.
> 
> Actually I disagree: if you have æ, do you want it expanded to ae too?
> If not, why do you want that for the ff ligature?

I think there are two cases here:
- "ff" consists of two characters that are joined (as a ligature) only
when displayed. Written by hand, it is "ff";
- "æ" is always written as "a" joined with "e".

(Admittedly, I do not know which language you have in mind as an example,
but I know the case of the word "cœur" (= heart) in French: writing it
"coeur" is always wrong.)
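
For what it is worth, Unicode's compatibility normalization appears to draw
the same line between the two cases. Here is a minimal Python sketch, not
poppler code: the helper name expand_ligatures is hypothetical, and it only
assumes the standard unicodedata module. NFKC expands the presentation
ligatures U+FB01 and U+FB00 but leaves "æ" and "œ" alone, because they have
no decomposition.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Hypothetical helper: apply NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


# Presentation ligatures carry a compatibility decomposition, so they expand.
print(expand_ligatures("\ufb01"))  # U+FB01 LATIN SMALL LIGATURE FI
print(expand_ligatures("\ufb00"))  # U+FB00 LATIN SMALL LIGATURE FF

# "æ" and "œ" are letters in their own right and pass through unchanged.
print(expand_ligatures("æ"))
print(expand_ligatures("cœur"))
```

Note that NFKC also rewrites other compatibility characters (superscript
digits, full-width forms, and so on), so a real text extractor might prefer
a narrower mapping restricted to the ligature code points.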


Laurent.

> Albert
> 
> >
> > That bug was first reported here:
> >
> >     Text extraction should expand ligatures to their normal form
> >     https://bugs.freedesktop.org/show_bug.cgi?id=7002
> >
> > -Carl
> 
> 
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler

