https://bugs.documentfoundation.org/show_bug.cgi?id=158329

--- Comment #14 from ⁨خالد حسني⁩ <[email protected]> ---
(In reply to David Huggins-Daines from comment #13)
> (In reply to ⁨خالد حسني⁩ from comment #12)
> > On top of that, the ToUnicode mapping must be unique: a glyph can appear
> > there only once. But fonts might map different characters to the same
> > glyph, and in that case ToUnicode can be used for only one of these
> > mappings; all the others will need ActualText.
> 
> Thank you for the really detailed explanation!  In this particular
> regression we have a sort of ligature, so ToUnicode should work, but I
> understand why it isn't sufficient in the more general case.
> 
> I'll try to do a best-effort implementation of ActualText for
> pdfminer/pdfplumber, since as you say it gets used for the smallest span of
> text necessary, and since text extraction is best-effort by definition
> anyway.
> 
> I haven't checked to see if poppler, qpdf, pdfium, and company are working
> on ActualText support...

Poppler supports ActualText; pdfium does not (at least as of the last time I
checked). I don’t know about qpdf.
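
To make the uniqueness constraint from comment #12 concrete, here is a minimal
Python sketch. It is only an illustration, not how LibreOffice, Poppler, or
pdfminer actually implement this: the function name split_mappings and the
sample font (two characters sharing glyph 7) are made up. It assumes the first
character seen for a glyph is the one that goes into ToUnicode.

```python
def split_mappings(char_to_glyph):
    """Partition a font's character-to-glyph map (hypothetical input format).

    Returns the glyph-to-character table that a ToUnicode CMap could hold,
    plus the characters that would need ActualText instead.
    """
    to_unicode = {}          # glyph id -> the one character ToUnicode can record
    needs_actual_text = []   # characters whose glyph is already claimed
    for char, glyph in char_to_glyph.items():
        if glyph not in to_unicode:
            # Assumption: the first character seen for a glyph wins.
            to_unicode[glyph] = char
        else:
            # The glyph may appear in ToUnicode only once, so this
            # character cannot be recovered from the mapping alone.
            needs_actual_text.append(char)
    return to_unicode, needs_actual_text


# Hypothetical font where U+00B5 MICRO SIGN and U+03BC GREEK SMALL LETTER MU
# are both drawn with glyph 7.
cmap = {"a": 1, "\u00b5": 7, "\u03bc": 7}
to_unicode, needs_actual_text = split_mappings(cmap)
```

With this sample font, glyph 7 resolves to U+00B5 through ToUnicode, so any
text that used U+03BC would have to be wrapped in an ActualText span to be
extracted correctly.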
