Re: [poppler] poppler util pdftohtml

Leonard Rosenthol Thu, 22 Sep 2011 15:13:13 -0700

I can't recall what you said about this in the past, but since I was just
dealing with it today, I'll ask again.


What do you do about embedded fonts?

As my company (Adobe) sells/creates fonts, I want to make sure that
pdftohtml won't be violating our IP/licenses.

Thanks in advance,
Leonard

On 9/22/11 5:51 PM, "Josh Richardson" <[email protected]> wrote:

>On 9/22/11 12:20 PM, "Jonathan Kew" <[email protected]> wrote:
>>More generally, it is not possible to recreate useful XHTML (or similar)
>>documents from arbitrary PDF files with anything like 100% reliability,
>>because many PDF files do not contain adequate information to accurately
>>map the rendered glyphs back to correct Unicode text, or to reliably
>>reconstruct the proper flow of text. Constructs such as ActualText may
>>help, but are often lacking from real-world PDF documents.
>
>W.r.t. rendering glyphs, we get around the problem of missing Unicode
>mappings by taking any glyph without a Unicode mapping and assigning it an
>offset in the Private Use Area of Unicode.  This produces the correct
>visual result in the XHTML, but not a full semantic representation.  If
>someone's interested, they could get the semantics right too by
>pattern-matching the glyph against an appropriate Unicode font.
>
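
A minimal sketch of the private-use assignment described above, for
illustration only. It is not taken from the pdftohtml source; the function
name, the input shape (pairs of glyph name and optional Unicode character),
and the choice of the Basic Multilingual Plane private-use range
(U+E000 to U+F8FF) are all assumptions.

```python
# Hypothetical sketch, not poppler's implementation.
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def assign_code_points(glyphs):
    """Map each glyph name to a character.

    `glyphs` is an iterable of (glyph_name, unicode_char_or_None) pairs.
    Glyphs with a known Unicode mapping keep it; glyphs without one are
    given the next unused private-use code point.
    """
    mapping = {}
    next_pua = PUA_START
    for name, char in glyphs:
        if name in mapping:
            continue  # reuse the character already assigned to this glyph
        if char is not None:
            mapping[name] = char
        else:
            if next_pua > PUA_END:
                raise ValueError("ran out of private-use code points")
            mapping[name] = chr(next_pua)
            next_pua += 1
    return mapping
```

With a font that supplies the same glyph at the assigned code point, the
output looks right, but the private-use characters carry no meaning for
search or copy and paste, which is the semantic gap mentioned above.
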
>W.r.t. the flow of text, there have been other threads on this topic, but
>pdftohtml does make some attempt, and I believe it's possible to do this
>to a high degree of accuracy, maybe >99% -- that said, no one has done it
>yet, so either it's harder than I think, or no one has cared enough to
>really try (and I still fall into that camp.)
>
>Best, --josh
>
>_______________________________________________
>poppler mailing list
>[email protected]
>http://lists.freedesktop.org/mailman/listinfo/poppler

