Re: [poppler] Reverse-engineering an XML file generated by pdftohtml -xml back into the PDF?

Josh Richardson Tue, 15 Nov 2011 12:32:43 -0800

*Wistfully*:  If only there were a PDF library to make such things
simple....


On 11/15/11 12:28 PM, "Leonard Rosenthol" <[email protected]> wrote:

>For proper structure, you are going to need to find a way to match the
>structure information with the elements in the content stream and then
>somehow modify the stream accordingly (and add the relevant dictionaries,
>etc.)
>
>On 11/15/11 12:23 PM, "Josh Richardson" <[email protected]> wrote:
>
>>Someone on the list may have a better idea, but I would almost certainly
>>start with the PDFDoc created by reading the original document, and
>>inject back in the meta-data that you have collected -- I believe this
>>was Leonard's recommendation as well.
>>
>>--josh
>>
>>On 11/14/11 10:42 PM, "Alec Taylor" <[email protected]> wrote:
>>
>>>Good afternoon,
>>>
>>>How would I go about reverse-engineering an XML file generated by
>>>pdftohtml -xml back into the [same] PDF?
>>>
>>>I have been spending a long time extending the XML output to include
>>>proper page numbers and header/footer detection.
>>>
>>>It would be extremely useful if I could push the additional logical
>>>structure information and page numbers back into the PDF the XML was
>>>generated from.
>>>
>>>How would I go about doing this?
>>>
>>>Thanks for all suggestions,
>>>
>>>Alec Taylor
>>>
>>>PS: T-9 days (or less!) until PATCH :)
>>>_______________________________________________
>>>poppler mailing list
>>>[email protected]
>>>http://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>
>
>
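
As a starting point for the matching step discussed above, here is a
minimal sketch in Python that reads pdftohtml -xml output and collects
each text box with its page number and position, so the collected
structure information can later be lined up against the original
document. It assumes the usual layout of that output (a <pdf2xml> root,
<page number="..."> elements, and <text top/left/width/height> children
with integer values); the names TextBox, collect_text_boxes,
group_by_page and the SAMPLE data are hypothetical and only for
illustration. It does not touch the PDF itself or the content stream.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical record type for one <text> element of the XML output.
TextBox = namedtuple("TextBox", "page top left width height text")

# Hypothetical sample, shaped like typical pdftohtml -xml output.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="40" left="100" width="200" height="14" font="0">Running header</text>
<text top="120" left="100" width="400" height="14" font="0">Body <b>text</b> here</text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188" width="918">
<text top="1100" left="450" width="10" height="14" font="0">2</text>
</page>
</pdf2xml>"""


def collect_text_boxes(xml_text):
    """Return a list of TextBox records, in document order."""
    root = ET.fromstring(xml_text)
    boxes = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for el in page.iter("text"):
            boxes.append(
                TextBox(
                    page=page_no,
                    top=int(el.get("top")),
                    left=int(el.get("left")),
                    width=int(el.get("width")),
                    height=int(el.get("height")),
                    # itertext() also picks up text inside <b>, <i>, <a>.
                    text="".join(el.itertext()),
                )
            )
    return boxes


def group_by_page(boxes):
    """Map page number -> list of TextBox records on that page."""
    pages = {}
    for box in boxes:
        pages.setdefault(box.page, []).append(box)
    return pages


if __name__ == "__main__":
    for page_no, items in sorted(group_by_page(collect_text_boxes(SAMPLE)).items()):
        print(page_no, [b.text for b in items])
```

Keeping the page number and coordinates next to each string is the point
of the sketch: those are the only keys the XML offers for finding the
same text again in the PDF it was generated from.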
