*Wistfully*: If only there were a PDF library to make such things simple....
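
As a rough sketch of the first part of the matching step discussed in the quoted thread below: before anything can be pushed back into the PDF, the text boxes in the pdftohtml -xml output have to be collected per page so they can be compared against the content stream. The Python below assumes the usual <page number=...> / <text top=... left=... width=... height=...> layout of that XML. The sample data and the collect_text_boxes helper are made up for illustration, and nothing here touches the PDF itself.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only (an assumption, not taken from a real document);
# actual pdftohtml -xml output carries more attributes and elements.
SAMPLE_XML = """<pdf2xml>
  <page number="1" top="0" left="0" height="1188" width="918">
    <text top="40" left="100" width="200" height="12" font="0">Report title</text>
    <text top="300" left="100" width="400" height="12" font="0">Body <b>text</b> here</text>
    <text top="1150" left="450" width="10" height="12" font="0">1</text>
  </page>
</pdf2xml>"""


def collect_text_boxes(xml_text):
    """Hypothetical helper: map page number -> list of text boxes."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        boxes = []
        for text in page.iter("text"):
            boxes.append({
                "top": int(text.get("top")),
                "left": int(text.get("left")),
                "width": int(text.get("width")),
                "height": int(text.get("height")),
                # itertext() flattens inline markup such as <b> or <i>
                "text": "".join(text.itertext()),
            })
        pages[int(page.get("number"))] = boxes
    return pages


if __name__ == "__main__":
    for number, boxes in collect_text_boxes(SAMPLE_XML).items():
        for box in boxes:
            print(number, box["top"], box["left"], box["text"])
```

The positions collected this way are only candidates for matching; locating the corresponding operators in the content stream and adding the structure dictionaries is the hard part and is not attempted here.
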
On 11/15/11 12:28 PM, "Leonard Rosenthol" <[email protected]> wrote:

>For proper structure, you are going to need to find a way to match the
>structure information with the elements in the content stream and then
>somehow modify the stream accordingly (and add the relevant dictionaries,
>etc.)
>
>On 11/15/11 12:23 PM, "Josh Richardson" <[email protected]> wrote:
>
>>Someone on the list may have a better idea, but I would almost certainly
>>start with the PDFDoc created by reading the original document, and
>>inject back in the meta-data that you have collected -- I believe this
>>was Leonard's recommendation as well.
>>
>>--josh
>>
>>On 11/14/11 10:42 PM, "Alec Taylor" <[email protected]> wrote:
>>
>>>Good afternoon,
>>>
>>>How would I go about reverse-engineering an XML file generated by
>>>pdftohtml -xml back into the [same] PDF?
>>>
>>>I have been spending a long time extending the XML output to include
>>>proper page numbers and header/footer detection.
>>>
>>>It would be extremely useful if I could push the additional logical
>>>structure information and page numbers back into the PDF the XML was
>>>generated from.
>>>
>>>How would I go about doing this?
>>>
>>>Thanks for all suggestions,
>>>
>>>Alec Taylor
>>>
>>>PS: T-9 days (or less!) until PATCH :)

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
