I am doing some research on the structure of PDF files. I wrote a utility to convert the object (i.e., dictionary) structure of PDFs into XML so that I can query the structure using XPath or similar query languages. I also care about the context, and the context can be rebuilt from the resulting XML when necessary.
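As an illustration of the kind of conversion described here, the following is a minimal Python sketch, not Nedim's actual utility. It assumes the PDF objects have already been parsed into a hypothetical in-memory model: Python dicts for dictionaries, lists for arrays, and small Ref and Name classes for indirect references and names. The element names (pdf, object, dict, entry, ref, name, ...) are made up for the example, and every object is assumed to have generation 0. Indirect references are kept as ref elements carrying num/gen, so the object graph (the context) can be rebuilt from the XML afterwards.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Indirect reference such as `2 0 R` (hypothetical in-memory model)."""
    num: int
    gen: int = 0


@dataclass(frozen=True)
class Name:
    """PDF name object such as /Page, stored without the leading slash."""
    value: str


def to_xml(obj) -> ET.Element:
    """Convert one in-memory PDF object into an XML element."""
    if isinstance(obj, dict):
        el = ET.Element("dict")
        for key, value in obj.items():
            ET.SubElement(el, "entry", key=key).append(to_xml(value))
        return el
    if isinstance(obj, list):
        el = ET.Element("array")
        for item in obj:
            el.append(to_xml(item))
        return el
    if isinstance(obj, Ref):
        # Keep the reference as an edge (num/gen) so context can be rebuilt later.
        return ET.Element("ref", num=str(obj.num), gen=str(obj.gen))
    if obj is None:
        return ET.Element("null")
    if isinstance(obj, Name):
        tag, text = "name", obj.value
    elif isinstance(obj, bool):  # test bool before int: bool subclasses int
        tag, text = "bool", "true" if obj else "false"
    elif isinstance(obj, (int, float)):
        tag, text = "number", str(obj)
    elif isinstance(obj, str):
        tag, text = "string", obj
    else:
        raise TypeError(f"unsupported PDF object: {obj!r}")
    el = ET.Element(tag)
    el.text = text
    return el


def objects_to_xml(objects: dict) -> ET.Element:
    """Wrap each indirect object (keyed by object number) in <object num=...>."""
    root = ET.Element("pdf")
    for num, obj in sorted(objects.items()):
        # Assumption: generation 0 for every object, for brevity.
        wrapper = ET.SubElement(root, "object", num=str(num), gen="0")
        wrapper.append(to_xml(obj))
    return root


doc = objects_to_xml({
    1: {"Type": Name("Catalog"), "Pages": Ref(2)},
    2: {"Type": Name("Pages"), "Kids": [Ref(3)], "Count": 1},
    3: {"Type": Name("Page"), "Parent": Ref(2), "MediaBox": [0, 0, 612, 792]},
})

# ElementTree implements a subset of XPath; chained predicates are supported.
page_nums = [
    obj.get("num")
    for obj in doc.findall("object")
    if obj.find("dict/entry[@key='Type'][name='Page']") is not None
]
print(page_nums)  # ['3']
```

The standard library's ElementTree only understands a limited XPath subset; full XPath 1.0 queries over the same XML would need a library such as lxml.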
Nedim

On Tue, 2011-11-01 at 05:26 -0700, Leonard Rosenthol wrote:
> Why would you iterate over the objects w/o any understanding of their
> context? Wouldn't it make MUCH MORE sense to "walk the tree" - starting
> at the Catalog/Root and then simply recursing down the object tree based
> on known relationships?
>
> What use are the objects w/o context?
>
> Leonard
>
> On 11/1/11 7:55 AM, "Nedim Srndic" <[email protected]> wrote:
>
> >I'm sorry, I see now that I wasn't clear enough. I would like to
> >enumerate every PDF dictionary from a given PDF file, including but not
> >limited to the Catalog, Pages, Actions, Annotations, Name tree -
> >everything. Currently I can successfully do that for all dictionaries
> >that can be located using XRef, but it seems that indirect objects
> >inside object streams cannot be found this way. I could obviously test
> >if any of the objects pointed to by the XRef is an object stream and get
> >all the objects from the stream, but I'm wondering if Poppler has a more
> >elegant solution.
> >
> >Nedim
> >
> >On Mon, 2011-10-31 at 11:12 -0700, Josh Richardson wrote:
> >> What kinds of objects are you interested in? I have a version of
> >> pdftohtml which I believe is not yet merged into the master repo that
> >> extracts images and fonts.
> >>
> >> --josh
> >>
> >> On 10/31/11 9:16 AM, "Nedim Srndic" <[email protected]> wrote:
> >>
> >> >Dear list,
> >> >
> >> >I am using the Poppler library (in the src/poppler folder, no bindings,
> >> >version 7 from the Ubuntu 10.10 repos) and would like to retrieve all
> >> >objects from a PDF file. Currently, I am running a loop on XRef and
> >> >getting all the non-null objects from it, but it doesn't seem to
> >> >retrieve objects from object streams. What solution would you propose
> >> >for this problem?
> >> >
> >> >Thanks,
> >> >Nedim Srndic
> >> >
> >> >_______________________________________________
> >> >poppler mailing list
> >> >[email protected]
> >> >http://lists.freedesktop.org/mailman/listinfo/poppler
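The workaround Nedim mentions in the thread (test whether an object is an object stream and pull the objects out of it) can be sketched independently of Poppler. Per the PDF specification (PDF 1.5 and later), an object stream is a stream whose dictionary has /Type /ObjStm, /N (the number of objects it holds) and /First (the byte offset of the first object in the decoded data). The decoded data begins with N pairs of integers: an object number followed by that object's offset relative to /First. The sketch below assumes the stream data has already been decoded (for example, /FlateDecode undone) and that /N and /First were read from its dictionary; split_object_stream and build_object_stream are hypothetical helpers, not Poppler API.

```python
import zlib


def split_object_stream(data: bytes, n: int, first: int) -> dict[int, bytes]:
    """Split decoded /Type /ObjStm data into {object number: raw object bytes}.

    `n` and `first` are the /N and /First values from the stream dictionary.
    """
    header = data[:first].split()  # N pairs: object number, relative offset
    if len(header) < 2 * n:
        raise ValueError("header holds fewer than /N integer pairs")
    pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(n)]
    # Sort by offset so each object ends where the next one begins.
    ordered = sorted(pairs, key=lambda pair: pair[1])
    objects = {}
    for i, (num, offset) in enumerate(ordered):
        start = first + offset
        end = first + ordered[i + 1][1] if i + 1 < len(ordered) else len(data)
        objects[num] = data[start:end].strip()
    return objects


def build_object_stream(objects: dict[int, bytes]) -> tuple[bytes, int, int]:
    """Hypothetical test helper: pack objects into decoded ObjStm data, /N, /First."""
    header_parts, body = [], b""
    for num, raw in objects.items():
        header_parts.append(f"{num} {len(body)}")
        body += raw + b"\n"
    header = (" ".join(header_parts) + "\n").encode("ascii")
    return header + body, len(objects), len(header)


sample = {7: b"<< /Type /Annot /Subtype /Link >>", 8: b"[0 0 612 792]"}
data, n, first = build_object_stream(sample)
compressed = zlib.compress(data)       # what a /FlateDecode stream would store
decoded = zlib.decompress(compressed)  # decode before splitting
print(split_object_stream(decoded, n, first))
```

Sorting the pairs by offset keeps the extent computation correct even if a writer does not list them in increasing order; the raw object bytes would still need to go through a PDF object parser afterwards.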
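Leonard's alternative in the thread, walking the object graph from the Catalog (/Root) and following indirect references, can be sketched in the same spirit. This is a breadth-first traversal over a hypothetical {object number: parsed object} table, with a small Ref class for indirect references and PDF names written as plain strings. For each reachable object it records the key path by which it was first reached, which is one way to keep the context Leonard asks about. Objects are keyed by object number only; generation numbers are ignored for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Indirect reference such as `2 0 R` (hypothetical in-memory model)."""
    num: int
    gen: int = 0


def iter_refs(obj, path=()):
    """Yield (key path, Ref) for every indirect reference nested inside obj."""
    if isinstance(obj, Ref):
        yield path, obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_refs(value, path + (key,))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from iter_refs(item, path + (index,))


def walk_from_root(objects: dict, root: Ref) -> dict:
    """Breadth-first walk from /Root; returns {object number: key path from Root}."""
    context = {root.num: ()}
    queue = deque([root.num])
    while queue:
        num = queue.popleft()
        # objects.get() returns None for dangling references, which have no children.
        for path, ref in iter_refs(objects.get(num)):
            if ref.num not in context:  # skip visited objects; /Parent links form cycles
                context[ref.num] = context[num] + path
                queue.append(ref.num)
    return context


doc_objects = {
    1: {"Type": "/Catalog", "Pages": Ref(2)},
    2: {"Type": "/Pages", "Kids": [Ref(3)], "Count": 1},
    3: {"Type": "/Page", "Parent": Ref(2)},
    4: {"Type": "/XObject", "Subtype": "/Image"},  # referenced by nothing
}
reachable = walk_from_root(doc_objects, Ref(1))
print(reachable)  # {1: (), 2: ('Pages',), 3: ('Pages', 'Kids', 0)}
```

One trade-off: a walk that starts only at /Root never visits objects reachable solely from the trailer (such as the /Info dictionary) or objects that nothing references (object 4 above), so for enumerating every dictionary in a file it complements an XRef-based loop rather than replacing it.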
