Thanks Josh. I was actually researching this quite heavily, and found myself on the #ghostscript channel @ freenode.
They pointed me to MuPDF (one of their projects), and it seems like the "pdfdraw" example project is something to work from, either directly or by parsing its XML output. However, if this doesn't suit your needs, please tell me why, as I might have the same problem, and then I'll join forces! :]

On Wed, Oct 12, 2011 at 3:44 AM, Josh Richardson <[email protected]> wrote:
> Thanks for the pointer, Glad.
>
> FYI, I am also interested in being able to analyze document structure.
> Our first step is to put the text back together, since in many PDFs, it is
> not logically organized in the original PDF. pdf2html has a "coalesce"
> function which is the starting point for us. We have made some
> improvements on it which are not yet contributed back -- so let me know if
> you want the source and/or if you want to join forces.
>
> --josh
>
> On 10/11/11 12:31 AM, "Glad Deschrijver" <[email protected]>
> wrote:
>
>>On Tuesday 11 October 2011, Alec Taylor wrote:
>>> Good afternoon,
>>>
>>> Do you have some recommends and/or sample code for comparing textual
>>> and geometric layout information across pages?
>>>
>>> Basically I'm trying to realise patterns within documents, e.g., page
>>> numbers, header and footers, title, column information &etc; using the
>>> capabilities of the Poppler PDF library.
>>
>>Not sure that it will help you much, but you can have a look at DiffPDF
>>which uses poppler to compare two PDF files page by page (both textually and
>>visually):
>>http://www.qtrac.eu/diffpdf.html
>>
>>Best regards,
>>Glad
>>
>>-- 
>> Everything that is really great and inspiring is created by
>> the individual who can labor in freedom.
>>                                    -- Albert Einstein, Out of My Later Years (1950)
>>
>>_______________________________________________
>>poppler mailing list
>>[email protected]
>>http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>
>
