On Tue, Sep 07, 2010 at 11:05:05AM -0700, Leonard Rosenthol wrote:
> I can tell you with 100% certainty that Acrobat/Reader do NOT use raw order -
> they use "reading order". The algorithms haven't changed between 8 & 9.
>

Ok

> I also looked at your PDFs and in both cases, OO is writing the content
> streams in exactly the same way & order - top to bottom, left to right. It
> doesn't write the first column and then the second in the "real-column"
> example. Open up the PDF's content stream and look. You'll see almost
> identical streams.

I don't know how to see the PDF's content stream. I have tried the
"Save as text" option in Acrobat/Reader and it does what you say. Still, that
doesn't explain why Acrobat/Reader does not select the right thing in the
fake-columns example.

>
> And Adobe Reader 9.3.4 is the current version for Linux - I just checked on
> Adobe.com.

I double checked and the problem was the language preference of my browser.
The latest version of Acrobat/Reader in Spanish is 8.1.7. If you choose the
English version then you are right and the latest one is 9.3.4.

Best regards,

Lorenzo

>
>
> Leonard Rosenthol
> PDF Standards Architect
> Adobe Systems
>
> -----Original Message-----
> From: Lorenzo Gil [mailto:[email protected]]
> Sent: Tuesday, September 07, 2010 2:00 PM
> To: Leonard Rosenthol
> Cc: 'Albert Astals Cid'; [email protected]
> Subject: Re: [poppler] New selection algorithm
>
> On Mon, Sep 06, 2010 at 01:45:55PM -0700, Leonard Rosenthol wrote:
> > >I don't think raw order is acceptable.
> > >
> > Agreed - never use raw order since it means nothing.
> >
> > You should either use "reading order" (top->bottom, left->right (or RTL,
> > depending)) as computed through geometric sorting - which is what the
> > current code does, at least to some extent.
> >
> > The difference with Acrobat/Reader is that we use additional heuristics to
> > offer smarter selection semantics for columnar data, vertical text, and
> > other such things.
>
> I've created two pdf files (attached to this mail) with OpenOffice that look
> pretty much the same in terms of layout and structure.
> Acrobat/Reader behaves completely differently in terms of selection: in
> real-columns.pdf it selects the text by columns, but in fake-columns.pdf it
> selects the text by lines. In both cases Adobe Reader selects the text in the
> order that OpenOffice put it in the document stream (i.e. raw order). The
> fake-columns.pdf document was created using tabs and spaces to simulate a
> two-column layout instead of the columns feature of OpenOffice.
>
> I'm using Adobe Reader 8.1.7 for Linux. Maybe the heuristics that you mention
> were added to Adobe Reader 9 but unfortunately that's not available in Linux.
>
> Sorry to focus on Adobe Reader when this is the Poppler list but I think we
> should see Adobe Reader as the reference implementation for a PDF viewer.
>
> Best regards,
>
> Lorenzo
>
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
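
To make the "reading order as computed through geometric sorting" idea from
the thread concrete, here is a minimal sketch in Python. It is not Poppler's
or Acrobat's code: the Word class, the reading_order function, the tolerance
parameter and the coordinates are all hypothetical, chosen only for
illustration. It groups words into lines by their vertical position and then
sorts each line left to right.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float              # left edge of the word (hypothetical units)
    y: float              # top edge, growing downward
    height: float = 10.0  # assumed glyph height


def reading_order(words, tolerance=0.5):
    """Sort words top-to-bottom, then left-to-right within each line."""
    lines = []  # each entry is a list of words sharing roughly the same y
    for word in sorted(words, key=lambda w: w.y):
        # A word joins the current line if it is vertically close to the
        # first word of that line; otherwise it starts a new line.
        if lines and abs(lines[-1][0].y - word.y) <= tolerance * word.height:
            lines[-1].append(word)
        else:
            lines.append([word])
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda w: w.x))
    return ordered


# Hypothetical two-column page, stored column by column ("raw" order).
raw = [
    Word("A1", 0, 0), Word("A2", 0, 12),
    Word("B1", 100, 0), Word("B2", 100, 12),
]
result = [w.text for w in reading_order(raw)]
print(result)
```

Note that a plain geometric sort like this reads straight across both columns,
line by line. Selecting column by column would need extra layout heuristics of
the kind Leonard mentions for columnar data, which this sketch does not
attempt.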
