Excerpts from carlosgc's message of Sat Sep 24 11:25:10 +0200 2011:
> New commits:
> commit f62c2f002c782d3a7887525f031d266aca6eb582
> Author: Carlos Garcia Campos <[email protected]>
> Date:   Sat Sep 24 11:20:13 2011 +0200
>
>     xpdf303: Parse ActualText in Gfx instead of output devices
>
>     Remove beginMarkedContent and endMarkedcontent and add beginActualText
>     and endActualText. ActualText is parsed in Gfx, that already handles the
>     marked content stack, so that text output dev doesn't need to handle it
>     too. The text string is passed to beginActualText(). This change is not
>     an exact merge of xpdf code, I've tried to keep our implementation.

Albert, this commit gave me differences in pdftotext output for 2 of my
pdf files:

 - opt-content/microtype.pdf: it fixes this document; we were extracting
   "pdf TeX" instead of "pdfTeX".

 - opt-content/publikationen.Document.100193.pdf: it's difficult to say
   whether it fixes or breaks this one. The output is weird in both
   cases, and it doesn't match acroread either.

I'm not sure if you have those documents, maybe under another name
(should we identify the pdfs by their md5sum too?), so let me know if
you want them. It would be great if you could run the tests with your
pdfs to see whether more of them give different output and, if the
change is unacceptable for any of them, revert or try to fix the commit.

Regards,
-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462
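
A rough sketch of the md5sum idea raised in the message above: index a directory of test pdfs by content hash, so that the same document can be matched even when it is stored under a different name. This is only an illustration. The helper names md5_of_file and index_by_md5 are hypothetical and are not part of poppler or its test suite, and the directory layout is an assumption.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large pdfs are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_md5(directory):
    """Map md5 digest -> list of pdf paths with that content (hypothetical helper)."""
    index = {}
    for pdf in sorted(Path(directory).rglob("*.pdf")):
        index.setdefault(md5_of_file(pdf), []).append(str(pdf))
    return index
```

Two people could then compare the keys of their indexes to see which documents they share, regardless of file names.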
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
