On Saturday, 24 September 2011, Carlos Garcia Campos wrote:
> Excerpts from carlosgc's message of Sat Sep 24 11:25:10 +0200 2011:
> > New commits:
> > commit f62c2f002c782d3a7887525f031d266aca6eb582
> > Author: Carlos Garcia Campos <[email protected]>
> > Date:   Sat Sep 24 11:20:13 2011 +0200
> >
> >     xpdf303: Parse ActualText in Gfx instead of output devices
> >
> >     Remove beginMarkedContent and endMarkedcontent and add
> >     beginActualText and endActualText. ActualText is parsed in Gfx,
> >     that already handles the marked content stack, so that text
> >     output dev doesn't need to handle it too. The text string is
> >     passed to beginActualText(). This change is not an exact merge
> >     of xpdf code, I've tried to keep our implementation.
>
> Albert, this commit gave me differences in pdftotext output for 2 of
> my pdf files:
>
>  - opt-content/microtype.pdf: It fixes this document, we were
>    extracting pdf TeX instead of pdfTeX.
>
>  - opt-content/publikationen.Document.100193.pdf: it's difficult to
>    say whether it fixes or breaks this one. The output is weird in both
>    cases and it doesn't match acroread either.
>
> I'm not sure if you have those documents, maybe with another name
> (should we identify the pdfs by their md5sum too?), so let me know if
> you want them.

I prefer identifying them by bug number so that when it fixes something you can go and close the bug :D

> It would be great if you could run the tests with your pdfs to see
> whether there are more pdfs giving different output, and if it's
> unacceptable for any of them revert or try to fix the commit.

Sure, running now.

Albert

> Regards,
> --
> Carlos Garcia Campos
> PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462
