pdfmerge.1

Carlos Garcia Campos Sat, 24 Sep 2011 02:39:33 -0700

Excerpts from carlosgc's message of Sat Sep 24 11:25:10 +0200 2011:

> New commits:
> commit f62c2f002c782d3a7887525f031d266aca6eb582
> Author: Carlos Garcia Campos <[email protected]>
> Date:   Sat Sep 24 11:20:13 2011 +0200
> 
>     xpdf303: Parse ActualText in Gfx instead of output devices
>     
>     Remove beginMarkedContent and endMarkedcontent and add beginActualText
>     and endActualText. ActualText is parsed in Gfx, that already handles the
>     marked content stack, so that text output dev doesn't need to handle it
>     too. The text string is passed to beginActualText(). This change is not
>     an exact merge of xpdf code, I've tried to keep our implementation.


Albert, this commit gave me differences in pdftotext output for 2 of
my pdf files:

 - opt-content/microtype.pdf: The commit fixes this document; we were
 extracting "pdf TeX" instead of "pdfTeX".
 
 - opt-content/publikationen.Document.100193.pdf: It's difficult to
 say whether the commit fixes or breaks this one. The output is weird
 both before and after, and neither version matches acroread.

I'm not sure whether you have those documents, perhaps under another
name (should we identify the pdfs by their md5sum too?), so let me know
if you want them.
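
To make the md5sum idea concrete, here is a minimal sketch of how a
test collection could be indexed by content hash, so that the same pdf
is recognized under different file names. The helper names
(md5_of_file, index_by_md5) and the directory layout are assumptions
for illustration only; nothing like this exists in the poppler tree as
far as this message is concerned.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large pdfs are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_md5(directory):
    """Map md5 digest -> list of pdf paths with that content.

    Hypothetical helper: walks `directory` recursively and groups
    files ending in .pdf by their content hash.
    """
    index = {}
    for pdf in sorted(Path(directory).rglob("*.pdf")):
        index.setdefault(md5_of_file(pdf), []).append(pdf)
    return index
```

Two people could then exchange the digests instead of file names to
check whether they already have the same documents.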

It would be great if you could run the tests with your pdfs to see
whether more of them give different output and, if the result is
unacceptable for any of them, revert or try to fix the commit.

Regards, 
-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462


_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
