Re: Problems with .PDFPIC caused by pdfinfo

Keith Marshall Tue, 21 Sep 2021 14:24:40 -0700

On 21/09/2021 13:34, Heinz-Jürgen Oertel wrote:

I did some more research. The result: it's not "pdfinfo", it is
ImageMagick "convert". I mostly use JPG files converted to PDF by
"convert".


Since your graphic originates as JPG, is there any particular reason why
you cannot convert to EPS, and use .PSPIC to import it into groff?  That
way you would be using groff's built-in .psbb request, so no potentially
unsafe call-out to pdfinfo is required, to get the bounding box.

The example file "Selz.pdf"

% pdfinfo Selz.pdf | hexdump -xc
0000000    6954    6c74    3a65    2020    2020    2020    2020    2020
0000000   T   i   t   l   e   :
0000010    5300    6500    6c00    7a00    0000    410a    7475    6f68


Looks like UTF-16 creeping into what is otherwise a UTF-8 (or ASCII)
data stream.

0000010  \0   S  \0   e  \0   l  \0   z  \0  \0  \n   A   u   t   h   o
0000020    3a72    2020    2020    2020    2020    6820    7474    7370
0000020   r   :                                      h   t   t   p   s
    ...

As one can see, there are \0 characters already in the title.
Looking at the PDF:

/Title <00530065006C007A0000>


So, here the title is encoded as an ASCII hex-digit representation of
UTF-16BE text (the 00 byte comes first in each pair).  IIRC, that's a
valid PDF encoding, but why is pdfinfo not decoding it in a format which
is consistent with the rest of its output?  Looks like a pdfinfo bug, to me.
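
To illustrate what a consistent decoding of that /Title value might look
like, here is a minimal Python sketch. It assumes the hex string holds
big-endian UTF-16 (the high byte 00 precedes each character), with or
without a leading FEFF byte order mark; the function name
decode_pdf_hex_string is made up for this example, and this is not how
pdfinfo itself works.

```python
def decode_pdf_hex_string(token: str) -> str:
    """Decode a PDF hex string such as '<0053...>' into text.

    Assumption: the payload is UTF-16BE, optionally preceded by the
    byte order mark FE FF. Other PDF string encodings are not handled.
    """
    # Drop the angle brackets and any whitespace between hex digits.
    hex_digits = "".join(token.strip().strip("<>").split())
    # A missing final digit in a PDF hex string is taken to be 0.
    if len(hex_digits) % 2:
        hex_digits += "0"
    raw = bytes.fromhex(hex_digits)
    # Skip the byte order mark if one is present.
    if raw.startswith(b"\xfe\xff"):
        raw = raw[2:]
    # Decode, then drop the trailing NUL that "convert" left in the title.
    return raw.decode("utf-16-be").rstrip("\x00")


title = decode_pdf_hex_string("<00530065006C007A0000>")
print(title)
```

Stripping the trailing NUL is a choice made for this sketch only; the
raw bytes that hexdump shows above (\0 S \0 e \0 l \0 z \0 \0) are what
you get when no such decoding is applied.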

--
Cheers,
Keith
