On Tuesday, 21 September 2021, 13:51:12 CEST, Heinz-Jürgen Oertel wrote:
> On Monday, 20 September 2021, 21:39:49 CEST, Keith Marshall wrote:
> > On 20/09/2021 19:22, Dave Kemper wrote:
> > > Hi Heinz-Jürgen,
> > >
> > > Thanks for debugging and submitting a fix for this problem!
> >
> > Except that it's not really the most appropriate solution; that was
> > proposed four years ago...
> >
> > > In general, when proposing changes to the groff code base, it's best
> > > to open a bug report ...
> >
> > ...and Bertrand opened a (belated) ticket:
> > https://savannah.gnu.org/bugs/index.php?55107
> >
> > which has shown no activity since; (so even open tickets aren't immune
> > to fading into obscurity).
> >
> > > Regarding the specific change you've proposed, there may be some
> > > resistance to using a grep option that's not part of the POSIX
> > > standard for the command. I'm not sure how widely implemented -a is,
> > > or what equivalent solution might be more portable.
> >
> > I would go even further ... groff should *not* be calling out to
> > external tools, such as grep -- much less pdfinfo -- from within core
> > code, in a manner which requires use of unsafe mode, *especially* when
> > core code to achieve the required functionally has been awaiting
> > integration for a number of years!
>
> For me too it would be a better solution to not calling external tools, even
> that's part of the linux/unix philosophy.
> As you found out, this small problem does exist for years already. I don't
> remember exactly, but I looked already at the pdfinfo code, but was not able
> to correct it. It seems to be only the pdfinfo used in OpenSuse.
>
> Posix grep
> https://www.unix.com/man-page/posix/1P/grep/
> does not know the -a or --text option.
>
> Regards
> Heinz

I did some more research. The result: the culprit is not "pdfinfo" but
ImageMagick's "convert". I mostly use JPEG files converted to PDF by
"convert". For the example file "Selz.pdf":

  % pdfinfo Selz.pdf | hexdump -xc
  0000000    6954    6c74    3a65    2020    2020    2020    2020    2020
  0000000   T   i   t   l   e   :
  0000010    5300    6500    6c00    7a00    0000    410a    7475    6f68
  0000010  \0   S  \0   e  \0   l  \0   z  \0  \0  \n   A   u   t   h   o
  0000020    3a72    2020    2020    2020    2020    6820    7474    7370
  0000020   r   :                                       h   t   t   p   s
  ...

As one can see, the title printed by pdfinfo already contains \0
characters. Looking at the PDF itself:

  /Title <00530065006C007A0000>
  /CreationDate (D:20210914095154)
  /ModDate (D:20210914095154)
  /Author (https://imagemagick.org)
  /Producer (https://imagemagick.org)

you can see that the \0 bytes are already present in the file, so pdfinfo
is only passing on what "convert" wrote.

What can I do?

Regards
Heinz
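
A note on the /Title value quoted above: the hex string 00 53 00 65 00 6C 00 7A 00 00 looks like "Selz" plus a terminating NUL written as UTF-16BE, but without the leading FE FF byte order mark that, as far as I know, the PDF specification requires for UTF-16BE text strings. Without that mark a reader falls back to a single-byte encoding, and every 00 byte comes through as a literal \0. The following is only a sketch of that reading, not how pdfinfo or groff actually handle it. The function names are made up for illustration, and latin-1 is used as a stand-in for PDFDocEncoding.

```python
def decode_pdf_hex_string(hex_body: str) -> str:
    """Decode the body of a PDF hex string (the text between < and >).

    Hypothetical helper: latin-1 stands in for PDFDocEncoding here.
    """
    raw = bytes.fromhex(hex_body)
    if raw.startswith(b"\xfe\xff"):
        # Byte order mark present: treat the rest as UTF-16BE.
        return raw[2:].decode("utf-16-be")
    # No byte order mark: each byte is one character, so NULs survive.
    return raw.decode("latin-1")


def looks_like_bomless_utf16be(raw: bytes) -> bool:
    """Heuristic: even length and every high byte (even offset) is zero."""
    return len(raw) >= 2 and len(raw) % 2 == 0 and all(b == 0 for b in raw[0::2])


title_hex = "00530065006C007A0000"  # the /Title value from Selz.pdf
raw = bytes.fromhex(title_hex)

# What a reader sees when it follows the no-BOM rule.
print(repr(decode_pdf_hex_string(title_hex)))

# What the producer probably meant, assuming BOM-less UTF-16BE.
if looks_like_bomless_utf16be(raw):
    print(repr(raw.decode("utf-16-be").rstrip("\x00")))
```

The heuristic only holds for titles made of characters below U+0100, so it is a diagnostic aid rather than a general repair. A more robust fix would be to rewrite the title in the PDF's Info dictionary with a proper byte order mark, or to filter NUL bytes out of the pdfinfo output before it reaches grep.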