On Sat 09 Sep 2017 09:51:27 Peter Schaffter wrote:
> On Sat, Sep 09, 2017, Ralph Corderoy wrote:
> > Hi Peter,
> >
> > > The grep in pdfmom is returning a binary file hit when it encounters
> > > the diacritic in
> > >
> > >   .ds pdf:look(pdf:bm1) L'étranger
> >
> > What does locale(1) output for you where you run this pdfmom command?
>
> LANG=en_CA.UTF-8
> LANGUAGE=en_CA:en
> LC_CTYPE="en_CA.UTF-8"
> LC_NUMERIC="en_CA.UTF-8"
> LC_TIME="en_CA.UTF-8"
> LC_COLLATE="en_CA.UTF-8"
> LC_MONETARY="en_CA.UTF-8"
> LC_MESSAGES="en_CA.UTF-8"
> LC_PAPER="en_CA.UTF-8"
> LC_NAME="en_CA.UTF-8"
> LC_ADDRESS="en_CA.UTF-8"
> LC_TELEPHONE="en_CA.UTF-8"
> LC_MEASUREMENT="en_CA.UTF-8"
> LC_IDENTIFICATION="en_CA.UTF-8"
> LC_ALL=en_CA.UTF-8
>
> > > The solution is to pass the -a flag to grep.
> >
> > How about
> >
> >     groff ... 2>&1 | LC_ALL=C grep '^\.ds' | groff ...
>
> Yes, that's the solution I thought of before suggesting the tidier
> but, as Steffen pointed out, not universal -a flag.
>
> > BTW, pdfmom has a bug shown by that strace command I suggested.
> >
> >     system("groff ... 2>&1 | grep '^\.ds' | groff ...");
> >
> > That's a double-quoted Perl string so `\.' is escaping the dot and grep
> > sees a plain dot for `any character'.  The backslash needs doubling.
>
> Missed that.  Argh.  Why don't they make special glasses that let
> you see code as if for the first time whenever you put them on?
>
> --
> Peter Schaffter
I can't actually recreate the problem, i.e. grep does not spit out the
"binary" error. I've tried with both an en_GB.UTF-8 and an en_GB
environment; neither shows the message. The version of grep I'm using is:-

    grep (GNU grep) 2.20

The double escaping of the "." in the grep pattern used to be there:-

    grep \"^\\.ds\"

but got changed.

Cheers

Deri
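
For reference, the effect of the lost backslash can be checked from a shell. This is only a rough sketch: the two sample lines are made up for illustration and are not taken from pdfmom or its output. The first pattern is what grep receives once Perl has swallowed the single backslash; the second is what it receives when the backslash survives.

```shell
# Hypothetical sample input (an assumption, not real pdfmom output):
# one genuine .ds request, and one line that should NOT be selected.
sample='.ds pdf:look(pdf:bm1) Letranger
Xds not a request'

# Pattern with the backslash lost: the dot matches any character.
loose=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '^.ds')

# Pattern with the backslash intact: the dot matches only a literal dot.
strict=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '^\.ds')

echo "loose=$loose strict=$strict"
```

With the unescaped dot both lines are counted; with the escaped dot only the real .ds line is.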