Jörg-Volker Peetz <jvpe...@web.de> writes:

> Hi Kanru,
>
> thank you for looking into this. As you may have noticed, this kind of PDF
> file is produced by a German bank for its account statements.
> Their argument for not looking further into this is the usual "Windows
> doesn't have this problem".
>
> Kan-Ru Chen (陳侃如) wrote on 10/03/2014 19:45:
>> Hi,
>>
>> Jörg-Volker Peetz <jvpe...@web.de> writes:
>>
>>> Package: mupdf
>>> Version: 1.5-1+b1
>>> Severity: normal
>>>
>>> Dear Kan-Ru Chen,
>>>
>>> the problem occurs for a special pdf-file (generated by iText v 2.0.8 on
>>> a windows system, I suppose). I've attached the file. Searching for the
>>> word "monat" does not find all occurrences of the word, but searching
>>> for "onat" does. The pdf-file is displayed correctly, only searching
>>> (and extracting the text) fails. It's a strange problem which, I have
>>> to admit, also occurs with the poppler derived viewers and in
>>> iceweasel. The only common library used by these tools is libfreetype6.
>>
>> I think the PDF file contains an incorrect /ToUnicode CMap which maps 'M'
>> to 'j'. You could try to search "jonat" which will match the "monat"
>> glyphs.
>>
>
> Can you tell me which part, which library is interpreting this /ToUnicode
> CMap?

The PDF renderer. So MuPDF is just interpreting what was put in the PDF
file. Maybe Acrobat-reader ignores the CMap somehow.

PDF rewriting usually strips this kind of information, so this PDF file
could be "repaired" with this command (install ghostscript first):

  ps2pdf iText-2.0.8-example.pdf fixed.pdf

>>> Under windows the search in Acrobat-reader works.
>>
>> I'm not sure how Acrobat-reader does that.
>>
>>> Do you have any idea what may be the problem?
>>> Feel free to close the bug or re-assign it to another package.
>>
>> Maybe the pdf-file generating process has issues.
>>
>> Kanru
>>
> Best regards,
> Jörg-Volker.
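
To make the /ToUnicode explanation above more concrete, here is a minimal
sketch in Python of why "monat" is missed while "onat" and "jonat" match.
It is not MuPDF's code. The glyph codes, the two mapping tables and the
helper names (extract_text, search) are all made up for illustration; the
only thing taken from this thread is the assumption that the 'M' glyph is
mapped to 'j'.

```python
# Hypothetical glyph codes for one font. Real PDFs use whatever codes the
# producer chose; these values are assumptions for illustration only.
CORRECT_TOUNICODE = {0x01: "M", 0x02: "o", 0x03: "n", 0x04: "a", 0x05: "t"}

# Same table with the error suspected in this thread: 'M' glyph -> 'j'.
BROKEN_TOUNICODE = dict(CORRECT_TOUNICODE)
BROKEN_TOUNICODE[0x01] = "j"


def extract_text(codes, to_unicode):
    """Translate glyph codes to text the way a viewer would for search/copy."""
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


def search(codes, to_unicode, needle):
    """Case-insensitive substring search over the extracted text."""
    return needle.lower() in extract_text(codes, to_unicode).lower()


# The glyph shapes drawn on screen do not depend on the table, so the page
# still shows "Monat". Only search and text extraction consult the table.
page = [0x01, 0x02, 0x03, 0x04, 0x05]

if __name__ == "__main__":
    print(extract_text(page, BROKEN_TOUNICODE))
    for word in ("monat", "onat", "jonat"):
        print(word, search(page, BROKEN_TOUNICODE, word))
```

With the broken table the viewer only ever sees the extracted string, so it
has no way to know the first glyph looks like an 'M'. That would also be
consistent with the same behaviour showing up in the poppler based viewers.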