Bug#763877: mupdf: searching in pdf-file fails because first letter of words are muddled

Jörg-Volker Peetz Sat, 04 Oct 2014 01:46:23 -0700

Hi Kanru,

thank you for looking into this. As you may have noticed, this kind of PDF file
is produced by a German bank for its account statements.
Their argument for not looking further into this is the usual "Windows doesn't
have this problem".


Kan-Ru Chen (陳侃如) wrote on 10/03/2014 19:45:
> Hi,
> 
> Jörg-Volker Peetz <jvpe...@web.de> writes:
> 
>> Package: mupdf
>> Version: 1.5-1+b1
>> Severity: normal
>>
>> Dear Kan-Ru Chen,
>>
>> the problem occurs with a particular pdf-file (generated by iText v 2.0.8 on
>> a Windows system, I suppose). I've attached the file. Searching for the
>> word "monat" does not find all occurrences of the word, but searching
>> for "onat" does. The pdf-file is displayed correctly; only searching
>> (and extracting the text) fails. It's a strange problem which, I have
>> to admit, also occurs with the poppler-derived viewers and in
>> iceweasel. The only common library used by these tools is libfreetype6.
> 
> I think the PDF file contains an incorrect /ToUnicode CMap which maps 'M'
> to 'j'. You could try searching for "jonat", which will match the "monat"
> glyphs.
>

Can you tell me which part (which library) interprets this /ToUnicode CMap?
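
To illustrate the mechanism described above: when extracting or searching text, a PDF viewer translates each character code of the page content through the font's /ToUnicode CMap, while the glyph drawn on screen comes from the font itself. A single wrong `bfchar` entry therefore leaves the rendering intact but corrupts the extracted text. The Python sketch below is a minimal illustration under stated assumptions: the CMap fragment and the one-byte character codes are hypothetical (not taken from the attached file), and only `beginbfchar`/`endbfchar` sections are handled (no `bfrange`).

```python
import re

# Hypothetical /ToUnicode CMap fragment (NOT taken from the attached file).
# It maps character code 0x4D ('M') to U+006A ('j'), reproducing the fault
# described above; the other entries are identity mappings.
CMAP = """
begincmap
1 begincodespacerange
<00> <FF>
endcodespacerange
5 beginbfchar
<4D> <006A>
<6F> <006F>
<6E> <006E>
<61> <0061>
<74> <0074>
endbfchar
endcmap
"""


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect single-code mappings from every beginbfchar/endbfchar section."""
    mapping: dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # The destination is UTF-16BE hex; decode it to a Python str.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def extract_text(codes: bytes, mapping: dict[int, str]) -> str:
    """Translate one-byte character codes through the CMap, as a text extractor would."""
    # Unmapped codes become U+FFFD (replacement character).
    return "".join(mapping.get(c, "\ufffd") for c in codes)


mapping = parse_bfchar(CMAP)
text = extract_text(b"Monat", mapping)
print(text)                     # jonat
print("monat" in text.lower())  # False
print("onat" in text.lower())   # True
```

With the faulty entry, "Monat" extracts as "jonat", so a search for "monat" fails while "onat" (and "jonat") still match, which is the behaviour reported in this bug.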

>> Under Windows, the search in Acrobat Reader works.
> 
> I'm not sure how Acrobat Reader does that.
> 
>> Do you have any idea what may be the problem?
>> Feel free to close the bug or re-assign it to another package.
> 
> Maybe the process that generates the pdf-file has issues.
> 
> Kanru
> 
Best regards,
Jörg-Volker.

