[https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134303#comment-17134303]

Tilman Hausherr commented on TIKA-3111:
---------------------------------------

No, I got it to work with several changes in AbstractPDF2XHTML, i.e. by using the 4-parameter showGlyph() call and getting the unicode myself.

[~lehmi] WDYT of this? IMHO the contract of the deprecated showGlyph() has been broken, because unicode is now null when it is called.

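{code:java}
    // Needs: java.io.IOException, org.apache.pdfbox.pdmodel.font.PDFont,
    //        org.apache.pdfbox.util.Matrix, org.apache.pdfbox.util.Vector
    // Override the 4-parameter variant: the deprecated overload now gets
    // a null unicode, so the mapping is looked up here instead.
    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code,
            Vector displacement) throws IOException
    {
        // Resolve the Unicode mapping ourselves from the glyph code
        String unicode = font.toUnicode(code);
        super.showGlyph(textRenderingMatrix, font, code, displacement);
        // Count glyphs on this page that have no usable Unicode mapping
        if (unicode == null || unicode.isEmpty()) {
            unmappedUnicodeCharsPerPage++;
        }
        totalCharsPerPage++;
    }
{code}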
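For anyone who wants to try the counting logic without PDFBox on the classpath, here is a minimal, PDFBox-free sketch of the same bookkeeping. It assumes the counters are reset at the start of each page. The class UnmappedGlyphCounter, its methods, and the use of an IntFunction<String> as a stand-in for PDFont.toUnicode(int) are all hypothetical, for illustration only; this is not Tika or PDFBox API. What it shows is the key point of the change: the override derives the unicode from the glyph code itself and treats both null and empty strings as unmapped.

```java
import java.util.function.IntFunction;

/**
 * Hypothetical, PDFBox-free sketch of the per-page counters kept by the
 * showGlyph() override. The IntFunction stands in for PDFont.toUnicode(int).
 */
class UnmappedGlyphCounter {
    private final IntFunction<String> toUnicode; // stand-in for font.toUnicode(code)
    private int unmappedUnicodeCharsPerPage;
    private int totalCharsPerPage;

    UnmappedGlyphCounter(IntFunction<String> toUnicode) {
        this.toUnicode = toUnicode;
    }

    /** Mirrors the counting part of the override; no rendering is done here. */
    void showGlyph(int code) {
        // Look up the mapping ourselves instead of relying on a passed-in string
        String unicode = toUnicode.apply(code);
        if (unicode == null || unicode.isEmpty()) {
            unmappedUnicodeCharsPerPage++;
        }
        totalCharsPerPage++;
    }

    /** Hypothetical helper: reset both counters when a new page starts. */
    void resetForNewPage() {
        unmappedUnicodeCharsPerPage = 0;
        totalCharsPerPage = 0;
    }

    int getUnmapped() {
        return unmappedUnicodeCharsPerPage;
    }

    int getTotal() {
        return totalCharsPerPage;
    }
}
```

With a mapping where code 65 maps to "A", 66 to "" and everything else to null, showing glyphs 65, 66 and 67 gives a total of 3 with 2 unmapped.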


> Upgrade to PDFBox 2.0.20
> ------------------------
>
>                 Key: TIKA-3111
>                 URL: https://issues.apache.org/jira/browse/TIKA-3111
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
