[
https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134303#comment-17134303
]
Tilman Hausherr commented on TIKA-3111:
---------------------------------------
No, I got it to work with several changes in AbstractPDF2XHTML, i.e., overriding the
4-parameter showGlyph() call and looking up the Unicode mapping myself.
[~lehmi] WDYT of this? IMHO the contract of the deprecated showGlyph() has been
broken, because unicode is now null when it is called.
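{code}
// Types: org.apache.pdfbox.util.Matrix, org.apache.pdfbox.pdmodel.font.PDFont,
// org.apache.pdfbox.util.Vector.
// Override the 4-parameter showGlyph(): the deprecated 5-parameter variant now
// receives unicode == null, so look up the Unicode mapping ourselves.
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code,
        Vector displacement) throws IOException
{
    // toUnicode() returns null if the font has no Unicode mapping for this code
    String unicode = font.toUnicode(code);
    super.showGlyph(textRenderingMatrix, font, code, displacement);
    if (unicode == null || unicode.isEmpty()) {
        unmappedUnicodeCharsPerPage++;
    }
    totalCharsPerPage++;
}
{code}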
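To make the contract change concrete, here is a self-contained Java model with no PDFBox dependency. FakeFont, Engine, DeprecatedCounter and FixedCounter are hypothetical stand-ins (not PDFBox or Tika classes), the Matrix/Vector parameters are dropped, and it assumes, as described above, that the new 4-parameter hook is what gets called and that the deprecated hook still runs but receives unicode == null.

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphContractDemo {

    // Hypothetical stand-in for PDFont: maps character codes to Unicode strings.
    static final class FakeFont {
        private final Map<Integer, String> toUnicodeMap = new HashMap<>();

        FakeFont map(int code, String unicode) {
            toUnicodeMap.put(code, unicode);
            return this;
        }

        // Like PDFont.toUnicode(int): returns null when the code has no mapping.
        String toUnicode(int code) {
            return toUnicodeMap.get(code);
        }
    }

    // Hypothetical stand-in for the stream engine after the upgrade (Matrix/Vector omitted).
    static class Engine {
        // New hook: the engine calls this one for every glyph.
        void showGlyph(FakeFont font, int code) {
            // Modelled assumption: the deprecated hook still runs, but unicode is null.
            showGlyph(font, code, null);
        }

        @Deprecated
        void showGlyph(FakeFont font, int code, String unicode) {
            // default: do nothing
        }
    }

    // Old approach: count inside the deprecated hook, trusting its unicode argument.
    static class DeprecatedCounter extends Engine {
        int unmapped;
        int total;

        @Override
        void showGlyph(FakeFont font, int code, String unicode) {
            if (unicode == null || unicode.isEmpty()) {
                unmapped++;
            }
            total++;
        }
    }

    // Fixed approach: override the new hook and look the mapping up ourselves.
    static class FixedCounter extends Engine {
        int unmapped;
        int total;

        @Override
        void showGlyph(FakeFont font, int code) {
            String unicode = font.toUnicode(code);
            super.showGlyph(font, code);
            if (unicode == null || unicode.isEmpty()) {
                unmapped++;
            }
            total++;
        }
    }

    public static void main(String[] args) {
        // Codes 65 and 66 are mapped; code 0 has no Unicode mapping.
        FakeFont font = new FakeFont().map(65, "A").map(66, "B");
        DeprecatedCounter oldWay = new DeprecatedCounter();
        FixedCounter newWay = new FixedCounter();
        for (int code : new int[] {65, 66, 0}) {
            oldWay.showGlyph(font, code);
            newWay.showGlyph(font, code);
        }
        System.out.println("deprecated hook: " + oldWay.unmapped + "/" + oldWay.total + " unmapped"); // 3/3
        System.out.println("4-param hook:    " + newWay.unmapped + "/" + newWay.total + " unmapped"); // 1/3
    }
}
```

In this model, counting in the deprecated override reports every glyph as unmapped (3 of 3), while calling font.toUnicode(code) in the 4-parameter override counts only the genuinely unmapped code 0 (1 of 3).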
> Upgrade to PDFBox 2.0.20
> ------------------------
>
> Key: TIKA-3111
> URL: https://issues.apache.org/jira/browse/TIKA-3111
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)