[https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134097#comment-17134097]
Tim Allison commented on TIKA-3111:
-----------------------------------
Not sure I follow.
Text extraction seems to be the same (on a quick look), and I recognize that the
file is broken. However, we used to get character counts for all of the pages,
and now we don't. Oddly, this happens when I build on the command line but not
in IntelliJ. If this is expected, is there a way we can still get the character
counts and unmapped character counts?
> Upgrade to PDFBox 2.0.20
> ------------------------
>
> Key: TIKA-3111
> URL: https://issues.apache.org/jira/browse/TIKA-3111
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>