[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134097#comment-17134097 ]

Tim Allison commented on TIKA-3111:
-----------------------------------

Not sure I follow.

Text extraction seems to be the same (on a quick look), and I recognize that
the file is broken. However, we used to get character counts for all of the
pages, and now we don’t. Oddly, this happens when I build on the command line
but not in IntelliJ.

If this is expected, is there a way we can get the character counts and 
unmapped character counts?

> Upgrade to PDFBox 2.0.20
> ------------------------
>
>                 Key: TIKA-3111
>                 URL: https://issues.apache.org/jira/browse/TIKA-3111
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
