[ 
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616180#comment-15616180
 ] 

Tim Allison commented on TIKA-2146:
-----------------------------------

I wonder if these errors are caused by what I found with old "protected" Excel 
files.  Even though they weren't password protected, they were still 
"protected", and the inner objects were encrypted to the point that even the 
record lengths were unreadable, leading to ArrayIndexOutOfBoundsException 
(AIOOBE) and other similar problems.
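
To illustrate the kind of failure I mean, here is a minimal sketch. It is not 
POI's actual code: the class name, the method names and the 2-byte 
little-endian length prefix are all assumptions made up for the example. It 
only shows how trusting a length field that has been scrambled by encryption 
can run a reader off the end of a buffer, and how a bounds check turns that 
into a clearer error.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in POI or Tika.
final class RecordLengthDemo {

    // Assumed layout: a 2-byte little-endian length, followed by the payload.
    static int readLength(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    // Trusts the declared length. If the length bytes are really ciphertext,
    // the loop indexes past the end of 'data' and throws
    // ArrayIndexOutOfBoundsException.
    static byte[] readRecordUnchecked(byte[] data, int offset) {
        int length = readLength(data, offset);
        byte[] payload = new byte[length];
        for (int i = 0; i < length; i++) {
            payload[i] = data[offset + 2 + i];
        }
        return payload;
    }

    // Validates the declared length against the bytes actually available and
    // reports a descriptive error instead of failing deep inside the loop.
    static byte[] readRecordChecked(byte[] data, int offset) {
        if (offset < 0 || data.length - offset < 2) {
            throw new IllegalArgumentException("no room for a length field at offset " + offset);
        }
        int length = readLength(data, offset);
        int available = data.length - offset - 2;
        if (length > available) {
            throw new IllegalArgumentException(
                "declared record length " + length + " exceeds the " + available
                + " bytes available; the stream may be encrypted or corrupt");
        }
        return Arrays.copyOfRange(data, offset + 2, offset + 2 + length);
    }

    public static void main(String[] args) {
        byte[] intact = {3, 0, 10, 20, 30};                    // length field says 3
        byte[] scrambled = {(byte) 0xFF, 0x7F, 10, 20, 30};    // length field is garbage

        System.out.println(Arrays.toString(readRecordChecked(intact, 0)));
        try {
            readRecordUnchecked(scrambled, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked read failed: " + e);
        }
        try {
            readRecordChecked(scrambled, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("checked read failed: " + e.getMessage());
        }
    }
}
```

If something like this is what is happening in TIKA-2146, the more useful 
outcome would be an exception saying the document appears to be encrypted, 
rather than a bare AIOOBE.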

> Unable to extract contents from protected MS 
> word-doc-java.lang.ArrayIndexOutOfBoundsException
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2146
>                 URL: https://issues.apache.org/jira/browse/TIKA-2146
>             Project: Tika
>          Issue Type: Bug
>          Components: core, parser
>    Affects Versions: 1.11
>         Environment: Windows 7
>            Reporter: Sharath Kumar
>         Attachments: Test bug.doc, This is password protected.doc
>
>
> When I try to parse an MS Word document that is protected, I am unable to 
> extract the content; instead, I get the exception below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@29402a40
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.Tika.parseToString(Tika.java:537)
>       at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:102)
>       at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:1)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:99)
>       at org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:482)
>       at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
>       at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:436)
>       at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:262)
>       at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
>       at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
>       at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529)
>       at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506)
>       at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:215)
>       at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
>       at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
>       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
>       at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
>       at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
>       at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
>       at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>       at org.apache.poi.hwpf.model.SectionTable.<init>(SectionTable.java:84)
>       at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:345)
>       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
