[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085128#comment-16085128 ]
Luis Filipe Nassif commented on TIKA-2428:
------------------------------------------
The issue seems to be at the POI level. Threads are stuck in IOUtils.skipFully, called from UnimplementedHemfRecord.init:
{code}
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.skip(Native Method)
at java.io.BufferedInputStream.skip(Unknown Source)
- locked <0x0000000717f30ac0> (a java.io.BufferedInputStream)
at org.apache.tika.io.ProxyInputStream.skip(ProxyInputStream.java:117)
at org.apache.tika.io.TikaInputStream.skip(TikaInputStream.java:655)
at java.io.FilterInputStream.skip(Unknown Source)
at org.apache.poi.util.IOUtils.skipFully(IOUtils.java:364)
at org.apache.poi.hemf.record.UnimplementedHemfRecord.init(UnimplementedHemfRecord.java:43)
at org.apache.poi.hemf.extractor.HemfExtractor$HemfRecordIterator._next(HemfExtractor.java:101)
at org.apache.poi.hemf.extractor.HemfExtractor$HemfRecordIterator.next(HemfExtractor.java:77)
at org.apache.poi.hemf.extractor.HemfExtractor$HemfRecordIterator.next(HemfExtractor.java:60)
at org.apache.tika.parser.microsoft.EMFParser.parse(EMFParser.java:82)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:150)
at dpf.sp.gpinf.indexer.io.ParsingReader$ParsingTask.run(ParsingReader.java:263)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
{code}
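The trace shows the skip being driven by a record length read from the file, so a corrupted length could ask for far more bytes than the stream holds. As a rough sketch only (not POI's actual IOUtils.skipFully, and not a proposed patch), a skip that reads into a scratch buffer instead of calling skip() would always terminate, because read() reports end of stream with -1. The class and method names here (BoundedSkip, skipByReading) are hypothetical:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; this is not part of POI or Tika.
final class BoundedSkip {

    private BoundedSkip() {
    }

    /**
     * Consumes exactly len bytes from the stream by reading them,
     * or throws EOFException if the stream ends first.
     */
    static void skipByReading(InputStream in, long len) throws IOException {
        byte[] scratch = new byte[8192];
        long remaining = len;
        while (remaining > 0) {
            // Never ask for more than the scratch buffer can hold.
            int want = (int) Math.min(scratch.length, remaining);
            int got = in.read(scratch, 0, want);
            if (got < 0) {
                // End of stream reached before the requested length was consumed.
                throw new EOFException(
                        "stream ended with " + remaining + " bytes still to skip");
            }
            remaining -= got;
        }
    }
}
```

With something like this, a record whose declared length runs past the end of the file would surface as an EOFException that the parser can report, rather than a thread that never returns. Reading is slower than skipping for large valid records, so this is a trade-off rather than a clear win.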
> EMFParser loops forever with corrupted files
> --------------------------------------------
>
> Key: TIKA-2428
> URL: https://issues.apache.org/jira/browse/TIKA-2428
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.15, 1.16
> Reporter: Luis Filipe Nassif
> Attachments: Carved-1285676.emf, Carved-1296288.emf, Carved-912866.emf
>
>
> EMFParser hangs with the attached corrupted EMF files.
> Sorry [[email protected]]! Just now having time to test against our
> forensic test corpus...