[ https://issues.apache.org/jira/browse/TIKA-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127060#comment-17127060 ]

Xiaohong Yang commented on TIKA-3107:
-------------------------------------

Thank you for the information. I filed the following bug against Apache POI:

Bug 64500 - LeftoverDataException: Initialisation of record 
0x85(BoundSheetRecord) left 28 bytes remaining still to be read 
([https://bz.apache.org/bugzilla/show_bug.cgi?id=64500]). 

We do not know what software generated the sample file. Excel can open it 
properly.
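The bug title is easier to read with the BIFF record layout in mind. Each record begins with a 2-byte record type (sid; 0x85 is BoundSheetRecord) and a 2-byte data length, both little-endian. The stack trace shows POI raising LeftoverDataException from RecordInputStream.hasNextRecord, that is, when it is asked to move on to the next record while bytes of the current one are still unread. The following is a minimal standalone sketch of that kind of check, not POI's actual code. The class, the method names, the message text and the byte values (a 40-byte record of which a decoder reads only 12) are invented so that the arithmetic reproduces the "28 bytes" in the error. They say nothing about the real layout of SOJ.NW.00092712.xls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch, not Apache POI code: shows how a BIFF record reader
 * can detect that a record's decoder stopped before the record's end.
 */
class LeftoverSketch {

    /** Reads a 4-byte BIFF record header: 2-byte sid, then 2-byte data length, little-endian. */
    static int[] readHeader(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int sid = buf.getShort() & 0xFFFF;
        int length = buf.getShort() & 0xFFFF;
        return new int[] {sid, length};
    }

    /** Fails if the decoder consumed fewer bytes than the header declared. */
    static void requireFullyRead(int sid, int declaredLength, int consumed) {
        int remaining = declaredLength - consumed;
        if (remaining != 0) {
            throw new IllegalStateException(String.format(
                    "record 0x%X left %d bytes remaining", sid, remaining));
        }
    }

    public static void main(String[] args) {
        // Invented bytes: sid 0x0085 (BoundSheetRecord) declaring 40 data bytes.
        byte[] data = new byte[4 + 40];
        data[0] = (byte) 0x85; // sid, low byte (high byte stays 0)
        data[2] = 40;          // data length, low byte (high byte stays 0)
        int[] header = readHeader(ByteBuffer.wrap(data));

        // Invented decoder that only understands the first 12 bytes of the record.
        try {
            requireFullyRead(header[0], header[1], 12);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // record 0x85 left 28 bytes remaining
        }
    }
}
```

In the real failure the declared length comes from the file, and the number of bytes consumed depends on how POI's BoundSheetRecord constructor decodes that record. A mismatch between the two is what the bug report asks POI to look into.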

> AutoDetectParser.parse failed with error "Initialisation of record 
> 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3107
>                 URL: https://issues.apache.org/jira/browse/TIKA-3107
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.24
>            Reporter: Xiaohong Yang
>            Priority: Critical
>         Attachments: SOJ.NW.00092712.xls
>
>
> When I try to get the metadata of the sample Excel file with the
> AutoDetectParser.parse method using the following Java code, I get the error
> "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining
> still to be read".
>  
> try (InputStream input = new FileInputStream(localFilePath)) {
>     BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
>     Metadata metadata = new Metadata();
>     // TikaConfigFactory is assumed to be an application-specific helper, not a Tika class
>     TikaConfig config = TikaConfigFactory.getTikaConfig();
>     Parser autoDetectParser = new AutoDetectParser(config);
>     ParseContext context = new ParseContext();
>     context.set(TikaConfig.class, config);
>     autoDetectParser.parse(input, handler, metadata, context);
> }
>  
> Here is the stack trace:
>  
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2caa5ec
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>        …
>        at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>        at java.util.concurrent.FutureTask.run(FutureTask.java)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>        at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.
>        at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
>        at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:233)
>        at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
>        at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:158)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>        ... 15 more
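As the trace shows, Tika wraps the POI failure in a TikaException ("Unexpected RuntimeException from ...OfficeParser"), so code that calls autoDetectParser.parse and wants to recognise this particular failure (for example, to log it and skip the file) has to walk the cause chain. Below is a minimal sketch using a hypothetical helper class, CauseChain, which is not part of Tika or POI. It matches on the simple class name, so the caller needs no compile-time reference to POI's RecordInputStream.LeftoverDataException.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper, not part of Tika or POI: inspects an exception's cause chain. */
final class CauseChain {
    private CauseChain() {}

    /** Follows getCause() to the innermost throwable, stopping if the chain loops. */
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    /** Returns true if any throwable in the chain has the given simple class name. */
    static boolean hasCause(Throwable t, String simpleName) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            if (c.getClass().getSimpleName().equals(simpleName)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could wrap the parse call in `try { ... } catch (TikaException e) { if (CauseChain.hasCause(e, "LeftoverDataException")) { ... } }`. Matching by name breaks silently if POI renames the class; when POI is already on the compile classpath, an `instanceof` check against the POI type while walking the chain is the sturdier choice.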



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
