[ 
https://issues.apache.org/jira/browse/TIKA-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126324#comment-17126324
 ] 

Nick Burch commented on TIKA-3107:
----------------------------------

This is a bug in Apache POI, one of the libraries that Tika depends on. Any 
chance you could report it there? 
[https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI]

It'd also be helpful to know where the file came from (what software generated 
it), whether Excel gives any warnings when it opens the file, and whether the 
problem goes away after a Save-As from Excel.
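
For anyone preparing the POI report, it may help to see what the message is 
describing. As far as I understand POI's reader, each BIFF record starts with a 
4-byte header (a 2-byte record id and a 2-byte body length, both 
little-endian), and LeftoverDataException is raised when the code handling a 
record consumed fewer bytes than that header declared. The sketch below is not 
POI code: BiffRecordWalker and RecordHeader are hypothetical names, and it 
assumes you already have the raw workbook stream as bytes (an .xls file is 
usually an OLE2 container, so the stream would have to be extracted first). It 
only lists record headers, so that the declared length of record 0x85 can be 
compared with what a parser expects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only (hypothetical helper, not part of POI or Tika). */
final class BiffRecordWalker {

    /** Record id (sid), declared body length, and offset of the body in the stream. */
    static final class RecordHeader {
        final int sid;
        final int length;
        final int bodyOffset;

        RecordHeader(int sid, int length, int bodyOffset) {
            this.sid = sid;
            this.length = length;
            this.bodyOffset = bodyOffset;
        }

        @Override
        public String toString() {
            return String.format("sid=0x%02X length=%d bodyOffset=%d", sid, length, bodyOffset);
        }
    }

    /** Walks the 4-byte headers of a raw BIFF stream, skipping each record body. */
    static List<RecordHeader> walk(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<RecordHeader> headers = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;     // unsigned 16-bit record id
            int length = buf.getShort() & 0xFFFF;  // unsigned 16-bit body length
            if (length > buf.remaining()) {
                break; // truncated record: stop rather than read past the end
            }
            headers.add(new RecordHeader(sid, length, buf.position()));
            buf.position(buf.position() + length); // skip the body
        }
        return headers;
    }

    public static void main(String[] args) {
        // Synthetic stream: a 0x85 (BoundSheet) record with an 8-byte body,
        // followed by a 0x0A (EOF) record with an empty body.
        byte[] demo = {
            (byte) 0x85, 0x00, 0x08, 0x00, 0, 0, 0, 0, 0, 0, 0, 0,
            0x0A, 0x00, 0x00, 0x00
        };
        for (RecordHeader h : walk(demo)) {
            System.out.println(h);
        }
    }
}
```

If the 0x85 record in the attached file declares a body 28 bytes longer than 
the layout the old-format parser reads, that would be consistent with the 
error, but that is a guess until someone inspects the file.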

> AutoDetectParser.parse failed with error "Initialisation of record 
> 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3107
>                 URL: https://issues.apache.org/jira/browse/TIKA-3107
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.24
>            Reporter: Xiaohong Yang
>            Priority: Critical
>         Attachments: SOJ.NW.00092712.xls
>
>
> When I try to get the metadata of the sample Excel file using the 
> AutoDetectParser.parse method with the following Java code, I get the error 
> "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining 
> still to be read".
>  
> InputStream input = new FileInputStream(localFilePath);
> BodyContentHandler handler = new BodyContentHandler(-1);
> Metadata metadata = new Metadata();
> TikaConfig config = TikaConfigFactory.getTikaConfig();
> Parser autoDetectParser = new AutoDetectParser(config);
> ParseContext context = new ParseContext();
> context.set(TikaConfig.class, config);
> autoDetectParser.parse(input, handler, metadata, context);
>  
> Here is the stack trace:
>  
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2caa5ec
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>        …
>        at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>        at java.util.concurrent.FutureTask.run(FutureTask.java)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>        at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.
>        at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
>        at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:233)
>        at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
>        at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:158)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>        ... 15 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
