[https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864420#comment-17864420]
ASF GitHub Bot commented on TIKA-3347:
--------------------------------------
kbachuHighSpot commented on PR #1473:
URL: https://github.com/apache/tika/pull/1473#issuecomment-2218995629
Thank you. That worked, but after working through a few other hiccups I have
now run into a new issue.
I am trying to parse a PPT file:
```
import java.io.InputStream;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.OfflineContentHandler;

// Skip OCR so embedded images are not sent to Tesseract
TesseractOCRConfig config = new TesseractOCRConfig();
config.setSkipOcr(true);
ParseContext context = new ParseContext();
context.set(TesseractOCRConfig.class, config);

Parser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
// `writer` (sink for the extracted text) and `input` (the PPT source)
// are defined elsewhere and not shown in this snippet
OfflineContentHandler handler =
        new OfflineContentHandler(new BodyContentHandler(writer));

// Note: we have to use TikaInputStream.get here; otherwise certain content
// types (e.g. 2007 PPTX) might not be detected correctly by the parser
try (InputStream original = TikaInputStream.get(input, metadata)) {
    parser.parse(original, handler, metadata, context);
    // ==> The call above crashes with:
    // Execution error (NoSuchMethodError) at
    // org.apache.poi.util.IOUtils/toByteArray (IOUtils.java:241).
    // 'org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream$Builder
    //  org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.builder()'
}
```
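A `NoSuchMethodError` generally means code was compiled against one version of a library but a different version, lacking that method, is loaded at runtime. Here the missing method is `UnsynchronizedByteArrayOutputStream.builder()` in Commons IO, called from POI's `IOUtils.toByteArray`, so a likely (unconfirmed) cause is an older commons-io jar winning on the classpath. The sketch below is a hypothetical diagnostic helper, not part of Tika or POI (the class name `ClasspathCheck` and its methods are made up for illustration). It reports whether the loaded Commons IO class exposes a static `builder()` method and which jar it came from.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not a Tika/POI API): checks whether a class on the
// runtime classpath exposes a given public static no-arg method, and where it was loaded from.
public class ClasspathCheck {

    /** True if className is loadable and has a public static no-arg method named methodName. */
    public static boolean hasStaticMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName); // public methods only, no parameters
            return Modifier.isStatic(m.getModifiers());
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Location (jar or directory) the class was loaded from, or a placeholder string. */
    public static String locationOf(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            // Bootstrap (JDK) classes have no code source
            return src == null ? "<bootstrap or unknown>" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream";
        System.out.println("Loaded from:    " + locationOf(cls));
        System.out.println("Has builder():  " + hasStaticMethod(cls, "builder"));
    }
}
```

Run it with the same classpath as the failing application. If it reports `false` for `builder()`, the printed jar location shows which commons-io is actually being used; in a Maven build, for example, `mvn dependency:tree` shows how that version was resolved.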
> Upgrade to PDFBox 3.x when available
> ------------------------------------
>
> Key: TIKA-3347
> URL: https://issues.apache.org/jira/browse/TIKA-3347
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> 3.0.0-RC1 was recently released. We should integrate it on a dev branch asap
> so that we can help with regression testing...
--
This message was sent by Atlassian Jira
(v8.20.10#820010)