Re: Possible memory leak when extracting text?

Tilman Hausherr Thu, 09 May 2019 23:13:19 -0700

On 10.05.2019 at 07:22, Søren Pedersen wrote:

We have an application that can index the contents of PDF files, so that we
can use that for a search algorithm. We use the Apache PDFBox library for
extracting text from a PDF, like this (where inputStream is a
ByteArrayInputStream containing the contents of the PDF file):


PDFTextStripper pdfStripper = new PDFTextStripper();
pdDoc = PDDocument.load(inputStream,
        MemoryUsageSetting.setupTempFileOnly());
String parsedText = pdfStripper.getText(pdDoc);

You can pass the byte[] directly to load(). Also make sure that the bytes are not altered in any way, e.g. through an incorrectly configured web download or incorrectly configured resource loading (the "filtering" option must be false).
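One way to check that the bytes arrive unaltered is to compare a hash of the byte[] you hand to load() with a hash of the original file. The following is only a rough sketch using the JDK alone; the class and method names (PdfBytesCheck, sha256Hex, startsWithPdfHeader, roundTripAsText) are made up for illustration and are not part of PDFBox. The round trip through a String stands in for any text-mode step that treats the PDF as character data.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: sanity checks for PDF bytes.
final class PdfBytesCheck {

    /** Hex-encoded SHA-256 of the bytes, to compare against the source file. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Cheap check: a PDF normally starts with the "%PDF-" marker. */
    static boolean startsWithPdfHeader(byte[] data) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return data.length >= header.length
                && Arrays.equals(Arrays.copyOf(data, header.length), header);
    }

    /** Simulates a text-mode step (charset decoding and re-encoding) on binary data. */
    static byte[] roundTripAsText(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A tiny stand-in for the start of a PDF: header line plus a binary comment line.
        byte[] original = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        byte[] altered = roundTripAsText(original);

        System.out.println("header ok:  " + startsWithPdfHeader(altered));
        System.out.println("same bytes: " + Arrays.equals(original, altered));
        System.out.println("same hash:  "
                + sha256Hex(original).equals(sha256Hex(altered)));
    }
}
```

Note that the header check alone is not enough: the altered bytes still start with "%PDF-", but the binary part no longer matches, which only the byte or hash comparison shows.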



Also retry with the 2.0.16 snapshot.

Tilman



