Re: Possible memory leak when extracting text?

Søren Pedersen Fri, 10 May 2019 00:33:45 -0700

OK, thanks a lot for looking into this, Tilman. I will try your suggestion and 
keep fiddling with it :)

Have a great weekend!

On 10 May 2019, 08.12 +0200, Tilman Hausherr <[email protected]>, wrote:
> On 10.05.2019 at 07:22, Søren Pedersen wrote:
> > We have an application that can index the contents of PDF files, so that we
> > can use that for a search algorithm. We use the Apache PDFBox library for
> > extracting text from a PDF, like this (where inputStream is a
> > ByteArrayInputStream containing the contents of the PDF file):
> >
> > PDFTextStripper pdfStripper = new PDFTextStripper();
> > pdDoc = PDDocument.load(inputStream,
> >         MemoryUsageSetting.setupTempFileOnly());
> > String parsedText = pdfStripper.getText(pdDoc);
>
>
> You can pass the byte[] directly to load(). Also make sure that the
> bytes are not altered in any way, e.g. through an incorrectly configured
> web download or incorrectly configured resource loading
> (the "filtering" option must be false).
>
>
> Also retry with the 2.0.16 snapshot.
>
> Tilman
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
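
Tilman's advice to make sure the bytes are not altered could be checked along the following lines. This is only a minimal sketch using the Java standard library: the class name PdfBytesCheck and its helper methods are hypothetical and are not part of PDFBox. It compares a SHA-256 digest of the bytes before and after a transfer step, and simulates one common kind of corruption, namely treating binary PDF data as UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper (not part of PDFBox) for spotting altered PDF bytes.
final class PdfBytesCheck {

    // Hex-encoded SHA-256 digest, used to compare the bytes at two points
    // of the pipeline (for example on disk and just before parsing).
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Cheap sanity check: a PDF file normally begins with "%PDF-".
    static boolean startsWithPdfHeader(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return data.length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Simulates a step that wrongly handles binary data as UTF-8 text.
    static byte[] textRoundTrip(byte[] data) {
        return new String(data, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF-1.4" followed by a comment line of high (non-ASCII) bytes,
        // similar to what many PDF writers emit near the start of a file.
        byte[] original = {
            0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x34, 0x0A,
            0x25, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, 0x0A
        };
        byte[] damaged = textRoundTrip(original);

        System.out.println("header ok: " + startsWithPdfHeader(original));
        System.out.println("unchanged after text round trip: "
                + sha256Hex(original).equals(sha256Hex(damaged)));

        // With PDFBox 2.0.x on the classpath, a verified array could then be
        // handed over directly, as suggested above: PDDocument.load(original)
    }
}
```

If the digests differ between the source file and the array that reaches load(), the problem is in the download or resource handling rather than in the text extraction itself.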
