Ok, that is very interesting. Thanks a lot for looking into this!

I am a bit baffled as to why we experience the memory leak then, but I guess I 
will have to dig more into it.
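
One way to start digging, as a rough sketch: sample the used Java heap after repeated runs of the extraction and see whether it keeps climbing. The class and method names below (HeapProbe, usedHeapBytes, sample) are hypothetical, and the workload is a placeholder rather than the real PDFBox call.

```java
public class HeapProbe {
    /** Returns the approximate number of heap bytes currently in use. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint to the JVM, so the readings stay approximate
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task the given number of times and records used heap after each run. */
    static long[] sample(Runnable task, int rounds) {
        long[] samples = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            task.run();
            samples[i] = usedHeapBytes();
        }
        return samples;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): substitute the real text extraction here.
        Runnable task = () -> {
            byte[] scratch = new byte[1 << 20];
            scratch[0] = 1;
        };
        for (long used : sample(task, 5)) {
            System.out.println("used heap: " + (used / 1024) + " KiB");
        }
    }
}
```

If these numbers stay flat while the process itself keeps growing, the growth is happening outside the Java heap, which this probe cannot see.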

Best regards,
Søren
On 10 May 2019, 18.30 +0200, Andreas Lehmkuehler <[email protected]>, wrote:
> On 10.05.19 at 15:52, Søren Pedersen wrote:
> > I have done some more testing, and I found that when I run on Windows there
> > are no problems, but when I run on Linux I get the memory leak. Tilman,
> > would you be able to run the same test on a Linux box, or maybe in a
> > Linux Docker container, like I showed originally?
>
> I've extracted the text on Linux (Fedora 30, OpenJDK 1.8.0_212) without any
> problems using
>
> java -Xmx9m -jar pdfbox-app-2.0.15.jar ExtractText
>
> where -Xmx9m is the smallest working value
>
> Andreas
>
> >
> > We would prefer to run our app on Linux, but this looks like a blocker for 
> > that unfortunately :(
> >
> > Best regards,
> > Søren Pedersen
> > On 10 May 2019, 09.32 +0200, Søren Pedersen <[email protected]>, wrote:
> > > Ok, thanks a lot for looking into this Tilman. I will try your suggestion 
> > > and keep fiddling with it :)
> > >
> > > Have a great weekend!
> > > > On 10 May 2019, 08.12 +0200, Tilman Hausherr <[email protected]>, wrote:
> > > > > On 10.05.2019 at 07:22, Søren Pedersen wrote:
> > > > > > We have an application that can index the contents of PDF files, so
> > > > > > that we can use that for a search algorithm. We use the Apache
> > > > > > PDFBox library for extracting text from a PDF, like this (where
> > > > > > inputStream is a ByteArrayInputStream containing the contents of
> > > > > > the PDF file):
> > > > >
> > > > > PDFTextStripper pdfStripper = new PDFTextStripper();
> > > > > > pdDoc = PDDocument.load(inputStream,
> > > > > >         MemoryUsageSetting.setupTempFileOnly());
> > > > > String parsedText = pdfStripper.getText(pdDoc);
> > > >
> > > >
> > > > You can pass the byte[] directly to load(). Also make sure that the
> > > > > bytes are not altered in any way, e.g. through an incorrectly configured
> > > > web downloading, or an incorrectly configured resource loading
> > > > ("filtering" option must be false).
> > > >
> > > >
> > > > > Also retry with the 2.0.16 snapshot.
> > > >
> > > > Tilman
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
> > > >
> >
>
>
>
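
Tilman's point above about the bytes not being altered can be checked before the PDF ever reaches the parser. The following is a minimal sketch that compares SHA-256 digests of the original file and of the bytes the application actually receives; the class and method names (PdfBytesCheck, sha256Hex, sameContent) are hypothetical, and the sample data only imitates a text-mode transfer that rewrites line endings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PdfBytesCheck {
    /** Returns the SHA-256 digest of the data as a lowercase hex string. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True if both byte arrays hash to the same SHA-256 digest. */
    static boolean sameContent(byte[] original, byte[] received) {
        return sha256Hex(original).equals(sha256Hex(received));
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a real PDF file.
        byte[] original = "%PDF-1.4\r\n1 0 obj\r\n".getBytes(StandardCharsets.ISO_8859_1);
        // Imitate a transfer or resource filter that rewrites line endings.
        byte[] altered = new String(original, StandardCharsets.ISO_8859_1)
                .replace("\r\n", "\n")
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(sameContent(original, original)); // prints true
        System.out.println(sameContent(original, altered));  // prints false
    }
}
```

The hex digest can also be compared by hand against the output of a tool such as sha256sum run on the source file, which helps when the original bytes live on a different machine.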
