Possible memory leak when extracting text?

Søren Pedersen Thu, 09 May 2019 08:14:54 -0700

Hi there

We have an application that indexes the contents of PDF files so that the text can be
used by a search algorithm. We use the Apache PDFBox library to extract text from a
PDF, like this (where inputStream is a ByteArrayInputStream containing the contents
of the PDF file):


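PDFTextStripper pdfStripper = new PDFTextStripper();
String parsedText;
// try-with-resources closes the document when done, even if getText throws
try (PDDocument pdDoc = PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())) {
    parsedText = pdfStripper.getText(pdDoc);
}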
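For context on where the heap goes: the snippet above assumes the whole PDF is already in a ByteArrayInputStream. Below is a minimal sketch of one common way to build such a stream. The class PdfBytes, its methods and the default path sample.pdf are hypothetical and not part of the original application. What it illustrates is that the full file sits in a byte array on the Java heap for as long as the stream is referenced, whatever MemoryUsageSetting is passed to PDFBox (as far as I understand, that setting only controls PDFBox's own buffering).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfBytes {

    /** Reads the whole file into memory and wraps it in a ByteArrayInputStream. */
    static ByteArrayInputStream load(Path pdf) throws IOException {
        byte[] contents = Files.readAllBytes(pdf); // the entire file now lives on the Java heap
        return new ByteArrayInputStream(contents);
    }

    /** Fraction of the maximum heap (-Xmx) taken up by the given number of bytes. */
    static double heapShare(long bytes) {
        return (double) bytes / Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real PDF as the first argument.
        Path pdf = Paths.get(args.length > 0 ? args[0] : "sample.pdf");
        ByteArrayInputStream in = load(pdf);
        int size = in.available(); // for a ByteArrayInputStream: number of unread bytes
        System.out.printf("%d bytes, %.1f%% of max heap%n", size, 100 * heapShare(size));
    }
}
```

A file that by itself takes a large share of the maximum heap leaves correspondingly less room for parsing and for the extracted String.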

We ran into a PDF file that seems to cause a memory leak: extracting its text fails
with OutOfMemoryError: Java heap space. I have attached the file to this email (I'm
not sure whether attachments work on a mailing list).

Could someone try extracting the text from this PDF file to confirm whether there is
a memory leak, and perhaps bring it to the developers' attention?
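A single OutOfMemoryError does not by itself prove a leak; the file may simply need more heap than is available. One rough way to tell the two apart is to run the extraction several times and record how much heap is still in use after each run: steady growth suggests memory is being retained between runs, while a flat trend suggests the file is just expensive to process. The sketch below is a generic, JDK-only helper built on that assumption. LeakCheck, measure and the two stand-in tasks are hypothetical; the real PDFBox extraction would go inside the Runnable.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakCheck {

    /** Heap currently in use, in bytes: allocated heap minus its free part. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs the task the given number of times and records the used heap after
     * each run. System.gc() is only a request to the JVM, so the samples are
     * approximate; look at the trend, not at individual values.
     */
    static List<Long> measure(Runnable task, int runs) {
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            task.run();
            System.gc();
            samples.add(usedHeap());
        }
        return samples;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the real extraction code.
        List<byte[]> retained = new ArrayList<>();
        Runnable leaky = () -> retained.add(new byte[8 * 1024 * 1024]); // keeps 8 MiB per run
        Runnable clean = () -> {
            byte[] scratch = new byte[8 * 1024 * 1024]; // garbage once the run ends
            scratch[0] = 1;
        };
        System.out.println("leaky: " + measure(leaky, 5));
        System.out.println("clean: " + measure(clean, 5));
    }
}
```

The "leaky" samples should climb by roughly 8 MiB per run, while the "clean" ones should stay roughly level.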

Thanks a lot in advance!

Best regards,
Søren
