Hi there,

We have an application that indexes the contents of PDF files so that they can be searched. We use the Apache PDFBox library to extract text from a PDF, like this (where inputStream is a ByteArrayInputStream containing the contents of the PDF file):

    PDFTextStripper pdfStripper = new PDFTextStripper();
    pdDoc = PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly());
    String parsedText = pdfStripper.getText(pdDoc);

We ran into a sample PDF file that seems to cause a memory leak: we get an OutOfMemoryError: Java heap space. I have attached the file to this email (not sure if that works on a mailing list?).

Can someone try to extract the text from this PDF file to confirm whether there is a memory leak, and maybe bring this to the attention of the developers?

Thanks a lot in advance!

Best regards,
Søren

