OK, thanks a lot for looking into this, Tilman. I will try your suggestion and keep fiddling with it :)
Have a great weekend!

On 10 May 2019, 08.12 +0200, Tilman Hausherr <[email protected]>, wrote:
> On 10.05.2019 at 07:22, Søren Pedersen wrote:
> > We have an application that can index the contents of PDF files, so that we
> > can use that for a search algorithm. We use the Apache PDFBox library for
> > extracting text from a PDF, like this (where inputStream is a
> > ByteArrayInputStream containing the contents of the PDF file):
> >
> > PDFTextStripper pdfStripper = new PDFTextStripper();
> > pdDoc = PDDocument.load(inputStream,
> >     MemoryUsageSetting.setupTempFileOnly());
> > String parsedText = pdfStripper.getText(pdDoc);
>
> You can pass the byte[] directly to load(). Also make sure that the
> bytes are not altered in any way, e.g. through an incorrectly configured
> web download, or incorrectly configured resource loading
> ("filtering" option must be false).
>
> Also retry with 2.0.16 snapshot.
>
> Tilman
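
For the "bytes are not altered" point, here is a minimal sketch of a sanity check that could be run on the byte[] before handing it to PDFBox. It uses only the JDK. The class name PdfBytesCheck and both method names are hypothetical helpers, not part of PDFBox, and the header check assumes the "%PDF-" marker sits at the very first byte, which is stricter than what some readers accept.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of PDFBox.
final class PdfBytesCheck {

    // Every PDF file is expected to begin with this ASCII marker.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the data starts with the "%PDF-" header marker at byte 0. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the SHA-256 digest of the data as a lowercase hex string. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

If sha256Hex(bytes) differs from the digest of the original file on disk, the bytes were probably changed somewhere along the way (download, resource filtering, or similar), and the text extraction problem may not be in PDFBox at all.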

