Hi,
What PDFBox version are you using?
Do you use multithreading?
Is the disk space limited?
Which parameters did you pass to load() (any memory settings)?
Apparently, "load(InputStream input)".
Can you reproduce the effect by loading and closing the file in a loop?
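
A minimal sketch of such a loop, in case it helps. It is not PDFBox code: the names LoadLoop, LoadAction and tempDirUsableBytes are made up for illustration, and it assumes that scratch files end up in the JVM temp directory (java.io.tmpdir), which may not match your setup. It runs a load-and-close action repeatedly, counts IOExceptions, and logs the free temp space measured before each failing iteration:

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical harness: runs a load-and-close action repeatedly and counts failures. */
final class LoadLoop {

    /** One load-and-close attempt; the body is supplied by the caller. */
    interface LoadAction {
        void run() throws IOException;
    }

    /** Usable bytes in the JVM temp directory (assumed scratch-file location). */
    static long tempDirUsableBytes() {
        return new File(System.getProperty("java.io.tmpdir")).getUsableSpace();
    }

    /** Runs the action the given number of times; returns how many IOExceptions occurred. */
    static int run(int iterations, LoadAction action) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            long freeBefore = tempDirUsableBytes();
            try {
                action.run();
            } catch (IOException e) {
                failures++;
                System.err.printf("iteration %d failed (temp free before: %d bytes): %s%n",
                        i, freeBefore, e.getMessage());
            }
        }
        return failures;
    }
}
```

With PDFBox 2.x on the classpath, the action could be something like
() -> { try (PDDocument doc = PDDocument.load(new File(path))) { } }
where path is a placeholder for one of the affected files. If the failures only show up when the logged free space is low, that would point towards a disk space problem rather than a parser bug.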
Tilman
On 01.12.2020 at 17:10, Stefan Wurzinger wrote:
Hi,
I’m randomly getting errors while loading PDF documents with the PDDocument.load()
method. Unfortunately, I couldn’t reproduce it reliably; I just see it happen
sometimes. Retrying in the same process (and on the same machine) usually
fails again. Retrying later usually just works.
The document files are large, about 2 to 3 GB on average.
The (virtual) machine where the process runs can consume up to 20 GB of memory.
The stack trace and error message are always the same (although the error occurs
at different places where PDDocument.load() is called) and look like this:
java.io.IOException: Requested page with index 2 was not written before.
    at org.apache.pdfbox.io.ScratchFile.readPage(ScratchFile.java:324)
    at org.apache.pdfbox.io.ScratchFileBuffer.ensureAvailableBytesInPage(ScratchFileBuffer.java:177)
    at org.apache.pdfbox.io.ScratchFileBuffer.read(ScratchFileBuffer.java:426)
    at org.apache.pdfbox.pdfparser.COSParser.isString(COSParser.java:2478)
    at org.apache.pdfbox.pdfparser.COSParser.bfSearchForLastEOFMarker(COSParser.java:1871)
    at org.apache.pdfbox.pdfparser.COSParser.bfSearchForObjects(COSParser.java:1556)
    at org.apache.pdfbox.pdfparser.COSParser.rebuildTrailer(COSParser.java:2196)
    at org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:281)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:173)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122)
(it’s always “index 2”)
Can anybody give me a hint as to why this error might occur? Could it be some
(hidden) out-of-memory or out-of-disk-space issue? Could it be a PDFBox bug?
Could it be some timing / caching / buffering issue? Or something else (what?)?
Thanks for any hint.
Best regards,
Stefan