Ouch... This might be caused by the workaround implemented for this issue:
https://issues.apache.org/jira/browse/PDFBOX-4601
Tilman
On 26.09.2019 at 21:39, Esteban R wrote:
Hello. I'm getting a timeout in one of my tests after upgrading to
v2.0.17: PDImageXObject.getImage() takes more than 1 minute 10 seconds,
versus less than 2 seconds with the previous release, 2.0.16.
I cannot provide the sample PDF because it contains sensitive
information. I have tried to simplify it, but the issue (almost)
disappears even if I just save the file without changing anything.
Some more facts:
* The issue happens if the load is done with
MemoryUsageSetting.setupTempFileOnly (without that flag the issue
doesn't happen)
* The performance is OK for v 2.0.16
* Please find attached the stack trace of the timeout of my original
test
* PDF structure is quite simple and contains a single image (find
attached the relevant data from pdfdebugger)
* I have created a sample program to demonstrate the issue: it
simply loads the PDF file with the setupTempFileOnly flag and does
getImage for all the images (only one in the PDF). It then does
the same thing without the flag and after that it does the same
thing with another PDF: it is simply the same file saved by PDFBOX
with another name.
Output with pdfbox 2.0.16:
java -cp "pdfbox-2.0.16.jar;commons-logging-1.2.jar;." TestExtractImage sample.pdf
With temp file
Before getImage: Thu Sep 26 16:32:46 ART 2019
After getImage: Thu Sep 26 16:32:46 ART 2019
Without temp file
Before getImage: Thu Sep 26 16:32:46 ART 2019
After getImage: Thu Sep 26 16:32:46 ART 2019
With temp file after saving with another name
Before getImage: Thu Sep 26 16:32:46 ART 2019
After getImage: Thu Sep 26 16:32:47 ART 2019
(i.e.: less than 2 seconds in all cases)
Output with pdfbox 2.0.17:
java -cp "pdfbox-2.0.17.jar;commons-logging-1.2.jar;." TestExtractImage sample.pdf
With temp file
Before getImage: Thu Sep 26 16:31:09 ART 2019
After getImage: Thu Sep 26 16:32:30 ART 2019 => more than 1'20"
Without temp file
Before getImage: Thu Sep 26 16:32:30 ART 2019
After getImage: Thu Sep 26 16:32:30 ART 2019
With temp file after saving with another name
Before getImage: Thu Sep 26 16:32:30 ART 2019
After getImage: Thu Sep 26 16:32:34 ART 2019 => more than 2"
And here the source code for TestExtractImage.java :
import java.io.File;
import java.io.IOException;
import java.util.Date;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class TestExtractImage {
    public static void main(String[] args) throws IOException {
        // Case 1: load the original file using temp files only (slow in 2.0.17)
        System.out.println("With temp file");
        PDDocument d = PDDocument.load(new File(args[0]),
                MemoryUsageSetting.setupTempFileOnly());
        getImage(d);
        d.close();

        // Case 2: load the original file with the default memory setting
        System.out.println("Without temp file");
        d = PDDocument.load(new File(args[0]));
        getImage(d);
        d.save("other.pdf");
        d.close();

        // Case 3: load the copy saved by PDFBox, again using temp files only
        System.out.println("With temp file after saving with another name");
        d = PDDocument.load(new File("other.pdf"),
                MemoryUsageSetting.setupTempFileOnly());
        getImage(d);
        d.close();
    }

    // Decodes every PNG-suffixed image on the first page and prints timestamps
    static void getImage(PDDocument d) throws IOException {
        PDResources res = d.getPage(0).getResources();
        for (COSName n : res.getXObjectNames()) {
            PDXObject o = res.getXObject(n);
            if (o instanceof PDImageXObject) {
                PDImageXObject i = (PDImageXObject) o;
                if ("png".equals(i.getSuffix())) {
                    System.out.println(" Before getImage: " + new Date());
                    i.getImage();
                    System.out.println(" After getImage: " + new Date());
                }
            }
        }
    }
}
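
The Date output above only has one-second resolution, so short
differences (like the 2.0.16 runs) all look the same. As a rough sketch,
the timing could be done with System.nanoTime() instead. ElapsedTimer
and timeMillis are hypothetical names, not part of PDFBox, and
Thread.sleep stands in for the getImage() call so that the snippet runs
without any PDF:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of PDFBox: measures the elapsed time of one call.
public class ElapsedTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the call being measured (assumption: about 50 ms of work).
        long ms = timeMillis(() -> {
            Thread.sleep(50);
            return null;
        });
        System.out.println(" Stand-in call took " + ms + " ms");
    }
}
```

In TestExtractImage, the two println lines around i.getImage() could then
be replaced by something like timeMillis(i::getImage), with getImage(...)
and main declared as "throws Exception".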