Hi,

tutorialkart is not "our" website.

The order in the resources has nothing to do with the visual position.

The "negative" might be an image mask that is listed in the resources even though it is not used directly (an image may use it to get a transparency effect).

The "second" approach is the one used by the ExtractImages.java tool which is available in the source code download. That would be the one to use.
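
To illustrate the difference between the two approaches, here is a minimal toy model in plain Java. It does not use PDFBox at all: the class, the method names and the sample data (Im1, Im2, Im1Mask) are hypothetical and only stand in for a page's resource dictionary and for the names painted by the content stream. It is a sketch of the idea, not of how PDFBox or ExtractImages.java is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not PDFBox classes.
final class ImageOrderDemo {

    // Approach 1: list every image-like entry found in the resources.
    // The order is whatever order the dictionary happens to have.
    static List<String> fromResources(Map<String, String> resources) {
        return new ArrayList<>(resources.keySet());
    }

    // Approach 2: walk the content stream and keep only the names
    // that are actually painted, in the order they are painted.
    static List<String> fromContentStream(List<String> paintedNames,
                                          Map<String, String> resources) {
        List<String> result = new ArrayList<>();
        for (String name : paintedNames) {
            if (resources.containsKey(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // Assumed sample page: two visible images plus a mask that is
    // only referenced by Im1 and never painted on its own.
    static Map<String, String> sampleResources() {
        Map<String, String> resources = new LinkedHashMap<>();
        resources.put("Im2", "second image painted");
        resources.put("Im1Mask", "mask used by Im1");
        resources.put("Im1", "first image painted");
        return resources;
    }

    // Assumed paint order in the content stream.
    static List<String> samplePaintedNames() {
        return Arrays.asList("Im1", "Im2");
    }

    public static void main(String[] args) {
        Map<String, String> resources = sampleResources();
        System.out.println("Resources:      " + fromResources(resources));
        System.out.println("Content stream: "
                + fromContentStream(samplePaintedNames(), resources));
    }
}
```

In this model the first approach returns three entries, including the mask, in dictionary order, while the second returns only the two painted images in paint order.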

Tilman



On 01.10.2020 at 07:33, Dan Fulea wrote:
Hello,
I am using pdfbox for handling PDFs and it does its job quite well most
of the time.
However, I encounter strange behaviour when extracting images embedded in
some PDFs.
I start with the following code (I think it is taken from one of your
tutorials):
for (PDPage page : list) {
    PDResources pdResources = page.getResources();
    for (COSName c : pdResources.getXObjectNames()) {
        PDXObject o = pdResources.getXObject(c);
        if (o instanceof PDImageXObject) {
            imageCount++;
            WRITEIMAGE(o,....);//WRITING IMAGE TO DISK GOES HERE
        }
    }
}
This is clean, logical, and seems natural, but it poses a problem:
we always obtain two images for each real image in the PDF. One image is
the good one; the other is some kind of "negative" of the good one.
Moreover, the image order (the image index as the images appear in the
PDF from top to bottom) is scrambled.

The second approach follows this tutorial:
https://www.tutorialkart.com/pdfbox/how-to-get-location-and-size-of-images-in-pdf/

The image-writing routine goes inside the processOperator method, just
before the following line:
System.out.println("\nImage [" + objectName.getName() + "]");
In this approach, we get the correct image count (no duplicates) and the
correct order. This is what I want, and it works very well.

Although these approaches look similar, why does the first one behave
so strangely?
Which way do you recommend for extracting the images?
I am uncomfortable not fully understanding these issues.

Please help me understand this better. Thank you,
Dan Fulea


