I have some PDFs that I'm trying to read. Many of them contain images. I'd
like to be able to iterate over the images and determine their encoding
(png vs. jpeg vs. ?) and size.
I've found a sample that lets me iterate over the PDXObject entities, but
I'm missing a key piece to determine the size and format of the objects.
a) Is a PDXObject always an image, or could it be something else?
Here is the code I've got so far.
    for ( PDPage aPage : pdfDocument.getPages() ) {
        PDResources pdResources = aPage.getResources();
        for ( COSName cosObject : pdResources.getXObjectNames() ) {
            // Look up the XObject by name; it may or may not be an image.
            PDXObject xObj = pdResources.getXObject( cosObject );
            System.out.println( "got an image maybe" );
        }
    }
This is where I've gotten stumped. I've looked at lots of lists of
COS-whatever things, but it has not led me to "the answer."
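On the encoding part, here is my current understanding, which may well be wrong: a PDF does not embed PNG or JPEG files as such. Each image stream carries a /Filter entry naming how the data is compressed, and only some filters correspond to a familiar file format (DCTDecode is JPEG, JPXDecode is JPEG 2000). The sketch below is plain Java with no PDFBox calls; the class and method names are hypothetical, and it assumes the filter names have already been pulled off the stream as strings somehow.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of PDFBox: maps the /Filter names of a PDF
// image stream to a rough format label.
final class ImageFilterGuess {

    // Assumption: filterNames holds the stream's filter names without the
    // leading slash, in the order they appear in the /Filter entry.
    static String guessFormat(List<String> filterNames) {
        for (String name : filterNames) {
            switch (name) {
                case "DCTDecode":
                    return "jpeg";
                case "JPXDecode":
                    return "jpeg2000";
                case "CCITTFaxDecode":
                    return "ccitt-fax";
                case "JBIG2Decode":
                    return "jbig2";
                default:
                    // FlateDecode, LZWDecode, ASCII85Decode and similar only
                    // wrap the data; keep scanning for an image-specific filter.
                    break;
            }
        }
        // No image-specific filter: the stream holds (possibly compressed)
        // raw pixel samples rather than an embedded image file.
        return "raw-samples";
    }

    public static void main(String[] args) {
        System.out.println(guessFormat(Arrays.asList("DCTDecode")));
        System.out.println(guessFormat(Arrays.asList("FlateDecode")));
    }
}
```

If that is right, "png" would never show up directly; a Flate-compressed image would just be raw samples that a tool may choose to export as PNG.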
Thanks for any guidance you can provide.
Dave Patterson