PDFDebugger is working fine, so the issue must be with how I'm using the
library, or how I'm extracting the globals stream...
I checked the globals stream contents that I'm extracting and compared to
the globals in PDFDebugger, and they are identical bytes.
I also checked the image content stream, and it has identical bytes as well.
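For reference, a byte-for-byte check like the one described above can be sketched as follows. The class and method names (StreamCompare, firstMismatch) are made up for illustration and are not part of PDFBox or the JBIG2 plugin; the two arrays are assumed to be the bytes extracted by my code and the bytes saved from PDFDebugger.

```java
public class StreamCompare {

    /**
     * Returns the index of the first byte that differs between the two
     * arrays, or -1 if they are identical. If one array is a strict prefix
     * of the other, the length of the shorter array is returned.
     */
    public static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }
}
```

A result of -1 for both the globals and the image stream is what "identical bytes" means here; any other value is the offset to inspect in a hex dump.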
I even changed my code to be identical to yours:
JBIG2ImageReader reader = (JBIG2ImageReader) ImageIO.getImageReadersByFormatName("JBIG2").next();
JBIG2Globals globals = reader.processGlobals(ImageIO.createImageInputStream(new ByteArrayInputStream(globalBytes)));
reader.setGlobals(globals);
reader.setInput(ImageIO.createImageInputStream(new ByteArrayInputStream(imageBytes)));
return reader.read(0, reader.getDefaultReadParam());
and it still fails.
But PDFDebugger works fine.
So it would seem like the way that PDFBox invokes JBIG2ImageReader is not
the above? Could that be right??
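
One way to narrow this down without going through the reader at all is to list the segment headers in both streams and check whether every referred-to segment number is actually present. The null entry in the SegmentHeader array quoted below suggests a reference that could not be resolved, though that is only a guess. The following is a rough sketch, not PDFBox code: the class and method names are hypothetical, the header layout follows my reading of the JBIG2 segment header syntax, and it assumes the embedded organization used inside PDF streams (no file header, each segment header followed directly by its data). It throws on truncated or malformed input and stops at a segment of unknown length.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Jbig2SegmentLister {

    /** The parts of a segment header that matter for resolving references. */
    public static final class Segment {
        public final long number;
        public final int type;
        public final long page;
        public final long[] referredTo;

        Segment(long number, int type, long page, long[] referredTo) {
            this.number = number;
            this.type = type;
            this.page = page;
            this.referredTo = referredTo;
        }
    }

    /** Walks an embedded JBIG2 stream and returns its segment headers in order. */
    public static List<Segment> list(byte[] stream) {
        List<Segment> out = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(stream); // big-endian by default
        while (buf.remaining() >= 11) { // shortest header: 4 + 1 + 1 + 1 + 4 bytes
            long number = buf.getInt() & 0xFFFFFFFFL;
            int flags = buf.get() & 0xFF;
            int type = flags & 0x3F;                 // bits 0-5: segment type
            boolean longPage = (flags & 0x40) != 0;  // bit 6: 4-byte page association

            // Referred-to count: top 3 bits of one byte, or the long form when they equal 7.
            int count = (buf.get() & 0xFF) >>> 5;
            if (count == 7) {
                buf.position(buf.position() - 1);
                count = buf.getInt() & 0x1FFFFFFF;
                buf.position(buf.position() + (count + 8) / 8); // skip ceil((count+1)/8) retain-flag bytes
            }

            // Width of each referred-to number depends on this segment's own number.
            int refSize = number <= 256 ? 1 : number <= 65536 ? 2 : 4;
            long[] refs = new long[count];
            for (int i = 0; i < count; i++) {
                refs[i] = refSize == 1 ? buf.get() & 0xFFL
                        : refSize == 2 ? buf.getShort() & 0xFFFFL
                        : buf.getInt() & 0xFFFFFFFFL;
            }

            long page = longPage ? buf.getInt() & 0xFFFFFFFFL : buf.get() & 0xFFL;
            long dataLength = buf.getInt() & 0xFFFFFFFFL;
            out.add(new Segment(number, type, page, refs));

            // 0xFFFFFFFF means "unknown length"; this sketch does not scan for the end marker.
            if (dataLength == 0xFFFFFFFFL || dataLength > buf.remaining()) {
                break;
            }
            buf.position(buf.position() + (int) dataLength);
        }
        return out;
    }

    /** Returns referred-to segment numbers that appear in neither stream before their use. */
    public static List<Long> unresolved(byte[] globals, byte[] image) {
        List<Segment> all = new ArrayList<>(list(globals));
        all.addAll(list(image));
        Set<Long> known = new HashSet<>();
        List<Long> missing = new ArrayList<>();
        for (Segment s : all) {
            for (long r : s.referredTo) {
                if (!known.contains(r)) {
                    missing.add(r);
                }
            }
            known.add(s.number);
        }
        return missing;
    }
}
```

Calling Jbig2SegmentLister.unresolved(globalBytes, imageBytes) and getting a non-empty list would mean the image stream refers to a segment that neither stream supplies, which would point at the extraction rather than at the reader.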
- K
Kevin Day
On Fri, Sep 20, 2019 at 9:28 PM Tilman Hausherr wrote:
> I wonder if the PDF can be displayed with PDFDebugger. If no => bug. If
> yes, then you should debug this to see what calls are done, and whether
> you have the same data input. Your calls seem to be OK, they look
> similar to those I did when I debugged something in the jbig2 reader
> (link is before it went to Apache, don't open issues on github):
> https://github.com/levigo/jbig2-imageio/issues/21
>
> Tilman
>
> On 20.09.2019 at 22:23, Kevin Day wrote:
> > I am trying to use JBIG2ImageReader to parse JBIG2 data from a PDF (the
> > image stream and globals are being provided - we are not using PDFBox to
> > parse the PDF itself). Please let me know if I should be using a different
> > communication avenue for JBIG2 specific questions.
> >
> >
> > Here's what I'm trying to do:
> >
> > JBIG2ImageReader jbig2Reader = new JBIG2ImageReader(new JBIG2ImageReaderSpi());
> >
> > byte[] globalBytes = // raw bytes from PDF DECODEPARAMS, JBIG2GLOBALS
> >
> > ImageInputStream globalsInputStream = new DefaultInputStreamFactory().getInputStream(new ByteArrayInputStream(globalBytes));
> >
> > JBIG2Globals globals = jbig2Reader.processGlobals(globalsInputStream);
> > jbig2Reader.setGlobals(globals);
> >
> > byte[] imageBytes = // raw JBIG2 image stream bytes from PDF
> > ImageInputStream imageInputStream = new DefaultInputStreamFactory().getInputStream(new ByteArrayInputStream(image.getImageAsBytes()));
> > jbig2Reader.setInput(imageInputStream);
> >
> > return jbig2Reader.read(0);
> >
> >
> > When I do this, I get a null pointer exception:
> >
> > Exception in thread "main" java.lang.RuntimeException: Can't instantiate segment class
> >     at org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:420)
> >     at org.apache.pdfbox.jbig2.JBIG2Page.createNormalPage(JBIG2Page.java:202)
> >     at org.apache.pdfbox.jbig2.JBIG2Page.createPage(JBIG2Page.java:168)
> >     at org.apache.pdfbox.jbig2.JBIG2Page.composePageBitmap(JBIG2Page.java:157)
> >     at org.apache.pdfbox.jbig2.JBIG2Page.getBitmap(JBIG2Page.java:133)
> >     at org.apache.pdfbox.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:249)
> >     at javax.imageio.ImageReader.read(ImageReader.java:939)
> >     ....
> > Caused by: java.lang.NullPointerException
> >     at org.apache.pdfbox.jbig2.segments.TextRegion.initSymbols(TextRegion.java:1010)
> >     at org.apache.pdfbox.jbig2.segments.TextRegion.getSymbols(TextRegion.java:273)
> >     at org.apache.pdfbox.jbig2.segments.TextRegion.parseHeader(TextRegion.java:154)
> >     at org.apache.pdfbox.jbig2.segments.TextRegion.init(TextRegion.java:1128)
> >     at org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:413)
> >     ... 19 more
> >
> > The SegmentHeader array in TextRegion looks like this:
> >
> > (org.apache.pdfbox.jbig2.SegmentHeader[]) [null,
> >
> > #SegmentNr: 377
> > SegmentType: 0
> > PageAssociation: 1
> > Referred-to segments: none
> > ]
> >
> >
> >
> > Note that the first element is null. I'm not sure why this is (maybe it's
> > not a valid JBIG2 data stream??). This file opens and displays fine in PDF
> > viewers, so I'm assuming it must be something that I'm doing wrong.
> >
> >
> > Any pointers?
> >
> > - K