Hi again,
I am still having problems reading path data with PDFBox. I have an
example working with PDFTron that displays the path data with no problem,
but after inspecting the code it seems it never enters an XObject or Form
XObject. (I can give you an example of the code if you want.)
It simply seems to walk straight into a Path object using an
"ElementReader", which provides a way of traversing the Element display list
of a page. According to its documentation:
"The display list representing graphical elements (such as text-runs, paths,
images, shadings, forms, etc) is accessed using the intrinsic iterator.
ElementReader automatically concatenates page contents spanning multiple
streams and provides a mechanism to parse contents of sub-display lists
(e.g. forms XObjects and Type3 fonts). "
Is it possible for Path Objects to not be inside a form XObject? In my brief
reading of the PDF Spec it doesn't seem to explicitly say that path data
will be found in form XObjects. Just that "a form XObject is an entire
content stream to be treated as a single graphics object".
If my understanding is correct, an XObject is an external object that can be
referenced from the content stream so that content can be reused. So if the
image only appears once, there would be no reason to create a reference for
it, and its path data could sit directly in the page's content stream.
If that is the case, how did Adobe know where the vector images were? Do you
think they went as far as hit testing the paths to see if the paths were
somehow grouped together? At the moment all I have is all the vector
graphics found on a page in one EPS file, rather than one EPS file for each
vector graphic on the page.
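
To make the hit-testing idea concrete, here is a rough sketch of the kind of
grouping I have in mind. It is only my guess at an approach, not what Adobe
actually does. It assumes I already have one bounding box per path (how those
come out of PDFBox is not shown), and the class name PathGrouper, the method
groupByOverlap and the "gap" tolerance are all made up for illustration. It
uses only java.awt.geom, nothing from PDFBox or PDFTron.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges path bounding boxes that touch or nearly touch.
public class PathGrouper {

    /**
     * Repeatedly merges any two boxes that overlap or lie within `gap`
     * units of each other, until no more merges are possible.
     * Returns one bounding box per resulting group.
     */
    public static List<Rectangle2D> groupByOverlap(List<Rectangle2D> pathBounds, double gap) {
        // Work on copies so the caller's rectangles are not modified.
        List<Rectangle2D> groups = new ArrayList<>();
        for (Rectangle2D b : pathBounds) {
            groups.add((Rectangle2D) b.clone());
        }
        boolean merged = true;
        while (merged) {
            merged = false;
            for (int i = 0; i < groups.size() && !merged; i++) {
                for (int j = i + 1; j < groups.size() && !merged; j++) {
                    if (near(groups.get(i), groups.get(j), gap)) {
                        Rectangle2D union = groups.get(i).createUnion(groups.get(j));
                        groups.remove(j);      // j > i, so index i stays valid
                        groups.set(i, union);
                        merged = true;         // restart the scan with the bigger box
                    }
                }
            }
        }
        return groups;
    }

    // Explicit edge comparison so zero-width or zero-height boxes
    // (e.g. a single straight line) still count as hits.
    private static boolean near(Rectangle2D a, Rectangle2D b, double gap) {
        return a.getMinX() - gap <= b.getMaxX() && b.getMinX() - gap <= a.getMaxX()
            && a.getMinY() - gap <= b.getMaxY() && b.getMinY() - gap <= a.getMaxY();
    }
}
```

Each rectangle that comes back would then be a candidate for its own EPS
file. The obvious weakness is that two separate drawings placed close
together would get merged, so the tolerance would need tuning per document.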
Thanks,
Graeme