Hi again,
I am still having problems reading path data with PDFBox. I have a PDFTron example that displays the path data without any problem, but after inspecting its code it seems it never enters an XObject or form XObject. (I can send you the code if you want.)

It simply seems to walk straight into a path object using an "ElementReader", which provides a way of traversing the Element display list of a page. According to its documentation: "The display list representing graphical elements (such as text-runs, paths, images, shadings, forms, etc) is accessed using the intrinsic iterator. ElementReader automatically concatenates page contents spanning multiple streams and provides a mechanism to parse contents of sub-display lists (e.g. forms XObjects and Type3 fonts). "
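
To check my understanding of that traversal, here is a rough sketch in plain Java (no PDFBox or PDFTron). It assumes the content streams are already decompressed strings and that form XObjects are given as a hypothetical name-to-content map. It walks the page content, descends into a form whenever it meets "/Name Do", and records every painted path together with how many forms deep it was found. The tokenizer is deliberately naive (it splits on whitespace and ignores strings, arrays, dictionaries and inline images). In real code I would expect PDFBox's PDFGraphicsStreamEngine, with callbacks such as moveTo, lineTo and strokePath, to do the parsing instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PathWalker {

    /** One painted path: the painting operator, the form-nesting depth (0 = page), and its construction-op count. */
    record PaintedPath(String paintOp, int depth, int constructionOps) {}

    // Path-construction operators (PDF 1.7, section 8.5.2).
    private static final Set<String> CONSTRUCT = Set.of("m", "l", "c", "v", "y", "h", "re");
    // Path-painting operators (section 8.5.3), excluding "n", which ends a path without painting it.
    private static final Set<String> PAINT = Set.of("S", "s", "f", "F", "f*", "B", "B*", "b", "b*");
    // Guard against a (malformed) form that invokes itself.
    private static final int MAX_DEPTH = 32;

    /** Naive walk over a decompressed content stream; forms maps XObject names to their content (hypothetical input). */
    static List<PaintedPath> walk(String content, Map<String, String> forms) {
        List<PaintedPath> out = new ArrayList<>();
        walk(content, forms, 0, out);
        return out;
    }

    private static void walk(String content, Map<String, String> forms, int depth, List<PaintedPath> out) {
        if (depth > MAX_DEPTH) {
            return;
        }
        int ops = 0;
        String lastName = null;
        for (String tok : content.trim().split("\\s+")) {
            if (tok.startsWith("/")) {
                lastName = tok.substring(1);           // operand for a following Do
            } else if (CONSTRUCT.contains(tok)) {
                ops++;                                 // still building the current path
            } else if (tok.equals("n")) {
                ops = 0;                               // e.g. "W n": clip path, nothing painted
            } else if (PAINT.contains(tok)) {
                out.add(new PaintedPath(tok, depth, ops));
                ops = 0;
            } else if (tok.equals("Do") && lastName != null && forms.containsKey(lastName)) {
                // A form XObject: parse its content as a sub-display list.
                // Names not in the map (e.g. image XObjects) are skipped.
                walk(forms.get(lastName), forms, depth + 1, out);
            }
        }
    }
}
```

For example, walking "0 0 m 10 10 l S /Fm1 Do" with Fm1 = "0 0 10 10 re f" reports a stroked path at depth 0 (directly in the page content, no XObject involved) and a filled path at depth 1 (inside the form).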

Is it possible for path objects not to be inside a form XObject? From my brief reading of the PDF spec, it doesn't say explicitly that path data will be found in form XObjects, only that "a form XObject is an entire content stream to be treated as a single graphics object". If my understanding is correct, an XObject is an external object that can be referenced from the content stream so that its content can be reused. So if an image appears only once, there would be no reason to create a reference for it.

If that is the case, how did Adobe know where the vector images were? Do you think they went as far as hit-testing the paths to see whether they were somehow grouped together? At the moment all I get is one EPS file containing all the vector graphics on a page, rather than one EPS file per vector graphic.
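
I have no idea how Acrobat actually decides this, but one heuristic I am considering is to group paths whose bounding boxes overlap or lie within some gap of each other, and write one EPS per group. Below is a minimal sketch of that idea using union-find. It assumes each path's bounding box has already been computed in page space (i.e. after applying the current transformation matrix); the Box type, the group method and the gap parameter are my own hypothetical names, not PDFBox or PDFTron API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PathGrouper {

    /** Axis-aligned bounding box of one path, in page user-space units (assumed precomputed). */
    record Box(double minX, double minY, double maxX, double maxY) {
        /** True if the boxes overlap or are within gap units of each other on both axes. */
        boolean near(Box o, double gap) {
            return minX - gap <= o.maxX && o.minX - gap <= maxX
                && minY - gap <= o.maxY && o.minY - gap <= maxY;
        }
    }

    /** Union-find "find" with path halving. */
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /** Groups path indices whose boxes are transitively near each other; groups keep first-seen order. */
    static List<List<Integer>> group(List<Box> boxes, double gap) {
        int n = boxes.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        // O(n^2) pairwise test; fine for the few hundred paths on a typical page.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (boxes.get(i).near(boxes.get(j), gap)) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(groups.values());
    }
}
```

With boxes (0,0,10,10), (12,0,20,10) and (100,100,110,110) and a gap of 5, the first two end up in one group and the third on its own. One obvious weakness: a page-sized background rectangle would touch everything and merge all the paths into a single group, so such paths would probably need to be filtered out first.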

Thanks,
Graeme



