Jeremias Maerki wrote:
>> But it could be an alternative to modify ExtractImages as follows:
>>
>> - use resources.getXObjects() instead of resources.getImages()
>> - iterate through the XObjects filtering with the subtype "Form"
>> - create PDXObjectForm-objects
>> - save the stream of the XObject to a file
>
> Ok, but what would saving the stream to a file accomplish? It would not
> be a valid PDF file and you'd still have to write some sort of
> interpreter. I'm not sure if ExtractImages should be enhanced at all. If
> functionality could be added to extract Form XObjects, some people will
> want to extract them as bitmaps. Others will want vectors. But in what
> format? Some will want PDF, others EPS or SVG. I guess that will be
> subject to discussion how this should be done. Anyway, the first step as
> I see it would be extending PageDrawer to be able to draw Form XObjects,
> too. That way, people can convert those Form XObject to any output
> format they want.

First of all, there was a misunderstanding on my side. I thought that a
Form XObject supports several vector formats such as SVG, and that the
handling is similar to Image XObjects. But after your post and a few
minutes of reading the PDF spec I realized it's different: Form XObjects
are mini-PDFs embedded within a PDF. In the end we "simply" have to parse
the stream of the Form XObject, and that's it. As you can see in
org.apache.pdfbox.util.operator.pagedrawer.Invoke, that is already part of
PDFBox, so displaying such a document shouldn't be a problem. Saving an
isolated Form XObject as a bitmap or the like isn't possible yet, but it
shouldn't be that difficult.

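As a rough illustration of the filtering step from the quoted proposal (walk the XObjects, keep only those with subtype "Form", collect their streams), here is a minimal Java sketch. It deliberately does not use the PDFBox API: the XObject class and the formStreams method are hypothetical stand-ins invented for this example, and the sample resource names and stream bytes are made up. It only shows the selection logic, not how to interpret or render the content stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormXObjectFilter {

    /** Hypothetical stand-in for an XObject: its /Subtype name and raw stream bytes. */
    static final class XObject {
        final String subtype;
        final byte[] stream;

        XObject(String subtype, byte[] stream) {
            this.subtype = subtype;
            this.stream = stream;
        }
    }

    /** Keeps only the entries whose subtype is "Form", preserving resource order. */
    static Map<String, byte[]> formStreams(Map<String, XObject> xobjects) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, XObject> entry : xobjects.entrySet()) {
            if ("Form".equals(entry.getValue().subtype)) {
                result.put(entry.getKey(), entry.getValue().stream);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up resources: one Image XObject and one Form XObject.
        Map<String, XObject> xobjects = new LinkedHashMap<>();
        xobjects.put("Im0", new XObject("Image", new byte[] {1, 2, 3}));
        xobjects.put("Fm0", new XObject("Form",
                "0 0 10 10 re f".getBytes(StandardCharsets.US_ASCII)));

        // Only the Form entry survives the filter.
        System.out.println(formStreams(xobjects).keySet()); // prints [Fm0]
    }
}
```

The bytes collected this way are still only a content stream, so they would need an interpreter such as PageDrawer before they become a bitmap or any other output format.
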
> But then, we still don't know if Graeme Kidd's PDF actually contains
> images in the form of Form XObjects or not.

Until now the whole discussion has been theoretical, but perhaps someone
could provide us with an example...

Andreas
