Re: Extract vectors

Andreas Lehmkühler Wed, 04 Feb 2009 11:12:06 -0800

Jeremias Maerki wrote:

>> But it could be an alternative to modify ExtractImages as follows:
>>
>> - use resources.getXObjects() instead of resources.getImages()
>> - iterate through the XObjects filtering with the subtype "Form"
>> - create PDXObjectForm-objects
>> - save the stream of the XObject to a file
> 
> Ok, but what would saving the stream to a file accomplish? It would not
> be a valid PDF file and you'd still have to write some sort of
> interpreter. I'm not sure if ExtractImages should be enhanced at all. If
> functionality could be added to extract Form XObjects, some people will
> want to extract them as bitmaps. Others will want vectors. But in what
> format? Some will want PDF, others EPS or SVG. I guess that will be
> subject to discussion how this should be done. Anyway, the first step as
> I see it would be extending PageDrawer to be able to draw Form XObjects,
> too. That way, people can convert those Form XObject to any output
> format they want.
First of all, there was a misunderstanding on my side. I thought that a
Form XObject supports several vector formats such as SVG and that the
handling is similar to Image XObjects. But after your post and a few
minutes of reading the PDF spec, I realized it's different. Form XObjects
are mini-PDFs embedded within a PDF. In the end we "simply" have to parse
the stream of the Form XObject and that's it. As you can see in
org.apache.pdfbox.util.operator.pagedrawer.Invoke, that is already part
of PDFBox, so displaying such a document shouldn't be a problem. Saving
an isolated Form XObject as a bitmap or the like isn't possible yet, but
it shouldn't be that difficult.
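
The filtering step from the quoted proposal could be sketched roughly as
follows. This is only a self-contained illustration, not PDFBox code: the
XObject class and the formNames/saveStreams helpers are hypothetical
stand-ins for whatever resources.getXObjects() returns, and the ".stream"
file suffix is an arbitrary choice. As noted in the quoted reply, the
saved bytes are a raw content stream, not a valid PDF, so something still
has to interpret them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormXObjectFilter {

    /** Hypothetical stand-in for an XObject: its subtype name plus raw stream bytes. */
    public static final class XObject {
        public final String subtype;
        public final byte[] stream;

        public XObject(String subtype, byte[] stream) {
            this.subtype = subtype;
            this.stream = stream;
        }
    }

    /** Returns the resource names of all XObjects with subtype "Form", in map order. */
    public static List<String> formNames(Map<String, XObject> xobjects) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, XObject> entry : xobjects.entrySet()) {
            if ("Form".equals(entry.getValue().subtype)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    /** Writes each Form XObject's raw stream to "<name>.stream" inside dir. */
    public static void saveStreams(Map<String, XObject> xobjects, Path dir) throws IOException {
        for (String name : formNames(xobjects)) {
            Files.write(dir.resolve(name + ".stream"), xobjects.get(name).stream);
        }
    }

    public static void main(String[] args) {
        // Made-up sample resources: one image and one form.
        Map<String, XObject> xobjects = new LinkedHashMap<>();
        xobjects.put("Im0", new XObject("Image", new byte[] {1, 2, 3}));
        xobjects.put("Fm0", new XObject("Form", "0 0 10 10 re f".getBytes()));
        System.out.println(formNames(xobjects)); // prints [Fm0]
    }
}
```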



> But then, we still don't know if Graeme Kidd's PDF actually contains
> images in the form of Form XObjects or not.
Until now the whole discussion has been theoretical, but perhaps someone
could provide us with an example.


Andreas
