Hi,

I'm now thinking about the possibility of adding an "image_list" API to the cpp frontend, similar to its text_list API, which would return a list of structures, each including the rectangle and a pointer to the image data stream.
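
To make the idea a bit more concrete, here is a minimal sketch of what an entry of such a list could look like. Everything in it is an assumption for discussion: rect, image_box, image_list and count_uses are hypothetical names, not existing poppler types, and the image data is modelled as a plain byte vector rather than a real stream object.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch only: none of these names exist in the cpp frontend.
namespace sketch {

// Axis-aligned rectangle on the page (stand-in for a rectf-like type).
struct rect {
    double x, y, width, height;
};

// One entry of the proposed list: where the image is drawn,
// plus a shared handle to the image data.
struct image_box {
    rect bbox;
    std::shared_ptr<const std::vector<std::uint8_t>> data;
};

// A page would yield one image_box per drawn image.
using image_list = std::vector<image_box>;

// Count how many entries refer to the same underlying image data.
inline std::size_t count_uses(const image_list &list,
                              const std::vector<std::uint8_t> *stream) {
    std::size_t n = 0;
    for (const image_box &box : list) {
        if (box.data.get() == stream) {
            ++n;
        }
    }
    return n;
}

} // namespace sketch
```

Sharing the data handle between entries is one possible design choice: an image that is drawn several times would then appear as several rectangles pointing at the same data.
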
The easiest idea would be to incorporate ImageOutputDev into the cpp frontend. However, there is a known issue in ImageOutputDev: images drawn by tiling operations are not counted.

https://bugs.freedesktop.org/show_bug.cgi?id=91734

I should emphasize that this is not a marginal case. When I make a PDF from an HTML page with many small images, via Firefox on GNU/Linux, the resulting PDF often draws the images with tiling operations, although the images are never repeated X-o.

I'm not sure whether the fix in the above bugzilla entry is right or not (it seems that nobody has reviewed the quick-fix patch), but this fix only makes it possible to list the images (with their original metrics) and to extract the image data; the metrics in the drawn result are not available. So it is not a perfect basis for discussing the "image_list" API. There was probably a rationale for the original author to write such a simple patch.

The tiling operations are executed as:

1) create a new output (e.g. splash bitmap, cairo surface, etc.) to draw a single image as a pattern
2) transfer the drawn image to the original output

To calculate the positions & metrics in the resulting image, the chain of temporary outputs must be kept.

The difficulties in handling images drawn by tiling would be:

* it is not easy to count how many times the image is repeated.
* to obtain the position & metrics, the chain of tiling operations must be preserved. We cannot assume that rendering the image for the tile does not invoke yet another tiling operation.

As an alternative, one possibility would be parsing the SVG (or XML, or CairoScript) generated by CairoOutputDev. It seems that the SVG generated by Cairo has a flat structure (no grouped coordinate transforms), so all position & metric information could be retrieved from the neighboring XML elements. However, there are 3 concerns.

--
a) nobody guarantees forward compatibility of the flat structure of the SVG (or CairoScript, XML surface).
b) poppler has no dependency on an XML parsing library, except in the case where fontconfig depends on libexpat.
c) tiling onto an SVG or XML surface can cause some rasterization. When I convert the pattern-tiling example at https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Patterns to PDF with librsvg, the result includes no raster data (pattern.pdf.xz), but if I convert it back from PDF to SVG with pdftocairo (pattern.re.svgz), the result includes raster data X-o. Therefore, this method may count images that do not exist.
--

So, what is the right way? If it is not yet time to put "image_list" into the cpp frontend, would it be acceptable to add similar features to pdfimages or pdftocairo?

Regards,
mpsuzuki
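
P.S. To illustrate why the chain of tiling operations has to be preserved, here is a minimal sketch. It is not poppler code: matrix, tile_level and expand_through_chain are hypothetical names, and it assumes that the repeat counts (nx, ny) of each tiling level are already known, which is exactly the part that is not easy to obtain in practice. It only shows how the rectangle of one image would be carried through nested tiling levels to the final output.

```cpp
#include <algorithm>
#include <array>
#include <utility>
#include <vector>

// Hypothetical sketch: these types are illustrations, not poppler classes.
namespace sketch {

// PDF-style affine matrix [a b c d e f]:
//   x' = a*x + c*y + e,  y' = b*x + d*y + f
struct matrix { double a, b, c, d, e, f; };
struct point { double x, y; };
struct rect { double x0, y0, x1, y1; };

inline point apply_point(const matrix &m, const point &p) {
    return point{m.a * p.x + m.c * p.y + m.e, m.b * p.x + m.d * p.y + m.f};
}

// Bounding box of a rectangle after an affine transform.
inline rect apply_rect(const matrix &m, const rect &r) {
    const std::array<point, 4> corners = {{
        apply_point(m, point{r.x0, r.y0}), apply_point(m, point{r.x1, r.y0}),
        apply_point(m, point{r.x0, r.y1}), apply_point(m, point{r.x1, r.y1})}};
    rect out{corners[0].x, corners[0].y, corners[0].x, corners[0].y};
    for (const point &p : corners) {
        out.x0 = std::min(out.x0, p.x);
        out.y0 = std::min(out.y0, p.y);
        out.x1 = std::max(out.x1, p.x);
        out.y1 = std::max(out.y1, p.y);
    }
    return out;
}

// One tiling level: the cell is repeated nx * ny times with the given steps,
// then mapped into the parent output by to_parent.
struct tile_level {
    matrix to_parent;
    double x_step, y_step;
    int nx, ny; // assumed to be known here
};

// Carry one image rectangle through the chain, innermost level first.
inline std::vector<rect> expand_through_chain(const rect &image,
                                              const std::vector<tile_level> &chain) {
    std::vector<rect> current{image};
    for (const tile_level &lv : chain) {
        std::vector<rect> next;
        for (const rect &r : current) {
            for (int j = 0; j < lv.ny; ++j) {
                for (int i = 0; i < lv.nx; ++i) {
                    const double dx = i * lv.x_step;
                    const double dy = j * lv.y_step;
                    const rect shifted{r.x0 + dx, r.y0 + dy, r.x1 + dx, r.y1 + dy};
                    next.push_back(apply_rect(lv.to_parent, shifted));
                }
            }
        }
        current = std::move(next);
    }
    return current;
}

} // namespace sketch
```

With one level the result has nx * ny rectangles; with nested levels the counts multiply, so dropping any level of the chain loses both the number of repetitions and the final positions.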
