Hi, I'm new here and I'm considering a patch to pdftohtml (well, HtmlOutputDev). I'm coming from a Perl background, so I may not get it right the first time, but I'll do my best! It will be my first patch, so any help is appreciated.
Changes:

1) Include the images in xml mode unless ignore is set (the -i option).
2) Include the top, left, width and height data in img tags, where appropriate for the mode. This is not applicable to complex mode. In html mode, height and width are probably useful; positioning would be great but can be added later if required, e.g. left, right or position relative to text. In xml mode, just output all available data.

Use case: I'm post-processing the xml and I need the image data to be output. It's part of a workflow to produce the epub ebook format from pdf.

I've had a look at the code and it seems fairly straightforward, as the images are already output in other modes. Currently only the image src attribute is passed through, so I guess there needs to be a new HtmlImage class (plus HtmlImages / HtmlImageAccu to handle the iteration). It looks like I can base this on the HtmlFont & HtmlLink modules, so I'll just follow the existing patterns there.

Would you be likely to accept this patch once I get it working? Any suggestions?

cheers,
mike
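
P.S. To make the proposal a bit more concrete, here is a rough sketch of the kind of record and output I have in mind. It is not poppler code: the HtmlImage fields, the helper names (escapeAttr, imageToXml, imageToHtml) and the exact attribute layout are all assumptions for illustration, and it uses std::string rather than poppler's own types so that it stands alone.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one image placed on a page. The field names are
// assumptions for this sketch, not existing poppler members.
struct HtmlImage {
    std::string src;  // file name of the extracted image
    int top;
    int left;
    int width;
    int height;
};

// Escape the characters that must not appear raw inside a
// double-quoted attribute value.
std::string escapeAttr(const std::string &in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default:  out += c; break;
        }
    }
    return out;
}

// xml mode: output all available data.
std::string imageToXml(const HtmlImage &img) {
    std::ostringstream os;
    os << "<img top=\"" << img.top << "\" left=\"" << img.left
       << "\" width=\"" << img.width << "\" height=\"" << img.height
       << "\" src=\"" << escapeAttr(img.src) << "\"/>";
    return os.str();
}

// html mode: only width and height for now; positioning could be added later.
std::string imageToHtml(const HtmlImage &img) {
    std::ostringstream os;
    os << "<img width=\"" << img.width << "\" height=\"" << img.height
       << "\" src=\"" << escapeAttr(img.src) << "\"/>";
    return os.str();
}
```

The idea is that the output device would collect one such record per image and each mode would pick the attributes it needs. The real patch would presumably store whatever coordinates the graphics state provides and follow the existing HtmlFont / HtmlLink conventions instead.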
