On Saturday, 31 March 2012, at 01:23:17, Ihar `Philips` Filipau wrote:
> Hi!
>
> First, an admission. When introducing the original pdftohtml's "mask
> extraction as PNG" patch, it seems I got the colors wrong. Yeah, masks
> have only two of them - and I got them wrong. Both. :)
>
> But just to be sure let me ask: 0.0 is black? and 1.0 is white?
>
> As per recommendation of Leonard Rosenthol, who kindly quoted some
> documentation for me and hinted where to look further, I have tried to
> come up with a method to detect the mask inversion.
>
> Note that it is a mask inversion different from the decode array
> inversion. It is not even a real inversion. Simpler example (as I have
> it in my documents) is that mask itself looks like negative, and the
> background/foreground colors in the document are swapped. Negative
> mask is used to paint white on black background, while the rest of the
> document has white background.
>
> Since in case of pdftohtml, it is impossible to know the background, I
> use simple heuristic: if getFillGray() is greater than 0.5, I assume
> the mask is painted with light color over dark and thus mask's
> inversion flag (as indicated by the decode array) should be inverted.
> (Relies on the presumption that background of most documents is
> light.)
>
> Attached are two different (and conflicting) patches.
>
> - proper-mask-color-001.diff
> One-liner to clear my conscience. Use proper non-inverted colors
> for PNG. My original error stemmed from me reading libpng
> documentation. Indeed, libpng requires bit flip for the special case
> of monochrome images. But. PNGWriter doesn't use monochrome PNGs - it
> uses the grayscale instead, which doesn't require bit/byte flip.

Committed.

>
> - invert-mask-001.diff
> Implement inversion of the mask, if that is required by the decode
> array or background/foreground colors appear to be swapped. The
> heuristic is just 4 lines, probably unreliable but "works for me" -
> and thus I will not object for the 4 lines to be removed.

I don't think it makes sense to do this. A mask is a mask, not an image, and it should be extracted as a mask, IMHO. Or just don't extract it at all; trying to guess things will result in problems.

Albert

>
> Thanks and have a nice weekend!
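
For reference, the heuristic described in the quoted message could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from invert-mask-001.diff: the helper names shouldInvertMask and maskRowToGray are hypothetical, the fill gray level is passed in as a plain double rather than read from poppler's graphics state, and the mask row is assumed to be packed one bit per sample, most significant bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of poppler: decide whether an extracted
// mask should be written inverted.
//   decodeInverted: inversion requested by the mask's decode array.
//   fillGray:       gray level of the current fill colour
//                   (0.0 = black, 1.0 = white).
bool shouldInvertMask(bool decodeInverted, double fillGray) {
    // Assumption from the message: most documents have a light background,
    // so a light fill colour suggests the mask paints light over dark.
    const bool paintedLightOnDark = fillGray > 0.5;
    // Flip the decode-array flag when the colours appear to be swapped.
    return decodeInverted != paintedLightOnDark;
}

// Hypothetical helper: expand a packed 1-bit mask row (MSB first, an
// assumption of this sketch) into 8-bit grayscale samples, 0x00 for a
// 0 bit and 0xFF for a 1 bit, optionally inverted. No bit flip is needed
// beyond the explicit invert flag because the output is grayscale.
std::vector<std::uint8_t> maskRowToGray(const std::vector<std::uint8_t>& bits,
                                        std::size_t width, bool invert) {
    assert(bits.size() * 8 >= width);  // row must hold at least width bits
    std::vector<std::uint8_t> row(width);
    for (std::size_t x = 0; x < width; ++x) {
        bool sample = ((bits[x / 8] >> (7 - x % 8)) & 1) != 0;
        if (invert) {
            sample = !sample;
        }
        row[x] = sample ? 0xFF : 0x00;
    }
    return row;
}
```

As the reply points out, any guess of this kind can be wrong, for example on a page whose background really is dark, so writing the mask out unmodified remains the safer behaviour.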
