Subject: gscan2pdf: binarization algorithms

Package: gscan2pdf
Version: 0.9.29
Severity: wishlist

When scanning text documents, conversion from grayscale (multi-bit pnm) to black-and-white bilevel (1-bit pbm) is frequently desirable, either to conserve space or to facilitate OCR.

The current gscan2pdf supports a simple constant thresholding function,

  f(x,y) = constant (default 80%)

But from practical experience with documents produced by laser printers and photocopiers, often when the printer is low on toner or the electrostatic wick needs replacement, the document will show a gradient (usually horizontal), where e.g. one side of the image will be markedly lighter than the opposite. If we apply a simple constant threshold, sufficient to bring out the lightest text, to the entire image, then the darker text will be oversaturated and will appear heavier and thicker.

In that case we might think that a threshold function, linear in both variables x and y (a simple linear gradient), should be applied:

  f(x,y) = ax + 0y + c   [for horizontal gradient]
  f(x,y) = 0x + by + c   [for vertical gradient]
  f(x,y) = ax + by + c   [for both gradients]

Or maybe a saturation/break-point function would be better in the case where only part of the image (say the leftmost fifth) is very light.

None of these methods would be particularly suitable for automatic scanning, because the user has to select values for multiple parameters. (Currently one only has to select the threshold constant value.) Also, there are other situations in which a linear gradient threshold would not be appropriate, for example a faded fax thermal printer image, where some other means of separating foreground text from background would be needed.

Well, after some googling I ran across "Fred's ImageMagick Scripts" <http://www.fmwconcepts.com/imagemagick/>, which lists various bash scripts, including three for thresholding: otsuthresh, sahoothresh and trianglethresh.

It would be nice, I think, to allow a choice of thresholding method, whether constant, Otsu, etc.

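To make the linear threshold f(x,y) = ax + by + c concrete, here is a minimal sketch, written in Python purely for illustration (gscan2pdf itself is Perl). Everything in it is an assumption rather than existing gscan2pdf behaviour: the function name binarize_linear is hypothetical, x and y are normalised to the range 0..1 across the image, the threshold is taken as a fraction of full white, and pixels darker than the threshold become black (1, as in PBM).

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit gray values: 0 = black, 255 = white


def binarize_linear(image: Gray, a: float, b: float, c: float) -> List[List[int]]:
    """Binarize with a position-dependent threshold f(x, y) = a*x + b*y + c.

    Assumptions of this sketch: x and y run from 0 to 1 across the width and
    height, and the threshold is a fraction of full white, so a = b = 0 with
    c = 0.8 behaves like a constant 80% threshold.
    Returns rows of 1 (black) / 0 (white), following the PBM convention.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for row_idx, row in enumerate(image):
        y = row_idx / (height - 1) if height > 1 else 0.0
        out_row = []
        for col_idx, value in enumerate(row):
            x = col_idx / (width - 1) if width > 1 else 0.0
            threshold = (a * x + b * y + c) * 255
            out_row.append(1 if value < threshold else 0)
        result.append(out_row)
    return result


# One scan line: faded text (220) on the left, dark text (60) on the right
# surrounded by a gray halo (190) that should stay white.
line = [[220, 255, 255, 190, 60, 190]]

# Constant threshold high enough to catch the faded text: the halo turns black.
print(binarize_linear(line, 0.0, 0.0, 0.9))   # [[1, 0, 0, 1, 1, 1]]

# Horizontal gradient, threshold falling from 90% (left) to 60% (right).
print(binarize_linear(line, -0.3, 0.0, 0.9))  # [[1, 0, 0, 0, 1, 0]]
```

With the constant threshold the halo around the dark text is pulled into the foreground, which is the "heavier, thicker" effect described above; with the horizontal gradient both the faded and the dark text survive and the halo stays white. The drawback is also visible: the user has to choose a, b and c instead of a single constant.
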
Rather than port these scripts from bash to perl, it would probably be easier to use the perl system() function to execute these bash scripts. Maybe some kind of plugin folder could be implemented, where the user could copy various scripts. gscan2pdf could then index the plugins folder and automatically add the found plugins to the appropriate menu entry.
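
The plugin folder idea could look roughly like the following sketch. It is in Python only for illustration; in gscan2pdf the equivalent would be Perl code calling system(). The names discover_plugins and run_plugin are hypothetical, and the sketch assumes each plugin is an executable file that accepts the input and output file names as its last two arguments.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, Sequence


def discover_plugins(plugin_dir: str) -> Dict[str, Path]:
    """Map a menu label (here simply the file name) to each executable file
    found directly inside plugin_dir. A missing folder yields no plugins."""
    folder = Path(plugin_dir)
    if not folder.is_dir():
        return {}
    return {
        entry.name: entry
        for entry in sorted(folder.iterdir())
        if entry.is_file() and os.access(entry, os.X_OK)
    }


def run_plugin(script: Path, infile: str, outfile: str,
               extra_args: Sequence[str] = ()) -> None:
    """Run one plugin as `script [extra_args] infile outfile`.

    Assumption of this sketch: the plugin takes the input and output file
    names as its last two arguments. Raises CalledProcessError on failure.
    """
    subprocess.run([str(script), *extra_args, infile, outfile], check=True)
```

The application would call discover_plugins() once at start-up, add one menu entry per returned label, and call run_plugin() on the selected entry with temporary file names. Non-executable files such as notes or READMEs left in the folder are skipped.
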