On Sat, Feb 05, 2005 at 12:05:16PM +0100, Étienne Bersac wrote:
> gocr does not properly encode "ff" in UTF8 (for example in "affect").
> Every occurrence of "ff" is replaced with "\u07ec\uffff". Strange! You
> will find a picture to reproduce the bug at
> http://bersace03.free.fr/pub/gocr-bug-accents.png .

I do not use UTF8 at all, but from what I can test:

  gocr -f UTF8 -i yourfile.png

and

  gocr -i yourfile.png

output the same, showing a hash (#) instead of the double f.

Is that what you see? In that case it is probably not a UTF8-related
problem (the recognition engine should really be decoupled from the way
the data is rendered).

(I cannot see the UTF8 characters you see, and I currently cannot
configure my system for UTF8. Let me understand where the problem is,
and I will pass the information upstream.)

thanks,
cosimo.
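
For comparing the two outputs on a system without UTF8 support, a minimal Python sketch along these lines may help. It is only an illustration: the helper names escape_non_ascii and run_gocr are hypothetical, and only the gocr flags already shown above (-f UTF8 and -i) are used. Non-ASCII characters are printed as backslash escapes, so they can be read on a plain ASCII terminal.

```python
import subprocess


def escape_non_ascii(raw: bytes) -> str:
    """Return the output with every non-ASCII character shown as an escape.

    The bytes are decoded as UTF-8 (invalid sequences become U+FFFD),
    then re-encoded to ASCII with backslash escapes such as \\u07ec.
    """
    text = raw.decode("utf-8", errors="replace")
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


def run_gocr(image: str, utf8: bool) -> bytes:
    """Run gocr on an image, optionally with '-f UTF8' (hypothetical helper)."""
    cmd = ["gocr"]
    if utf8:
        cmd += ["-f", "UTF8"]
    cmd += ["-i", image]
    # Assumes gocr is installed and on PATH; raises if it exits with an error.
    return subprocess.run(cmd, capture_output=True, check=True).stdout
```

Calling escape_non_ascii(run_gocr("yourfile.png", True)) and escape_non_ascii(run_gocr("yourfile.png", False)) and comparing the two strings would show whether the odd characters appear only when UTF8 output is requested.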