Jeffrey Ratcliffe:
> clone 670831 -1
> reassign -1 tesseract-ocr
> retitle -1 tesseract-ocr: produces non-UTF8 output despite declaring otherwise
> tags 670831 patch pending
> thanks
The UTF-8 issue has also already been reported upstream to tesseract:
http://code.google.com/p/tesseract-ocr/issues/detail?id=690

I tested your patch for gscan2pdf. The debug output shows several lines like

utf8 "\x80" does not map to Unicode at /.../lib/Gscan2pdf.pm line 921, <> chunk 1.

but the process still completes and I get the OCR output in the gscan2pdf GUI.

Thank you,
Thomas Koch, http://www.koch.ro
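The byte 0x80 in that warning is a UTF-8 continuation byte (bit pattern 10xxxxxx). It can never begin a character, so a strict UTF-8 decoder must reject it, which fits the bug title: the OCR output is labelled UTF-8 but is not valid UTF-8. The following is a minimal sketch for locating such bytes before the text reaches a consumer that expects UTF-8. It is written in Python rather than the Perl used by Gscan2pdf.pm, and the function name `find_invalid_utf8` is hypothetical. It is not part of gscan2pdf or tesseract.

```python
def find_invalid_utf8(data: bytes) -> list[int]:
    """Return the byte offsets at which strict UTF-8 decoding fails.

    Hypothetical helper for illustration; not part of gscan2pdf or tesseract.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding of the remainder; succeeds only if all of it is valid.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so shift it back to an absolute offset.
            bad = pos + err.start
            offsets.append(bad)
            # Skip past the rejected byte(s) and keep scanning.
            pos = bad + max(1, err.end - err.start)
    return offsets


if __name__ == "__main__":
    sample = b"abc\x80def"  # 0x80 cannot start a UTF-8 sequence
    print(find_invalid_utf8(sample))
    # A lenient decode substitutes U+FFFD for each invalid byte instead of failing.
    print(sample.decode("utf-8", errors="replace"))
```

Reporting offsets rather than silently replacing bytes makes it easier to see where in the OCR output the mis-encoded characters occur.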