Bug#670831: gscan2pdf is not resilient against non utf-8 from tesseract

Thomas Koch Sun, 06 May 2012 23:33:15 -0700

Jeffrey Ratcliffe:
> clone 670831 -1
> reassign -1 tesseract-ocr
> retitle -1 tesseract-ocr: produces non-UTF8 output despite declaring otherwise
> tags 670831 patch pending
> thanks


The UTF-8 issue has also already been reported upstream to tesseract:
http://code.google.com/p/tesseract-ocr/issues/detail?id=690

I tested your patch for gscan2pdf. The debug output shows several lines like

utf8 "\x80" does not map to Unicode at /.../lib/Gscan2pdf.pm line 921, <> chunk 1.

but the process runs through and I get the OCR output in the gscan2pdf GUI.
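
For illustration, the behaviour described above (invalid bytes are reported
but decoding continues) can be sketched roughly as follows. This is not the
gscan2pdf patch itself, which is Perl; it is a minimal Python sketch of the
general technique, and the function name decode_ocr_output is hypothetical.

```python
import logging
from typing import List, Tuple

log = logging.getLogger("ocr")


def decode_ocr_output(raw: bytes) -> Tuple[str, List[int]]:
    """Decode OCR output as UTF-8 without aborting on invalid bytes.

    Hypothetical helper: each invalid byte sequence is replaced by
    U+FFFD, and its byte offset is logged and collected.
    """
    parts: List[str] = []
    bad_offsets: List[int] = []
    pos = 0
    while pos < len(raw):
        try:
            # Try to decode everything that is left in one go.
            parts.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice raw[pos:].
            parts.append(raw[pos:pos + err.start].decode("utf-8"))
            offset = pos + err.start
            log.warning("invalid UTF-8 at byte offset %d", offset)
            bad_offsets.append(offset)
            parts.append("\ufffd")
            # Skip the invalid sequence and keep going.
            pos += err.end
    return "".join(parts), bad_offsets
```

Called on the bytes b"Hello \x80 world", this sketch returns the text with
the stray byte replaced by U+FFFD and reports offset 6, instead of failing.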

Thank you,

Thomas Koch, http://www.koch.ro



