Re: OCR image contains cyrillic characters

Rick Leir Fri, 10 Feb 2017 02:29:13 -0800

My guess is that you are using Tika and Tesseract. The latter is complex, and you can start learning at


https://wiki.apache.org/tika/TikaOCR   <--shows you how to work with TIFF


The traineddata for Cyrillic is here:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files

https://github.com/tesseract-ocr/tesseract/issues/147
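
Once the traineddata file is installed, the language has to be selected explicitly, otherwise Tesseract falls back to English. Here is a minimal Python sketch of calling the Tesseract command-line tool directly. It assumes the tesseract executable is on PATH and that rus.traineddata (Russian; other Cyrillic-script languages have their own codes) is in its tessdata directory. The function names and the file name scan.tif are hypothetical and not part of Tika or Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages=("rus",), executable="tesseract"):
    """Return the argv list for running the Tesseract CLI on one image."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Using "stdout" as the output base makes Tesseract print the recognized
    # text instead of writing a .txt file. Several languages are joined
    # with "+", for example "rus+eng".
    return [executable, str(image_path), "stdout", "-l", "+".join(languages)]


def ocr_image(image_path, languages=("rus",)):
    """Run Tesseract on image_path and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        check=True,
    )
    # Tesseract writes its text output as UTF-8.
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical input file; mixed Russian and English text.
    print(ocr_image("scan.tif", languages=("rus", "eng")))
```

If the output still comes back as Latin look-alike letters, the first thing to check is that the language option really reaches Tesseract.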

You likely need to enhance the images before running Tesseract.
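
Which enhancement helps depends on the images. As one possible example (my assumption, not the only option), here is a sketch of binarization with Otsu's threshold in plain Python. It works on a flat list of 8-bit grayscale values; loading the image and converting it to grayscale is left to whatever image library you use, and the function names are hypothetical.

```python
def otsu_threshold(gray_pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    histogram = [0] * 256
    for value in gray_pixels:
        histogram[value] += 1

    total = len(gray_pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # all pixels are at or below t
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


def binarize(gray_pixels, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if value > threshold else 0 for value in gray_pixels]


if __name__ == "__main__":
    # Toy data: dark text pixels (10) on a light background (200).
    pixels = [10] * 50 + [200] * 50
    t = otsu_threshold(pixels)
    print(t, binarize([10, 200, 10], t))
```

The binarized pixels would then be written back to an image file and passed to Tesseract.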

cheers -- Rick

On 2017-02-10 05:03 AM, Игорь Абрашин wrote:

Hello, community!
Did you manage to recognize jpf, tiff or whatever with Cyrillic text inside?
So far I've got only Latin letters (it looks like ugly transliterated text) in the result.
For images containing only Latin letters it works fine.
Does anyone have any suggestions, best practices or case studies relating to
this situation?
