My guess is that you are using using Tika and Tesseract. The latter is complex, and you can start learning at

https://wiki.apache.org/tika/TikaOCR   <--shows you how to work with TIFF

The traineddata for Cyrillic is here:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files

https://github.com/tesseract-ocr/tesseract/issues/147

You likely need to enhance the images before running Tesseract.

cheers -- Rick

On 2017-02-10 05:03 AM, Игорь Абрашин wrote:
Hello, community!
Did you manage to recognize jpf,tiff or whatever with cyrillics text inside?
Ive got only latin letter (looks like ugly translite text) in result for
that moment.For image contains only lattin letters it works fine.
Does anyone have any suggestion, best practice or case studies refer to
this situation?


Reply via email to