Re: OCR image contains cyrillic characters

Rick Leir Sat, 11 Feb 2017 06:45:00 -0800

Yes, you are right. I was just trying to help and did not have time to dig out
the details. So the question is: how do you tell Solr to pass the language
argument through to Tika and Tesseract?
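
For what it's worth, one way to confirm that the language setting is the
culprit is to run Tesseract directly with its `-l` option, outside Solr and
Tika. Here is a minimal Python sketch. It assumes the `tesseract` binary is on
the PATH, the `rus` traineddata is installed, and your Tesseract build accepts
`stdout` as the output base. The function names and the `scan.tif` file name
are made up for illustration. It does not answer how to pass the language
through Solr.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages=("rus", "eng"), output_base="stdout"):
    """Build the argument list for a Tesseract run with explicit languages.

    Tesseract takes several languages joined with '+', e.g. 'rus+eng'.
    """
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


def ocr_image(image_path, languages=("rus", "eng")):
    """Run Tesseract on one image and return the recognized text.

    Assumption: the tesseract binary is installed and on the PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    cmd = build_tesseract_command(image_path, languages)
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # 'scan.tif' is a hypothetical file name; only print the command here.
    print(" ".join(build_tesseract_command("scan.tif")))
```

If this returns Cyrillic text for the same image that comes out of Solr as
Latin letters, then the traineddata is fine and the language is simply not
reaching Tesseract when Tika calls it.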


On February 11, 2017 12:54:02 AM EST, "Игорь Абрашин" <vjiaste...@gmail.com> 
wrote:
>Hi, Rick.
>I didn't mean that Tesseract needs training; it works well on its own.
>The problem is that the Tika bundled with Solr doesn't use the Russian
>dictionary to recognize Cyrillic text, so the result uses only the English
>alphabet.
>
>On 10 Feb 2017 at 15:28, "Rick Leir" <rl...@leirtech.com> wrote:
>
>> My guess is that you are using Tika and Tesseract. The latter is
>> complex, and you can start learning at
>>
>> https://wiki.apache.org/tika/TikaOCR   <-- shows you how to work with TIFF
>>
>> The traineddata for Cyrillic is here:
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
>>
>> https://github.com/tesseract-ocr/tesseract/issues/147
>>
>> You likely need to enhance the images before running Tesseract.
>>
>> cheers -- Rick
>>
>> On 2017-02-10 05:03 AM, Игорь Абрашин wrote:
>>
>>> Hello, community!
>>> Has anyone managed to recognize JPG, TIFF, or other images with Cyrillic
>>> text inside?
>>> So far I get only Latin letters in the result (it looks like ugly
>>> transliterated text). For images that contain only Latin letters it
>>> works fine.
>>> Does anyone have suggestions, best practices, or case studies for this
>>> situation?
>>>
>>>
>>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
