Yes, you are right. I was just trying to help, and did not have time to dig out the details. So the question is: how do you tell Solr to pass the language arg to Tika and Tesseract?
On February 11, 2017 12:54:02 AM EST, "Игорь Абрашин" <vjiaste...@gmail.com> wrote:
>Hi, Rick.
>I didn't mean that it needs training, because Tesseract works well
>on its own. The problem is that the Tika bundled with Solr doesn't try
>to use the Russian dictionary to recognize Cyrillic text, so the result
>comes out using only the English alphabet.
>
>On 10 Feb 2017 at 15:28, "Rick Leir" <rl...@leirtech.com> wrote:
>
>> My guess is that you are using Tika and Tesseract. The latter is
>> complex, and you can start learning at
>>
>> https://wiki.apache.org/tika/TikaOCR <-- shows you how to work with TIFF
>>
>> The traineddata for Cyrillic is here:
>>
>> https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
>>
>> https://github.com/tesseract-ocr/tesseract/issues/147
>>
>> You likely need to enhance the images before running Tesseract.
>>
>> cheers -- Rick
>>
>> On 2017-02-10 05:03 AM, Игорь Абрашин wrote:
>>
>>> Hello, community!
>>> Have you managed to recognize jpf, tiff or other images with Cyrillic
>>> text inside? So far I've got only Latin letters (ugly transliterated
>>> text) in the result. For images containing only Latin letters it
>>> works fine.
>>> Does anyone have any suggestions, best practices or case studies
>>> related to this situation?
>>>
>>>
>>
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.