You do not tell us much about how Solr is set up. I also found your Stack Overflow
question, with a screenshot, at
http://stackoverflow.com/questions/35220443/tesseract-command-line-ocr-engine-has-stopped-working

That suggests that you have set up Tika with OCR for images. When an email
contains images, Tika then tries to extract text from them by running
tesseract.exe. See https://tika.apache.org/1.11/formats.html#Image_formats for
details on this feature in Tika.
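
To see which emails are likely to trigger OCR, you could list the image parts in
each EML file. Below is a minimal sketch that uses only Python's standard
`email` package. It is not something Tika or Solr provides, and the helper name
`image_parts` is hypothetical. Whether Tika actually hands a given part to
Tesseract depends on your Tika configuration, so treat the result as an
approximation.

```python
from email import policy
from email.parser import BytesParser
from typing import List, Tuple


def image_parts(raw_eml: bytes) -> List[Tuple[str, str]]:
    """Return (content_type, filename) for every image/* MIME part in an EML message.

    Hypothetical helper: it approximates which parts an OCR-enabled Tika
    setup might pass to Tesseract; it does not call Tika itself.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_eml)
    found = []
    # walk() visits the top-level message and every nested MIME part
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # Inline images often have no filename, so fall back to ""
            found.append((part.get_content_type(), part.get_filename() or ""))
    return found
```

For example, calling `image_parts(Path("mail.eml").read_bytes())` for each file
in your EML folder would show which messages contain images. Those are the
messages worth trying on their own to reproduce the Tesseract crash.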

You may want to reach out to the Tika community for advice on how to proceed.
You may also try different versions of Tesseract
(https://github.com/tesseract-ocr/tesseract/wiki/Downloads), and perhaps a newer
version of Tika.
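
When comparing Tesseract versions, it helps to confirm which tesseract
executable is actually found and what version it reports. The sketch below
assumes the binary is named `tesseract` (on Windows, `shutil.which` also
resolves `tesseract.exe`) and that it accepts the `--version` flag. The helper
names are hypothetical. If Tika is configured with an explicit Tesseract path,
pass that path instead.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract a version like '3.04.00' from `tesseract --version` output (hypothetical helper)."""
    # The first line typically reads "tesseract 3.04.00" or "tesseract v4.0.0-beta.1"
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def installed_tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the version of the given Tesseract executable, or None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some releases print the version on stderr rather than stdout, so check both
    return parse_tesseract_version(result.stdout + "\n" + result.stderr)
```

Running `installed_tesseract_version()` before and after swapping Tesseract
builds confirms that Tika will pick up the version you intended to test.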

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 8 Feb 2016, at 16:22, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> 
> Has anyone experienced this before while indexing EML files?
> 
> Regards,
> Edwin
> 
> On 5 February 2016 at 17:30, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> I am indexing EML files (emails) into Solr, and some of those emails have
>> attachments.
>> 
>> During the indexing, I encountered this "*Tesseract command-line OCR
>> engine has stopped working*" message on the server.
>> However, I did not see any error in the indexing, and all the EML files
>> were indexed successfully.
>> 
>> Does anyone know what could be the reason? I am using Solr 5.4.0.
>> 
>> Regards,
>> Edwin
>> 
