Hi Edwin
The PDF file format can store text as an image, in which case you need OCR to 
extract it. More commonly, though, the text is stored as actual text rather 
than as an image, and then you should not use OCR to get it.
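
As a rough illustration of that distinction, here is a minimal Python sketch 
that picks out which pages would need OCR. It assumes you already have the 
text that a regular (non-OCR) PDF parser extracted for each page. The function 
name pages_needing_ocr and the 20-character threshold are hypothetical and 
chosen only for this example; they are not part of Solr, Tika, or Tesseract.

```python
def pages_needing_ocr(page_texts, min_chars_per_page=20):
    """Return the indices of pages whose text layer looks empty.

    page_texts: list of strings, one per page, as produced by a regular
    (non-OCR) text extraction pass. The threshold is an arbitrary
    assumption for this sketch and should be tuned on real documents.
    """
    pages_to_ocr = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters: a scanned page usually
        # yields an empty string or a few stray whitespace characters.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars_per_page:
            pages_to_ocr.append(index)
    return pages_to_ocr


if __name__ == "__main__":
    sample = [
        "This page has a real text layer with plenty of characters.",
        "",       # scanned page, nothing extracted
        "  \n ",  # scanned page, only whitespace extracted
    ]
    print(pages_needing_ocr(sample))
```

Pages that are not returned already have usable text, so running OCR on them 
would only add time and possible recognition errors.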

Do you get an error message when you have a failure?
Cheers -- Rick

On March 18, 2017 12:01:17 PM EDT, Zheng Lin Edwin Yeo <edwinye...@gmail.com> 
wrote:
>Hi,
>
>Occasionally, Tesseract OCR is not able to extract the words from a PDF
>file attached to an EML file so that they can be indexed into Solr.
>However, most of the time the text is extracted successfully.
>
>What could cause OCR to fail to extract the text from the file in the
>email attachment?
>
>I'm using Solr 6.4.2.
>
>Regards,
>Edwin

