Re: Using Tesseract OCR to extract PDF files in EML file attachment

Rick Leir Mon, 03 Apr 2017 23:26:01 -0700

Tesseract probably knows nothing about the EML format. Your scripts could pull
the EML files apart and hand the PDF attachments to the OCR step that already
works for you.
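
As a rough illustration of "pulling an EML apart", here is a minimal sketch
that uses only Python's standard `email` package. The function name
`extract_pdf_attachments` and the fallback filename are made up for this
example, and it assumes the attachments are ordinary MIME parts that are either
typed `application/pdf` or named `*.pdf`. It stops at returning the PDF bytes;
OCR and indexing are left to whatever pipeline you already have.

```python
from email import policy
from email.parser import BytesParser


def extract_pdf_attachments(eml_bytes):
    """Return (filename, payload_bytes) pairs for each PDF part in an EML message.

    Hypothetical helper for illustration; not part of Solr, Tika, or Tesseract.
    """
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    pdfs = []
    for part in msg.walk():
        # Container parts (multipart/*, message/rfc822) carry no payload of their own.
        if part.is_multipart():
            continue
        filename = part.get_filename() or ""
        is_pdf = (
            part.get_content_type() == "application/pdf"
            or filename.lower().endswith(".pdf")
        )
        if is_pdf:
            # decode=True undoes the base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True)
            # "attachment.pdf" is an assumed fallback name for unnamed parts.
            pdfs.append((filename or "attachment.pdf", payload))
    return pdfs
```

Each returned payload could then be written to a file and sent through the
same path you use today for standalone scanned PDFs.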


On April 4, 2017 2:00:19 AM EDT, Zheng Lin Edwin Yeo <edwinye...@gmail.com> 
wrote:
>Hi,
>
>Currently, I am able to extract scanned PDF images and index them to Solr
>using Tesseract OCR, although the speed is very slow.
>
>However, for EML files with PDF attachments that consist of scanned images,
>the Tesseract OCR is not able to extract the text from those PDF attachments.
>
>Can we use the same method for EML files? Or what are the suggestions that
>we can do to extract those attachments?
>
>I'm using Solr 6.5.0
>
>Regards,
>Edwin

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
