Tesseract probably knows nothing of the EML format. Your scripts could pull the EML files apart first and hand the extracted PDF attachments to Tesseract.
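For example, here is a minimal sketch in Python using only the standard library `email` module. It assumes the EML files are on local disk and that your existing OCR step can take a PDF by file path. The function name `extract_pdf_attachments` and the output file naming are made up for illustration.

```python
import email
from email import policy
from pathlib import Path


def extract_pdf_attachments(eml_path, out_dir):
    """Save every PDF attachment found in an .eml file to out_dir.

    Returns the list of paths written. (Hypothetical helper, not part of Solr or Tesseract.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Parse the raw message; policy.default gives decoded filenames and headers.
    with open(eml_path, "rb") as fh:
        msg = email.message_from_binary_file(fh, policy=policy.default)

    saved = []
    # walk() visits every part, including parts inside nested multipart containers.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        filename = part.get_filename()
        # Some mailers label PDFs as application/octet-stream, so also check the extension.
        is_pdf = (part.get_content_type() == "application/pdf"
                  or (filename or "").lower().endswith(".pdf"))
        if not is_pdf:
            continue
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if not data:
            continue
        # Keep only the base name so a filename like "../x.pdf" cannot escape out_dir.
        base = Path(filename).name if filename else "attachment.pdf"
        target = out / f"{Path(eml_path).stem}-{len(saved)}-{base}"
        target.write_bytes(data)
        saved.append(target)
    return saved
```

Each saved PDF could then go through the same Tesseract route you already use for the scanned PDFs.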
On April 4, 2017 2:00:19 AM EDT, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>Hi,
>
>Currently, I am able to extract scanned PDF images and index them to Solr
>using Tesseract OCR, although the speed is very slow.
>
>However, for EML files with PDF attachments that consist of scanned images,
>the Tesseract OCR is not able to extract the text from those PDF
>attachments.
>
>Can we use the same method for EML files? Or what are the suggestions that
>we can do to extract those attachments?
>
>I'm using Solr 6.5.0
>
>Regards,
>Edwin

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.