My colleagues Eric Pugh and Dan Worley covered OCR and Solr in a presentation at our recent London Lucene/Solr Meetup:
https://www.meetup.com/Apache-Lucene-Solr-London-User-Group/events/264579498/
(direct link to the slides, in case you can't find it in the comments: https://www.slideshare.net/o19s/payloads-and-ocr-with-solr)

HTH

Charlie


On 14/10/2019 11:40, Retro wrote:
Hello, thanks for the answer, but let me explain the setup. We are running our
own backup solution for emails (messages from Exchange in MSG format).
The content of these messages is then indexed in Solr. But Solr cannot process
attachments within those MSG files and cannot OCR them. This is what I need:
to OCR the attachments and get their content indexed in Solr.

Davis, Daniel (NIH/NLM) [C] wrote
Nuance and ABBYY provide OCR capabilities as well.
Looking at higher-level solutions, both indexengines.com and Commvault can
do email remediation for legal issues.

AJ Weber wrote
There are alternative, paid libraries to parse and extract attachments
from EML files as well.
EML attachments will have a mimetype associated with their metadata.
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html






--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
