The workflow is:
- OCR new documents
- check quality and tune until you get good output text
- keep the output text in the file system
- index and re-index to Solr as necessary from the file system
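
For illustration, the last step (indexing from the file system) might look roughly like the sketch below. It is not my actual script, just a minimal example under some assumptions: the OCR output is stored as UTF-8 .txt files, the Solr schema has a "text" field, and the function names are made up for this example. It posts JSON to Solr's update handler.

```python
import json
import urllib.request
from pathlib import Path


def build_docs(text_root):
    """Turn each OCR output file under text_root into one Solr document."""
    root = Path(text_root)
    docs = []
    for path in sorted(root.rglob("*.txt")):
        docs.append({
            # The relative path is a stable id, so re-indexing overwrites
            # the earlier copy of the same document instead of duplicating it.
            "id": path.relative_to(root).as_posix(),
            # "text" is an assumed field name; it must exist in your schema.
            "text": path.read_text(encoding="utf-8", errors="replace"),
        })
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as JSON to a Solr update handler and commit."""
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

You would call it with something like post_to_solr(build_docs("/data/ocr-text"), "http://localhost:8983/solr/mycore/update"), where the directory and core name are placeholders. Because the text stays on disk, you can re-run this after any schema change without repeating the OCR.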

Note that OCR is a separate task from Solr indexing, and is best done on
separate machines. I used all the old 'surplus' servers for OCR.
Cheers -- Rick