Re: regarding Extracting text from Images

2019-10-27 Thread Jörn Franke
One additional consideration: if you need to upgrade Solr, you will eventually need to reindex. If you change or add fields, you also need to reindex. Both are much faster if you have an external program that converts rich documents (PDF, Word, OCR) to text once and you use the text
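
A minimal sketch of the "convert once, reuse the text" idea, assuming the extracted text is cached on disk under a hash of the source file's bytes. The names `cached_text`, `cache_dir`, and `extract` are hypothetical; `extract` stands in for whatever converter or OCR tool you actually run, and nothing here is specific to Solr.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_text(source: Path, cache_dir: Path, extract: Callable[[Path], str]) -> str:
    """Return the extracted text for `source`, calling `extract` only on a cache miss.

    `extract` is a placeholder for the real (slow) conversion or OCR step.
    """
    # Key the cache on file content, so an unchanged file is never converted twice.
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    cached = cache_dir / f"{digest}.txt"
    if cached.exists():
        return cached.read_text(encoding="utf-8")

    text = extract(source)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(text, encoding="utf-8")
    return text
```

With something like this in place, a reindex after an upgrade or a schema change only reads the cached `.txt` files and does not repeat the conversion.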

Re: regarding Extracting text from Images

2019-10-27 Thread Erick Erickson
I would do neither. I’d put it all on an external server and use _that_, then send the finished docs to Solr. The problem with putting this all on Solr is at least three-fold: 1> you’re talking heavy-duty work here to do the OCR, which takes away from the available resources for searching and in
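
As a rough illustration of "send the finished docs to Solr", the sketch below builds a JSON request for Solr's `/update` handler from text that was already extracted elsewhere. The base URL, the collection name `docs`, and the field names `id` and `text_t` are assumptions made for the example; adjust them to your own setup and schema.

```python
import json
import urllib.request


def build_update_request(solr_url: str, collection: str, docs: list) -> urllib.request.Request:
    """Build a POST request that sends already-extracted documents to Solr as JSON.

    `docs` is a list of dicts, one per document; field names must match your schema.
    """
    url = f"{solr_url.rstrip('/')}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (hypothetical host, collection, and fields):
# req = build_update_request(
#     "http://localhost:8983/solr", "docs",
#     [{"id": "scan-001", "text_t": "text produced by the external OCR step"}],
# )
# urllib.request.urlopen(req)  # actually sends the documents
```

Solr only receives plain text here, so the OCR work stays entirely on the external machine.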