Re: Slow Indexing scaling issue

2019-08-19, Furkan KAMACI

Hi Parmeshwor, 2 hours for 3 GB of data seems too slow. We scale up to petabytes this way:

1) Ignore all commits from the client via IgnoreCommitOptimizeUpdateProcessorFactory.
2) Do the heavy processing on an external Tika server instead of using Solr Cell's embedded Tika.
3) Adjust autocommit, sof…
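To illustrate why point 1 helps, here is a toy model in Java. It is not Solr code and not how IgnoreCommitOptimizeUpdateProcessorFactory is implemented (in Solr the factory is configured in an update processor chain in solrconfig.xml). It only counts hard commits when a client commits after every batch versus when client commits are dropped and a server-side document-count threshold decides. The document count, batch size, and threshold are illustrative assumptions, and real autoCommit settings usually include a time-based trigger that the model ignores.

```java
public class CommitModel {

    /**
     * Counts hard commits for sending totalDocs documents in batches of batchSize.
     * ignoreClientCommits = true models dropping the client's per-batch commit;
     * autoCommitMaxDocs models a server-side document-count threshold (an assumption).
     */
    public static int countCommits(int totalDocs, int batchSize,
                                   boolean ignoreClientCommits, int autoCommitMaxDocs) {
        int commits = 0;
        int pending = 0; // documents added since the last commit
        for (int sent = 0; sent < totalDocs; sent += batchSize) {
            pending += Math.min(batchSize, totalDocs - sent);
            // Server side: commit once enough documents have accumulated.
            if (pending >= autoCommitMaxDocs) {
                commits++;
                pending = 0;
            }
            // Client side: commit after every batch that still has pending documents.
            if (!ignoreClientCommits && pending > 0) {
                commits++;
                pending = 0;
            }
        }
        // Documents still pending at the end would be picked up by a time-based
        // autoCommit in real Solr; this toy model does not count that.
        return commits;
    }

    public static void main(String[] args) {
        System.out.println("client commits every batch: " + countCommits(100_000, 1_000, false, 25_000));
        System.out.println("client commits ignored:     " + countCommits(100_000, 1_000, true, 25_000));
    }
}
```

With 100,000 documents in batches of 1,000 and a 25,000-document threshold, the first run performs 100 commits and the second performs 4.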

Re: Slow Indexing scaling issue

2019-08-13, Erick Erickson

Here’s some sample SolrJ code using Tika outside of Solr’s ExtractingRequestHandler, along with some information on why giving Solr the job of extracting text is not optimal speed-wise: https://lucidworks.com/post/indexing-with-solrj/

> On Aug 13, 2019, at 12:15 PM, Jan Høydahl wrote:
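A minimal, dependency-free sketch of the idea behind Erick's link: run text extraction in the client process and send Solr only the extracted text. `TextExtractor` is a hypothetical interface standing in for a real parser (with Apache Tika you might plug in something like `file -> new Tika().parseToString(file.toFile())`), and the field names `id` and `content` are assumptions about your schema. A file that fails to parse is skipped on the client rather than tying up a Solr node.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClientSideExtraction {

    /** Hypothetical stand-in for a real parser such as Apache Tika. */
    @FunctionalInterface
    interface TextExtractor {
        String extract(Path file) throws Exception;
    }

    /** Turns one file into a field map; with SolrJ this would become a SolrInputDocument. */
    static Map<String, Object> toDocument(Path file, TextExtractor extractor) throws Exception {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", file.toAbsolutePath().toString()); // assumed unique key field
        doc.put("content", extractor.extract(file));     // assumed text field
        return doc;
    }

    /** Extracts every file on the client; failures are logged and skipped, so Solr never sees them. */
    static List<Map<String, Object>> extractAll(List<Path> files, TextExtractor extractor) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Path file : files) {
            try {
                docs.add(toDocument(file, extractor));
            } catch (Exception e) {
                System.err.println("Skipping " + file + ": " + e);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Plain-text "extractor" for demonstration (Java 11+); swap in a real parser for PDFs etc.
        TextExtractor plainText = Files::readString;
        List<Path> files = new ArrayList<>();
        for (String arg : args) files.add(Paths.get(arg));
        for (Map<String, Object> doc : extractAll(files, plainText)) {
            System.out.println(doc.get("id") + " -> " + doc.get("content").toString().length() + " chars");
        }
    }
}
```

The resulting documents can then be sent in batches from the client, so Solr spends its CPU on indexing rather than parsing.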

Re: Slow Indexing scaling issue

2019-08-13, Jan Høydahl

You may want to review https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-SlowIndexing for some hints. Make sure to index with multiple parallel threads. Also remember that using /extract on the Solr side is resource-intensive and may make your clus…
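The advice to index with multiple parallel threads can be sketched with a plain `ExecutorService`: split the documents into batches and send each batch from a fixed pool of worker threads. `sendBatch` is a hypothetical callback; with SolrJ it would typically wrap `SolrClient.add(Collection<SolrInputDocument>)`, without a commit per batch. The batch size and thread count are illustrative and need tuning against your own cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelIndexer {

    /** Sends docs in batches of batchSize from `threads` workers; returns the number of batches. */
    static <T> int indexInParallel(List<T> docs, int batchSize, int threads, Consumer<List<T>> sendBatch)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < docs.size(); i += batchSize) {
                // Read-only sublist view; docs must not be modified while batches are in flight.
                List<T> batch = docs.subList(i, Math.min(i + batchSize, docs.size()));
                futures.add(pool.submit(() -> sendBatch.accept(batch)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each batch and rethrow any failure
            }
            return futures.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) docs.add(i);
        AtomicInteger sent = new AtomicInteger();
        int batches = indexInParallel(docs, 500, 4, batch -> {
            // Placeholder: a real client would send the batch to Solr here.
            sent.addAndGet(batch.size());
        });
        System.out.println(batches + " batches, " + sent.get() + " docs sent");
    }
}
```

Running `main` prints `20 batches, 10000 docs sent`. SolrJ also ships `ConcurrentUpdateSolrClient`, which queues updates and sends them from background threads; whichever route you choose, keep per-batch commits out of the loop.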