On 6/12/2013 8:50 AM, adfel70 wrote:
> We have a multi-sharded and multi-replicated collection (solr 4.3).
> 
> When we perform massive indexing (adding 5 million records with 5k bulks,
> commit after each bulk), the search performance is degrades a lot (1 sec
> query can turn to 4 sec query).
> 
> Any rule of thumb regarding best configuration for this kind of a scenario?

If it's important that your documents be visible each time you add 5000
of them, then I would switch to soft commits.  If you don't need them to
be visible until the end, then I would not send explicit commits at all
until the very end.  A middle ground: do a soft commit only after every
N batches.  With N=20, that's a soft commit every 100k docs.
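If you would rather not manage that counting in your indexing client,
Solr can do time-based soft commits for you.  A minimal sketch of the
relevant solrconfig.xml fragment (the 60-second interval is just an
example, not a recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Open a new searcher at most once a minute (example value);
       soft commits make new documents visible without flushing
       segments to disk. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place you can stop sending commits from the client
entirely and let Solr handle visibility on its own schedule.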

Regardless of which choice you make in the previous paragraph, doing
periodic hard commits is very important when you have the updateLog
turned on, which is required for SolrCloud.  For that reason, I would
add autoCommit into your config with openSearcher set to false.  This
will flush the data to disk, but will not open a new searcher object, so
changes from that commit will not be visible to queries.  A hard commit
with openSearcher=false happens pretty fast.
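A sketch of what that autoCommit section might look like in
solrconfig.xml (the maxTime/maxDocs values here are placeholders you
would tune for your own setup, per the next paragraph):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Flush indexed data to disk and roll the transaction log,
       but do not open a new searcher, so queries are unaffected. -->
  <autoCommit>
    <maxTime>300000</maxTime>   <!-- example: every 5 minutes -->
    <maxDocs>50000</maxDocs>    <!-- example: or every 50k docs -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```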

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

Exactly what to use for maxDocs and maxTime will depend on your setup.
You want values large enough that commits aren't happening constantly,
but small enough that your transaction logs don't get huge.

The rest of the wiki page that I linked has general information about
Solr performance that might be useful to you.

Thanks,
Shawn
