On 6/12/2013 8:50 AM, adfel70 wrote:
> We have a multi-sharded and multi-replicated collection (Solr 4.3).
>
> When we perform massive indexing (adding 5 million records in 5k bulks,
> with a commit after each bulk), search performance degrades a lot (a 1 sec
> query can turn into a 4 sec query).
>
> Any rule of thumb regarding the best configuration for this kind of scenario?
If it's important that your documents be visible each time you add 5000 of them, then I would switch to soft commits. If you don't need them to be visible until the end, then I would not send explicit commits at all until the very end. A middle ground: only do a soft commit after every N batches. If N=20, that's every 100k docs.

Regardless of which choice you make in the previous paragraph, doing periodic hard commits is very important when you have the updateLog turned on, which is required for SolrCloud. For that reason, I would add autoCommit to your config with openSearcher set to false. This will flush the data to disk, but will not open a new searcher object, so changes from that commit will not be visible to queries. A hard commit with openSearcher=false happens pretty fast.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

Exactly what to use for maxDocs and maxTime will depend on your setup. You want to pick values large enough that commits aren't happening constantly, but small enough that your transaction logs don't get huge. The rest of the wiki page that I linked has general information about Solr performance that might be useful to you.

Thanks,
Shawn
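As a rough sketch, the autoCommit section described above would go in the updateHandler block of solrconfig.xml. The maxDocs and maxTime values here (25000 docs / 5 minutes) are illustrative placeholders to tune for your setup, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- updateLog (transaction log) must be on for SolrCloud -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Periodic hard commit: flushes to disk and truncates the tlog,
       but openSearcher=false means queries won't see the changes yet. -->
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```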
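The "soft commit every N batches" middle ground can be sketched like this in Python. StubSolrClient and its add()/soft_commit() methods are hypothetical stand-ins for a real client (e.g. pysolr or SolrJ), here only counting calls so the batching logic can be shown offline:

```python
class StubSolrClient:
    """Hypothetical stand-in for a real Solr client; it just counts
    calls so the commit-batching logic can be demonstrated offline."""

    def __init__(self):
        self.added = 0
        self.soft_commits = 0

    def add(self, docs):
        self.added += len(docs)

    def soft_commit(self):
        self.soft_commits += 1


def index_in_batches(client, docs, batch_size=5000, soft_commit_every=20):
    """Send docs in batches of batch_size; issue a soft commit only every
    N batches, plus one at the end so the tail of the data becomes visible."""
    batches_sent = 0
    for start in range(0, len(docs), batch_size):
        client.add(docs[start:start + batch_size])
        batches_sent += 1
        if batches_sent % soft_commit_every == 0:
            client.soft_commit()
    # final soft commit for any batches since the last periodic one
    if batches_sent % soft_commit_every != 0:
        client.soft_commit()
```

With batch_size=5000 and soft_commit_every=20, the 5 million docs in the original question would trigger only 50 soft commits instead of 1000 explicit commits.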