My first question is always “what’s the bottleneck?” Unless you’re driving your CPUs and/or I/O hard on Solr, the bottleneck is in the acquisition of the docs, not on the Solr side.
Also, be sure to batch in groups of at least 10x the number of shards, see: https://lucidworks.com/post/really-batch-updates-solr-2/ Although it sounds like you’ve figured this out already…

And yeah, I’ve seen Solr indexing degrade when it’s being overwhelmed, so that might be the total issue.

Best,
Erick

> On Oct 23, 2019, at 9:49 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 10/22/2019 1:12 PM, Nicolas Paris wrote:
>>> We, at Auto-Suggest, also do atomic updates daily and specifically
>>> changing merge factor gave us a boost of ~4x
>> Interesting. What kind of change exactly on the merge factor side ?
>
> The mergeFactor setting is deprecated. Instead, use maxMergeAtOnce,
> segmentsPerTier, and a setting that is not mentioned in the ref guide --
> maxMergeAtOnceExplicit.
>
> Set the first two to the same number, and the third to a minimum of three
> times what you set the other two.
>
> The default setting for maxMergeAtOnce and segmentsPerTier is 10, with 30 for
> maxMergeAtOnceExplicit. When you're trying to increase indexing speed and
> you think segment merging is interfering, you want to increase these values
> to something larger. Note that increasing these values will increase the
> number of files that your Solr install keeps open.
>
> https://lucene.apache.org/solr/guide/8_1/indexconfig-in-solrconfig.html#mergepolicyfactory
>
> When I built a Solr setup, I increased maxMergeAtOnce and segmentsPerTier to
> 35, and maxMergeAtOnceExplicit to 105. This made merging happen a lot less
> frequently.
>
>> Would you say atomic update is faster than regular replacement of
>> documents ? (considering my first thought on this below)
>
> On the Solr side, atomic updates will be slightly slower than indexing the
> whole document provided to Solr. When an atomic update is done, Solr will
> find the existing document, then combine what's in that document with the
> changes you specify using the atomic update, and then index the whole
> combined document as a new document that replaces the original.
>
> Whether or not atomic updates are faster or slower in practice than indexing
> the whole document will depend on how your source systems work, and that is
> not something we can know. If Solr can access the previous document faster
> than you can get the document from your source system, then atomic updates
> might be faster.
>
> Thanks,
> Shawn
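
P.S. On the batching advice at the top of this message, here is a rough sketch of what "at least 10x the number of shards" could look like on the client side. It is only an illustration, not how any particular client does it: `batch_size_for`, `batched`, and `index_in_batches` are made-up names, and `send` is a hypothetical callback standing in for whatever actually posts a batch to Solr (SolrJ, an HTTP client, etc.).

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, object]


def batch_size_for(num_shards: int, factor: int = 10) -> int:
    """Minimum batch size from the rule of thumb: factor x number of shards."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return factor * num_shards


def batched(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive lists of up to `size` docs; the last may be shorter."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_batches(
    docs: Iterable[Doc],
    num_shards: int,
    send: Callable[[List[Doc]], None],
) -> int:
    """Send docs in batches sized by the rule above; return the batch count.

    `send` is a hypothetical hook (an assumption, not a Solr API): it should
    submit one update request containing the whole batch.
    """
    size = batch_size_for(num_shards)
    sent = 0
    for chunk in batched(docs, size):
        send(chunk)
        sent += 1
    return sent
```

With 4 shards this gives batches of 40 documents, so 95 documents go out as three requests instead of 95. Treat 10x as a floor rather than a target; larger batches are often fine if the source side can supply them.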