Hey Shawn,

> The config with the old policy used to be the literal name
> "mergeFactor". With TieredMergePolicy, there are now three settings
> that must be changed in order to actually do the same thing that
> mergeFactor used to do. The following config snippet is the equivalent
> of a mergeFactor of 10, so these are the default settings. If you
> don't change all three (especially segmentsPerTier), then you are not
> actually changing the "mergeFactor".
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
>   <int name="maxMergeAtOnceExplicit">30</int>
> </mergePolicy>
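One detail worth pinning down before tuning these settings further: maxMergedSegmentMB is specified in megabytes, so a 20 GB cap works out as below. A trivial check, but this field is an easy place to slip in a byte count by mistake, which effectively removes the cap (this is a generic unit conversion, not taken from Lucene's code):

```python
# maxMergedSegmentMB is in megabytes, so a 20 GB cap is:
target_gb = 20
max_merged_segment_mb = target_gb * 1024
print(max_merged_segment_mb)  # 20480
```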
I tried specifying all of these settings, but it still doesn't work as
expected. I even tried raising maxMergedSegmentMB to 20GB from the
default 5GB. This is the config I tried:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="segmentsPerTier">2</int>
  <int name="maxMergeAtOnceExplicit">100</int>
  <long name="maxMergedSegmentMB">20480</long>
</mergePolicy>

> With newer Solr versions, there is not as much speedup to be gained
> from fewer segments as before. There *is* a noticeable change, but it
> is no longer the night/day difference it used to be.

We ran a performance test on a normal and an optimized index and saw a
considerable improvement (almost double) in response time. That is why
we want to reduce our number of segments: we have a large index with a
very small amount of updates.

> Assuming that there are no system resource limitations (especially
> RAM), a distributed index is slower than a single index of the same
> total size. Where distributed indexes have an edge is in very large
> indexes or indexes with a moderately high query rate -- by applying
> more total RAM and/or CPU resources to the problem. If your index
> already fits entirely into the OS disk cache, or you are sending a
> handful of test queries, you won't notice any performance benefit
> from going distributed.

We have a large index which won't fit in memory, and we need to support
high query rates.

> For SUPER high query rates, you need more replicas. More shards might
> actually make performance go down in this situation.

This is something we identified while testing. We had to tune the
number of shards down to a smaller but still reasonable number that
will allow us to grow the size of the data in the future.

-Varun

> I am using Solr 4.6.0 in cloud mode. The setup is 4 shards, 1 on each
> machine, with a ZooKeeper quorum running on 3 other machines. The
> index size on each shard is about 15GB.
> I noticed that the number of segments in the second shard was 42, and
> in the remaining shards it was between 25-30.
>
> I am basically trying to get the number of segments down to a
> reasonable size like 4 or 5 in order to improve the search time. We
> do have some documents indexed every day, so we don't want to do an
> optimize every day.
>
> The merge factor with TieredMergePolicy is only the number of
> segments per tier. Assuming there were 5 tiers (mergeFactor of 10) in
> the second shard, I tried clearing the index, reducing the
> mergeFactor, and re-indexing the same data in the same manner,
> multiple times, but I don't see a pattern of reduction in the number
> of segments.
>
> No mergeFactor set => 42 segments
> mergeFactor=5      => 22 segments
> mergeFactor=2      => 22 segments
>
> Below is the simple configuration, as specified in the documentation,
> that I am using for merging:
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">2</int>
>   <int name="segmentsPerTier">2</int>
> </mergePolicy>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>
> What is the best way in which I can use merging to restrict the
> number of segments being formed?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Segments-and-Merging-Issues-tp4123316p4123489.html
Sent from the Solr - User mailing list archive at Nabble.com.
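For what it's worth, the counts quoted above (42 segments with defaults, ~22 with segmentsPerTier=2, on a roughly 15 GB shard) are close to what a rough tiered-budget model predicts: lowering segmentsPerTier also deepens the tier structure, so the total segment count does not fall linearly. Below is a back-of-the-envelope sketch; the floor size and the formula are simplifying assumptions of mine, not Lucene's actual allowedSegCount code:

```python
import math

def rough_segment_budget(index_mb, segments_per_tier=10,
                         max_merge_at_once=10, floor_mb=2.0):
    """Crude upper bound on how many segments TieredMergePolicy
    tolerates: each tier may hold up to segments_per_tier segments,
    and per-segment size grows by max_merge_at_once at each tier."""
    remaining = float(index_mb)
    level_size = floor_mb
    allowed = 0
    while remaining > 0:
        allowed += math.ceil(min(segments_per_tier,
                                 remaining / level_size))
        remaining -= segments_per_tier * level_size
        level_size *= max_merge_at_once
    return allowed

# For a ~15 GB shard:
print(rough_segment_budget(15 * 1024))        # defaults -> 37 (observed: 42)
print(rough_segment_budget(15 * 1024, 2, 2))  # segmentsPerTier=2 -> 24 (observed: 22)
```

Under this model, segmentsPerTier=2 simply builds more tiers, which is consistent with the observation that mergeFactor=2 and mergeFactor=5 both landed near 22 segments. To actually reach 4-5 segments, the shard would need a maxMergedSegmentMB large enough that a handful of segments can hold the whole index, or an occasional optimize with a maxSegments target rather than a full single-segment optimize.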