Re: Solr Cloud Segments and Merging Issues

Shawn Heisey Thu, 13 Mar 2014 10:30:06 -0700

On 3/13/2014 1:44 AM, Varun Rajput wrote:

I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each
machine with a zookeeper quorum running on 3 other machines. The index size
on each shard is about 15GB. I noticed that the number of segments in
second shard was 42 and in the remaining shards was between 25-30.


I am basically trying to get the number of segments down to a reasonable
size like 4 or 5 in order to improve the search time. We do have some
documents indexed every day, so we don't want to do an optimize every day.

The merge factor with the TieredMergePolicy is only the number of segments
per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
shard, I tried clearing the index, reducing the mergeFactor and re-indexing
the same data in the same manner, multiple times, but I don't see a pattern
of reduction in number of segments.

No mergeFactor set      =>     42 segments
mergeFactor=5      =>       22 segments
mergeFactor=2      =>       22 segments

Below is the simple configuration, as specified in the documentation, I am
using for merging:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="segmentsPerTier">2</int>
</mergePolicy>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

What is the best way in which I can use merging to restrict the number of
segments being formed?

With the old merge policy, the config setting was literally named
"mergeFactor". With TieredMergePolicy, there are now three settings
that must be changed in order to actually do the same thing that
mergeFactor used to do. The following config snippet is the equivalent
of a mergeFactor of 10, so these are the default settings. If you don't
change all three (especially segmentsPerTier), then you are not
actually changing the "mergeFactor".


  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnceExplicit">30</int>
  </mergePolicy>
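
As a rough sketch of what a "mergeFactor" of 5 might look like: this
assumes maxMergeAtOnceExplicit should keep the same 3x ratio to the
other two settings that the defaults above have. That ratio is an
assumption taken from the defaults, not a documented rule, so treat the
value 15 as illustrative.

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- all three values scaled together; 15 assumes the default 3x ratio -->
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">5</int>
    <int name="maxMergeAtOnceExplicit">15</int>
  </mergePolicy>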

With newer Solr versions, there is not as much speedup to be gained from
fewer segments as before. There *is* a noticeable change, but it is no
longer the night/day difference it used to be.

Also, we are moving from Solr 1.4 (Master-Slave) to Solr 4.6.0 Cloud and
see a great increase in response time from about 18ms to 150ms. Is this a
known issue? Is there no way to reduce the response time? In the MBeans,
the individual cores show the /select handler attributes having search
times around 8ms. What is it that causes the overall response time to
increase so much?

Assuming that there are no system resource limitations (especially RAM),
a distributed index is slower than a single index of the same total
size. Where distributed indexes have an edge is in very large indexes
or indexes with a moderately high query rate -- by applying more total
RAM and/or CPU resources to the problem. If your index already fits
entirely into the OS disk cache, or you are sending a handful of test
queries, you won't notice any performance benefit from going distributed.

For SUPER high query rates, you need more replicas. More shards might
actually make performance go down in this situation.

You can run a single shard with SolrCloud -- there's nothing saying the
index HAS to be distributed.


Thanks,
Shawn
