On 3/13/2014 1:44 AM, Varun Rajput wrote:
I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each
machine with a zookeeper quorum running on 3 other machines. The index size
on each shard is about 15GB. I noticed that the number of segments in
second shard was 42 and in the remaining shards was between 25-30.

I am basically trying to get the number of segments down to a reasonable
size like 4 or 5 in order to improve the search time. We do have some
documents indexed every day, so we don't want to do an optimize every day.

The merge factor with the TieredMergePolicy is only the number of segments
per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
shard, I tried clearing the index, reducing the mergeFactor and re-indexing
the same data in the same manner, multiple times, but I don't see a pattern
of reduction in number of segments.

No mergeFactor set      =>     42 segments
mergeFactor=5      =>       22 segments
mergeFactor=2      =>       22 segments

Below is the simple configuration, as specified in the documentation, I am
using for merging:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="segmentsPerTier">2</int>
</mergePolicy>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

What is the best way in which I can use merging to restrict the number of
segments being formed?

With the old merge policy, the setting was literally named "mergeFactor". With TieredMergePolicy, there are now three settings that must all be changed in order to get the same effect that mergeFactor used to have. The following config snippet is equivalent to a mergeFactor of 10, so these are the default settings. If you don't change all three (especially segmentsPerTier), then you are not actually changing the "mergeFactor".

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnceExplicit">30</int>
  </mergePolicy>
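
As a rough sketch of changing all three together: the fragment below assumes the same ratios as the defaults above, with maxMergeAtOnceExplicit kept at three times the other two values. That scaling is an assumption taken from the 10/10/30 defaults, not a documented rule. Under that assumption, something close to a mergeFactor of 5 might look like this:

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- Assumed values: the 10/10/30 defaults scaled down to 5/5/15 -->
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">5</int>
    <int name="maxMergeAtOnceExplicit">15</int>
  </mergePolicy>

Lower values should leave fewer segments on disk at the cost of more merge work during indexing, so treat these numbers as a starting point to test rather than a recommendation.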

With newer Solr versions, there is not as much speedup to be gained from fewer segments as before. There *is* a noticeable change, but it is no longer the night/day difference it used to be.

Also, we are moving from Solr 1.4 (Master-Slave) to Solr 4.6.0 Cloud and
see a great increase in response time from about 18ms to 150ms. Is this a
known issue? Is there no way to reduce the response time? In the MBeans,
the individual cores show the /select handler attributes having search
times around 8ms. What is it that causes the overall response time to
increase so much?

Assuming that there are no system resource limitations (especially RAM), a distributed index is slower than a single index of the same total size. Distributed indexes have an edge with very large indexes or indexes with a moderately high query rate, because they apply more total RAM and/or CPU resources to the problem. If your index already fits entirely into the OS disk cache, or you are sending only a handful of test queries, you won't notice any performance benefit from going distributed.

For SUPER high query rates, you need more replicas. More shards might actually make performance go down in this situation.

You can run a single shard with SolrCloud -- there's nothing saying the index HAS to be distributed.

Thanks,
Shawn
