Hey Shawn,
> The config with the old policy used to be the literal name
> "mergeFactor". With TieredMergePolicy, there are now three settings
> that must be changed in order to actually be the same as what
> mergeFactor used to do. The following config snippet is the equivalent
> config to a mergeFactor of 10, so these are the default settings. If
> you don't change all three (especially segmentsPerTier), then you are
> not actually changing the "mergeFactor".
>
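> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
>   <int name="maxMergeAtOnceExplicit">30</int>
> </mergePolicy>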
I tried specifying all of these settings, but it still doesn't work as
expected. I even tried setting maxMergedSegmentMB to 20GB instead of the
default 5GB. This is the config I tried:
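<int name="maxMergeAtOnce">2</int>
<int name="segmentsPerTier">2</int>
<int name="maxMergeAtOnceExplicit">100</int>
<double name="maxMergedSegmentMB">2199023220</double>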
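As a rough sanity check on the last value, assuming the setting is read in megabytes (as the "MB" suffix in its name suggests), 20GB works out to a much smaller number than the one in the config above. The helper name gb_to_mb is only for illustration:

```python
def gb_to_mb(gigabytes):
    """Convert gigabytes to megabytes using binary units (1 GB = 1024 MB)."""
    return gigabytes * 1024


# The value used in the config above.
configured_value = 2199023220

# What 20GB would be if the setting is interpreted as megabytes (assumption).
target_mb = gb_to_mb(20)

print("20GB expressed in MB:", target_mb)
print("configured value read as MB, in GB:", configured_value / 1024)
```

If that assumption holds, the configured value is far above 20GB, so the size cap is effectively not what limits merging in this test.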
> With newer Solr versions, there is not as much speedup to be gained from
> fewer segments as before. There *is* a noticeable change, but it is no
> longer the night/day difference it used to be.
We ran a performance test on a normal and an optimized index and saw a
considerable improvement (almost double) in response time. That is the reason
we want to reduce the number of segments: we have a large index with
a very small number of updates.
> Assuming that there are no system resource limitations (especially RAM),
> a distributed index is slower than a single index of the same total
> size. Where distributed indexes have an edge is in very large indexes
> or indexes with a moderately high query rate -- by applying more total
> RAM and/or CPU resources to the problem. If your index already fits
> entirely into the OS disk cache, or you are sending a handful of test
> queries, you won't notice any performance benefit from going distributed.
We have a large index which won't fit in memory and need high query rates.
> For SUPER high query rates, you need more replicas. More shards might
> actually make performance go down in this situation.
This is something we identified while testing. We had to reduce the number
of shards, while keeping it high enough to let us grow the size of the data
in the future.
-Varun
> I am using Solr 4.6.0 in cloud mode. The setup is 4 shards, 1 on each
> machine, with a zookeeper quorum running on 3 other machines. The index
> size on each shard is about 15GB. I noticed that the number of segments
> in the second shard was 42 and in the remaining shards was between 25
> and 30.
>
> I am basically trying to get the number of segments down to a reasonable
> number like 4 or 5 in order to improve the search time. We do have some
> documents indexed every day, so we don't want to do an optimize every day.
>
> The merge factor with the TieredMergePolicy is only the number of segments
> per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
> shard, I tried clearing the index, reducing the mergeFactor and
> re-indexing the same data in the same manner, multiple times, but I don't
> see a pattern of reduction in the number of segments.
>
> No mergeFactor set => 42 segments
> mergeFactor=5 => 22 segments
> mergeFactor=2 => 22 segments
>
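> Below is the simple configuration I am using for merging, as specified
> in the documentation:
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">2</int>
>   <int name="segmentsPerTier">2</int>
> </mergePolicy>
>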
> What is the best way in which I can use merging to restrict the number of
> segments being formed?
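
One way to see why lowering these settings does not shrink the segment count in proportion is to model how a tiered policy budgets segments. The sketch below is a simplified model based on my understanding of Lucene's TieredMergePolicy, not a copy of it. The function name, the 2MB floor segment size, and the use of the 5GB cap as the largest tier are assumptions for illustration.

```python
import math


def allowed_segment_count(total_mb, segments_per_tier, max_merge_at_once,
                          floor_mb=2.0, max_merged_mb=5 * 1024.0):
    """Estimate how many segments a tiered policy tolerates before merging.

    Simplified model (assumption): each tier may hold `segments_per_tier`
    segments, and the segment size of the next tier is `max_merge_at_once`
    times larger, capped at `max_merged_mb`.
    """
    level_size = floor_mb   # size of a segment in the current tier
    remaining = total_mb    # index size not yet assigned to a tier
    allowed = 0
    while True:
        count_at_level = remaining / level_size
        if count_at_level < segments_per_tier:
            # Last tier: whatever is left fits here.
            allowed += math.ceil(count_at_level)
            return allowed
        allowed += segments_per_tier
        remaining -= segments_per_tier * level_size
        level_size = min(max_merged_mb, level_size * max_merge_at_once)


shard_mb = 15 * 1024  # roughly the 15GB shard described above

print("10/10:", allowed_segment_count(shard_mb, 10, 10))
print("2/2:  ", allowed_segment_count(shard_mb, 2, 2))
```

Under these assumptions, a smaller segmentsPerTier gives fewer segments per tier but more tiers, so the budget for a 15GB shard stays well above 4 or 5 even at 2/2.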