Hi Fengtan,
I would just add that when merging the collections, you might want to use 
document routing 
(https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting).
Since you currently keep separate collections, I guess you already have a 
“collection ID” that can serve as the routing key. This would let you keep a 
single collection but query only the shard(s) holding data from one 
“collection”.
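
Roughly something like this, assuming the default compositeId router (the 
collection name and IDs below are made up, just for illustration):

  # index time: prefix each document ID with its routing key
  id = site42!doc-1001         <- the prefix before "!" picks the shard

  # query time: restrict the query to the shard(s) holding that prefix
  http://localhost:8983/solr/merged/select?q=*:*&_route_=site42!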

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 25 Oct 2017, at 19:25, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> <1> It's not that explicit commits are expensive, it's that they happen
> too quickly. An explicit commit and an internal autocommit have exactly
> the same cost. Your "Overlapping onDeckSearchers" warning is definitely an
> indication that commits are coming from somewhere too quickly and are
> piling up.
> 
> <2> Likely a good thing; each collection increases overhead. And
> 1,000,000 documents is quite small in Solr's terms unless the
> individual documents are enormous. I'd do this for a number of
> reasons.
> 
> <3> Certainly an option, but I'd put that last. Fix the commit problem first 
> ;)
> 
> <4> If you do this, make the autowarm count quite small. That said,
> it will be of very little use if you have frequent commits. Let's say
> you commit every second. The autowarming will warm caches, which will
> then be thrown out a second later, and it will also increase the time
> it takes to open a new searcher.
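> 
> If you do bump it, something like this in solrconfig.xml is the kind of
> thing I mean -- just a sketch, the right numbers depend on your query mix:
> 
>   <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
>                autowarmCount="16"/>
>   <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
>                autowarmCount="16"/>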
> 
> <5> Yeah, this would probably just be a band-aid.
> 
> If I were prioritizing these, I'd do
> <1> first. If you control the client, just don't call commit. If you
> do not control the client, then what you've outlined is fine. Tip: set
> your soft commit settings to be as long as you can stand. If you must
> have very short intervals, consider disabling your caches completely.
> Here's a long article on commits....
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
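> 
> For reference, a sketch of the kind of settings I mean in the
> <updateHandler> section of solrconfig.xml (the intervals are placeholders;
> make the soft commit interval as long as you can stand):
> 
>   <autoCommit>
>     <maxTime>60000</maxTime>           <!-- hard commit every 60s -->
>     <openSearcher>false</openSearcher> <!-- don't open a new searcher -->
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>300000</maxTime>          <!-- docs become visible every 5 min -->
>   </autoSoftCommit>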
> 
> <2> Actually, this and <1> are pretty close in priority.
> 
> Then re-evaluate. Fixing the commit issue may buy you quite a bit of
> time. Having 1,000 collections is pushing the boundaries presently.
> Each collection will establish watchers on the bits it cares about in
> ZooKeeper, and reducing the watchers by a factor approaching 1,000 is
> A Good Thing.
> 
> Frankly, between these two things I'd pretty much expect your problems
> to disappear. Wouldn't be the first time I've been totally wrong, but
> it's where I'd start ;)
> 
> Best,
> Erick
> 
> On Wed, Oct 25, 2017 at 8:54 AM, Fengtan <fengtan...@gmail.com> wrote:
>> Hi,
>> 
>> We run a SolrCloud 6.4.2 cluster with ZooKeeper 3.4.6 on 3 VMs.
>> Each VM runs RHEL 7 with 16 GB RAM, 8 CPUs and OpenJDK 1.8.0_131; each
>> VM has one Solr and one ZK instance.
>> The cluster hosts 1,000 collections; each collection has 1 shard and
>> between 500 and 50,000 documents.
>> Documents are indexed incrementally every day; the Solr client mostly does
>> searching.
>> Solr runs with -Xms7g -Xmx7g.
>> 
>> Everything had been working fine for about a month, but a few days ago we
>> started to see Solr timeouts: https://pastebin.com/raw/E2prSrQm
>> 
>> We have also always seen these warnings:
>>  PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>> 
>> 
>> We are not sure what is causing the timeouts, although we have identified a
>> few things that could be improved:
>> 
>> 1) Ignore explicit commits using IgnoreCommitOptimizeUpdateProcessorFactory
>> -- we are aware that explicit commits are expensive
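>> 
>> Something along these lines in solrconfig.xml is what we have in mind
>> (just a sketch; the chain name is arbitrary):
>> 
>>  <updateRequestProcessorChain name="ignore-commits" default="true">
>>    <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
>>      <int name="statusCode">200</int> <!-- acknowledge, but ignore the commit -->
>>    </processor>
>>    <processor class="solr.LogUpdateProcessorFactory" />
>>    <processor class="solr.DistributedUpdateProcessorFactory" />
>>    <processor class="solr.RunUpdateProcessorFactory" />
>>  </updateRequestProcessorChain>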
>> 
>> 2) Drop the 1,000 collections and use a single one instead (all our
>> collections use the same schema/solrconfig.xml) since stability problems
>> are expected when the number of collections reaches the low hundreds
>> <https://wiki.apache.org/solr/SolrPerformanceProblems#SolrCloud>. The
>> downside is that the new collection would contain 1,000,000 documents which
>> may bring new challenges.
>> 
>> 3) Tune the GC and possibly switch from CMS to G1, which seems to bring
>> better performance according to this
>> <https://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems>,
>> this
>> <https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector>
>> and this
>> <http://lucene.472066.n3.nabble.com/java-util-concurrent-TimeoutException-Idle-timeout-expired-50001-50000-ms-td4321209.html>.
>> The downside is that Lucene explicitly discourages the use of G1
>> <https://wiki.apache.org/lucene-java/JavaBugs#Java_Bugs_in_various_JVMs_affecting_Lucene_.2F_Solr>
>> so we are not sure what to expect. We use the default GC settings:
>>  -XX:NewRatio=3
>>  -XX:SurvivorRatio=4
>>  -XX:TargetSurvivorRatio=90
>>  -XX:MaxTenuringThreshold=8
>>  -XX:+UseConcMarkSweepGC
>>  -XX:+UseParNewGC
>>  -XX:ConcGCThreads=4
>>  -XX:ParallelGCThreads=4
>>  -XX:+CMSScavengeBeforeRemark
>>  -XX:PretenureSizeThreshold=64m
>>  -XX:+UseCMSInitiatingOccupancyOnly
>>  -XX:CMSInitiatingOccupancyFraction=50
>>  -XX:CMSMaxAbortablePrecleanTime=6000
>>  -XX:+CMSParallelRemarkEnabled
>>  -XX:+ParallelRefProcEnabled
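>> 
>> If we did try G1, we would probably start from something like the
>> following (untested on our side; the values are guesses we would tune):
>>  -XX:+UseG1GC
>>  -XX:+ParallelRefProcEnabled
>>  -XX:G1HeapRegionSize=8m
>>  -XX:MaxGCPauseMillis=250
>>  -XX:InitiatingHeapOccupancyPercent=75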
>> 
>> 4) Tune the caches, possibly by increasing autowarmCount on filterCache --
>> our current config is:
>>  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
>> autowarmCount="0"/>
>>  <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
>> autowarmCount="32"/>
>>  <documentCache class="solr.LRUCache" size="512" initialSize="512"
>> autowarmCount="0"/>
>> 
>> 5) Tweak the timeout settings, although this would not fix the underlying
>> issue
>> 
>> 
>> Do any of these options seem relevant? Is there anything else that might
>> address the timeouts?
>> 
>> Thanks
