Hi folks,

I've been doing some SolrCloud testing and I've been experiencing some problems. I'll try to be relatively brief, but feel free to ask for additional information.

I've added about 200 million documents to a SolrCloud. The cloud contains 3 collections, and all documents were added to all three collections.

While indexing these documents, we noticed 486k (!!) "No registered leader was found" errors, 482k (!!) of which referred to the same shard. The errors for the other shards are more or less evenly distributed in the log.

This indexing job has been running for about 5 days now, and is pretty much IO-bound. CPU usage is ~50%. The load average, on the other hand, has been 128 for 5 days straight, which is high but fine: the machine is responsive.

Memory usage is fine. Most of it is going towards file system caches and the like. Each Solr instance has 8GB Xmx, and is currently using about 7GB. I haven't noticed any OutOfMemoryErrors in the log files.

Monitoring shows that both Solr instances have been up throughout these proceedings.

Now, I'm willing to accept that these Solr instances don't have enough memory, or anything else, but I'm not seeing any of this reflected in the log files, which I'm finding troubling.

What I do notice in the log file is the very vague "SolrException: Service Unavailable". See below.

Could anyone shed some light on what could be causing these errors?

Thanks a bunch,

- Bram

SolrCloud Setup:
----------------
- Version: 5.4.0
- 3 Collections
-- firstCollection : 18 shards
-- secondCollection: 36 shards
-- thirdCollection : 79 shards
- Routing: implicit
- 2 Solr Instances
-- 8GB Xmx.

Machine:
--------
- Hexacore Xeon E5-1650
- 64GB RAM
- 50TB Disk (RAID6, 10 disks)

Leader Stack Trace:
-------------------
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No registered leader was found after waiting for 4000ms , collection: biweekly slice: thirdCollectionShard39
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495) ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:118) ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) ~[solr-solrj-4.7.1.jar:4.7.1 1582953 - sarowe - 2014-03-29 00:43:32]

Service Unavailable Log:
------------------------
527280878 ERROR (qtp59559151-194160) [c:collectionTwo s:collectionTwoShard12 r:core_node12 x:collectionTwo_collectionTwoShard12_replica1] o.a.s.u.SolrCmdDistributor forwarding update to http://[CENSORED]:8983/solr/collectionTwo_collectionTwoShard1_replica1/ failed - retrying ... retries: 15 add{,id=000195641101} params:update.distrib=TOLEADER&distrib.from=http://[CENSORED]:6666/solr/collectionTwo_collectionTwoShard12_replica1/ rsp:503:org.apache.solr.common.SolrException: Service Unavailable
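
Per-shard error counts:
-----------------------
A per-shard tally of the "No registered leader was found" errors, like the one quoted earlier in this post, can be produced with a short script. The following is a minimal sketch and not necessarily how the numbers above were obtained. It assumes the client log contains the "collection: <name> slice: <shard>" text exactly as it appears in the leader stack trace; the function name count_leader_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the stack trace in this post:
#   "No registered leader was found ... collection: <name> slice: <shard>"
LEADER_ERROR = re.compile(
    r"No registered leader was found.*?collection:\s*(\S+)\s+slice:\s*(\S+)"
)


def count_leader_errors(lines):
    """Return a Counter mapping (collection, slice) to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = LEADER_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing an open log file to count_leader_errors and printing the result of .most_common(10) lists the shards with the most errors first, which makes a single dominant shard easy to spot.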