Look at your connection timeouts and your ZK timeouts. This usually means your Solr instances are going into heavy GC, as Yago mentions. You can turn on GC logging if it isn't on already, then use something like GCViewer to get a handle on the GC behavior.
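On the ZK timeouts, it helps to remember that the ZooKeeper server bounds whatever session timeout a client asks for: the request is clamped between minSessionTimeout (2 x tickTime if unset) and maxSessionTimeout (20 x tickTime if unset). Here is a rough sketch of that arithmetic using the numbers from the zoo.cfg and solr.xml quoted below. The class and method names are made up for illustration, and the 10000 ms and 120000 ms requests are hypothetical values, not something taken from this cluster.

```java
public class ZkSessionTimeout {

    /**
     * Clamps a requested session timeout between the server's bounds.
     * A non-positive min or max means "not set in zoo.cfg", in which case
     * the defaults of 2 x tickTime and 20 x tickTime apply.
     */
    static int negotiate(int requestedMs, int tickTimeMs,
                         int minSessionTimeoutMs, int maxSessionTimeoutMs) {
        int min = minSessionTimeoutMs > 0 ? minSessionTimeoutMs : 2 * tickTimeMs;
        int max = maxSessionTimeoutMs > 0 ? maxSessionTimeoutMs : 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000;           // from the posted zoo.cfg
        int maxSessionTimeout = 90000; // from the posted zoo.cfg
        int minSessionTimeout = -1;    // not set in the posted zoo.cfg

        // zkClientTimeout from the posted solr.xml (Solr node side)
        System.out.println(negotiate(30000, tickTime, minSessionTimeout, maxSessionTimeout));

        // hypothetical client that asks for a shorter session timeout
        System.out.println(negotiate(10000, tickTime, minSessionTimeout, maxSessionTimeout));

        // hypothetical request above the configured cap
        System.out.println(negotiate(120000, tickTime, minSessionTimeout, maxSessionTimeout));
    }
}
```

The point of the sketch: a GC pause longer than the negotiated value is enough to expire the session, and each client (Solr nodes and SolrJ alike) negotiates its own value, so it is worth checking what each one actually requests.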
You really have two options:
1> if it is GC, tune your instances to avoid that if possible. This is "more art than science".
2> lengthen timeouts. There are a series of them for client connections, Solr<->Solr connections and ZK<->Solr connections....

Best,
Erick

On Fri, Dec 16, 2016 at 2:07 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> Do some gc profiling to get some information about. It's possible you have
> configured a small heap and you are running in gc stop the world issues.
>
> Normally zookeeper errors are bounded to gc and network latency issues
>
> --
>
> /Yago Riveiro
>
> On 16 Dec 2016, 09:49 +0000, Piyush Kunal <piyush.ku...@myntra.com>, wrote:
>> Looks like an issue with 6.x version then.
>> But this seems too basic. Not sure if community would not have caught this
>> till now.
>>
>> On Fri, Dec 16, 2016 at 2:55 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>>
>> > I had some of this error in my logs too on 6.3.0
>> >
>> > My cluster also index like 20K docs/sec I don't know why.
>> >
>> > --
>> >
>> > /Yago Riveiro
>> >
>> > On 16 Dec 2016, 08:39 +0000, Piyush Kunal <piyush.ku...@myntra.com>, wrote:
>> > > Anyone has noticed such issue before?
>> > >
>> > > On Thu, Dec 15, 2016 at 4:36 PM, Piyush Kunal <piyush.ku...@myntra.com> wrote:
>> > >
>> > > > This is happening when heavy indexing like 100/second is going on.
>> > > >
>> > > > On Thu, Dec 15, 2016 at 4:33 PM, Piyush Kunal <piyush.ku...@myntra.com> wrote:
>> > > >
>> > > > > - We have solr6.1.0 cluster running on production with 1 shard and 5
>> > > > > replicas.
>> > > > > - Zookeeper quorum on 3 nodes.
>> > > > > - Using a chroot in zookeeper to segregate the configs from other
>> > > > > collections.
>> > > > > - Using solrj5.1.0 as our client to query solr.
>> > > > >
>> > > > > Usually things work fine but on and off we witness this exception coming up:
>> > > > > =============================================================
>> > > > > org.apache.solr.common.SolrException: Could not load collection from ZK:sprod
>> > > > > at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:815)
>> > > > > at org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateReader.java:477)
>> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1174)
>> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:807)
>> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:782)
>> > > > > --
>> > > > > Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/sprod/state.json
>> > > > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>> > > > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> > > > > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>> > > > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
>> > > > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
>> > > > > at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>> > > > > at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
>> > > > > --
>> > > > > org.apache.solr.common.SolrException: Could not load collection from ZK:sprod
>> > > > > at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:815)
>> > > > > at org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateReader.java:477)
>> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1174)
>> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:807)
>> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:782)
>> > > > > --
>> > > > > Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/sprod/state.json
>> > > > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>> > > > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> > > > > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>> > > > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
>> > > > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
>> > > > > at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>> > > > > at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
>> > > > > =============================================================
>> > > > >
>> > > > > This is our zoo.cfg:
>> > > > > ======================================
>> > > > > tickTime=2000
>> > > > > dataDir=/var/lib/zookeeper
>> > > > > clientPort=2181
>> > > > > initLimit=5
>> > > > > syncLimit=2
>> > > > > server.1=192.168.70.27:2888:3888
>> > > > > server.2=192.168.70.64:2889:3889
>> > > > > server.3=192.168.70.26:2889:3889
>> > > > > maxClientCnxns=300
>> > > > > maxSessionTimeout=90000
>> > > > > =======================================
>> > > > >
>> > > > > This is our solr.xml on server side
>> > > > > =======================================
>> > > > >
>> > > > > <solr>
>> > > > >
>> > > > > <solrcloud>
>> > > > >
>> > > > > <str name="host">${host:}</str>
>> > > > > <int name="hostPort">${jetty.port:8983}</int>
>> > > > > <str name="hostContext">${hostContext:solr}</str>
>> > > > >
>> > > > > <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
>> > > > >
>> > > > > <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
>> > > > > <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
>> > > > > <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
>> > > > > <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
>> > > > > <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>
>> > > > >
>> > > > > </solrcloud>
>> > > > >
>> > > > > <shardHandlerFactory name="shardHandlerFactory"
>> > > > > class="HttpShardHandlerFactory">
>> > > > > <int name="socketTimeout">${socketTimeout:600000}</int>
>> > > > > <int name="connTimeout">${connTimeout:60000}</int>
>> > > > > </shardHandlerFactory>
>> > > > > </solr>
>> > > > >
>> > > > > =======================================
>> > > > >
>> > > > > Any help appreciated.
>> > > > >
>> > > > > Regards,
>> > > > > Piyush
>> > > > >