Hi everybody, I'm facing the same problem on Solr 7.3. Requesting a longer ZK session (the default 10s seems too short) will probably solve the problem, but I'm puzzled by the fact that SolrJ reports this error as a SolrException with status code 400 (BAD_REQUEST). In ZkStateReader:
    public static DocCollection getCollectionLive(ZkStateReader zkStateReader, String coll) {
      try {
        return zkStateReader.fetchCollectionState(coll, null);
      } catch (KeeperException e) {
        throw new SolrException(ErrorCode.BAD_REQUEST, "Could not load collection from ZK: " + coll, e);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new SolrException(ErrorCode.BAD_REQUEST, "Could not load collection from ZK: " + coll, e);
      }
    }

Retrying the request could solve the problem, but a client shouldn't retry a BAD_REQUEST. Why isn't this reported as a 503 (SERVICE_UNAVAILABLE)? I think SolrJ should distinguish the two cases:

A: communication problem with ZK -> 503
B: user asked for a non-existent collection -> 400

Thanks

On Fri, 25 May 2018 at 05:02, Aman Singh <amandeep.coo...@gmail.com> wrote:

> Hi Shawn & Alessandro,
> We also tried increasing the heap, but we still faced the same issue. After
> moving ZK off the Solr server to its own dedicated server, the problem went
> away. And yes, when we were facing this issue the GC activity was high,
> around 60-70% out of 400%.
> Regards,
> Aman Deep Singh
>
> On 25/05/18, 5:08 AM, "Shawn Heisey" <apa...@elyograg.org> wrote:
>
> > On 6/20/2017 9:46 AM, Aman Deep Singh wrote:
> > > Sorry Shawn,
> > > It didn't copy the entire stacktrace. I put the stacktrace at
> > > https://www.dropbox.com/s/zf8b87m24ei2ils/solr%20exception2?dl=0
> > >
> > > Note: I have shaded the Solr library under com.gdn.solr620, so all Solr
> > > classes will appear as com.gdn.solr620.org.apache.solr.*
> >
> > Wow, I really dropped the ball here. The thread is nearly a year old.
> > I somehow missed the reply. I am sorry about that! Thanks to Alessandro
> > for reviving the thread and making it clear that I never replied.
> >
> > This is the innermost cause:
> >
> > Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for /collections/productCollection/state.json
> >
> > Either there are network issues talking to ZooKeeper, or something else
> > caused a timeout. Solr's default ZK client timeout when it is not
> > configured is 15 seconds. In recent versions, the example configurations
> > have an explicit setting of 30 seconds. Solr's zkClientTimeout is used to
> > set ZooKeeper's sessionTimeout, and that's what is exceeded when a session
> > expires.
> >
> > When this kind of error happens, it means something has gone VERY wrong
> > -- 15 seconds is a REALLY long time when programs are trying to talk to
> > each other.
> >
> > One common cause of problems like this is extreme GC pauses. Typically, a
> > pause long enough to cause a ZK timeout is due to the heap being too
> > small, but it can also happen because the heap is VERY large.
> >
> > Errors on the client side may not be as informative as the corresponding
> > errors in the solr.log file on the server(s). It would be a good idea to
> > check solr.log for errors as well.
> >
> > Thanks,
> > Shawn
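A minimal sketch of the A/B distinction proposed above, assuming a client that only retries transient failures. None of these types come from SolrJ or ZooKeeper: `ZkFailure`, `StatusException`, `statusFor` and `callWithRetry` are hypothetical stand-ins for the KeeperException subtypes and SolrException, and exponential backoff is just one reasonable retry policy.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: classify a failed ZK state read so a client can tell
// "try again later" (503) apart from "fix your request" (400).
public class ZkErrorClassification {

    // Hypothetical stand-ins for the ZooKeeper failure kinds discussed in this thread.
    enum ZkFailure { SESSION_EXPIRED, CONNECTION_LOSS, NO_NODE }

    static final int BAD_REQUEST = 400;
    static final int SERVICE_UNAVAILABLE = 503;

    // Case B (the collection does not exist) -> 400; case A (ZK communication problem) -> 503.
    static int statusFor(ZkFailure failure) {
        return failure == ZkFailure.NO_NODE ? BAD_REQUEST : SERVICE_UNAVAILABLE;
    }

    // Only a transient condition is worth retrying.
    static boolean isRetryable(int status) {
        return status == SERVICE_UNAVAILABLE;
    }

    // Hypothetical exception carrying a status code, loosely modelled on SolrException.code().
    static class StatusException extends RuntimeException {
        final int code;

        StatusException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Retries with exponential backoff while the failure is retryable; rethrows otherwise
    // or once maxAttempts is reached.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMs) throws Exception {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (StatusException e) {
                if (!isRetryable(e.code) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        for (ZkFailure f : ZkFailure.values()) {
            int status = statusFor(f);
            System.out.println(f + " -> " + status + (isRetryable(status) ? " (retry)" : " (do not retry)"));
        }
        // Simulated call that fails twice with 503 (e.g. an expired ZK session) and then succeeds.
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new StatusException(SERVICE_UNAVAILABLE, "Could not load collection from ZK");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With this mapping, an expired session or a lost connection surfaces as 503 and is retried with backoff, while a request for a missing collection surfaces as 400 and fails immediately.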
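On requesting a longer session: the zkClientTimeout that Solr passes to ZooKeeper is only a request, and the ZooKeeper server clamps it. By default a server accepts session timeouts between 2 * tickTime and 20 * tickTime. These bounds can be changed with minSessionTimeout/maxSessionTimeout in zoo.cfg. The sketch below assumes those default bounds and a tickTime of 2000 ms, and uses a deliberately simplified expiry rule: a fully stalled client whose pause reaches the negotiated timeout loses its session. Real expiry timing on the server is less exact. `negotiatedTimeoutMs` and `pauseLikelyExpiresSession` are illustrative names, not ZooKeeper API.

```java
// Hypothetical sketch of how a ZooKeeper server bounds the session timeout a client requests,
// assuming the default bounds [2 * tickTime, 20 * tickTime].
public class ZkSessionTimeout {

    // Clamp the requested timeout into the server's default allowed range.
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    // Simplified model: a client stalled (e.g. by a stop-the-world GC pause) for at least the
    // negotiated timeout sends no heartbeats, so the server expires its session.
    static boolean pauseLikelyExpiresSession(long pauseMs, int negotiatedMs) {
        return pauseMs >= negotiatedMs;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // assumed value (the one in ZooKeeper's sample zoo.cfg); check your ensemble
        // 15000 and 30000 are the Solr timeouts mentioned in the thread; 1000 and 60000 show clamping.
        for (int requested : new int[] {1000, 15000, 30000, 60000}) {
            int negotiated = negotiatedTimeoutMs(requested, tickTimeMs);
            System.out.println("requested " + requested + " ms -> negotiated " + negotiated + " ms");
        }
        // A hypothetical 20 s stop-the-world pause checked against both Solr timeouts.
        long pauseMs = 20_000;
        System.out.println("20 s pause expires a 15 s session: " + pauseLikelyExpiresSession(pauseMs, 15000));
        System.out.println("20 s pause expires a 30 s session: " + pauseLikelyExpiresSession(pauseMs, 30000));
    }
}
```

With a tickTime of 2000 ms, a 60 s request yields a 40 s session. Raising zkClientTimeout beyond 20 * tickTime therefore has no effect unless maxSessionTimeout is also raised on the ZooKeeper side.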