Hello,

We had an outage on one of our Solr nodes and are trying to figure out the cause. The following errors appeared in the logging page of the Solr Admin UI: three separate entries, which I believe occurred in this order, though I'm not certain.
Stopping recovery for core=[b1_shard5_replica_n16] coreNodeName=[core_node19]

Error while trying to recover. core=b1_shard5_replica_n16:org.apache.solr.common.SolrException: Error while saving shard term for collection: b1
    at org.apache.solr.cloud.ZkShardTerms.saveTerms(ZkShardTerms.java:307)
    at org.apache.solr.cloud.ZkShardTerms.forceSaveTerms(ZkShardTerms.java:281)
    at org.apache.solr.cloud.ZkShardTerms.startRecovering(ZkShardTerms.java:227)
    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1576)
    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1500)
    at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:577)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:326)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/b1/terms/shard5
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
    at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1336)
    at org.apache.solr.common.cloud.SolrZkClient.lambda$setData$6(SolrZkClient.java:370)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
    at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:370)
    at org.apache.solr.cloud.ZkShardTerms.saveTerms(ZkShardTerms.java:297)
    ... 14 more

Could not publish that recovery failed:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1111)
    at org.apache.solr.common.cloud.SolrZkClient.lambda$exists$2(SolrZkClient.java:322)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
    at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:322)
    at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:309)
    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1587)
    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1500)
    at org.apache.solr.cloud.RecoveryStrategy.recoveryFailed(RecoveryStrategy.java:190)
    at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:715)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:326)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

We are running Solr 8.1.1 with ZooKeeper 3.4.9, and ZooKeeper is deployed on the same nodes as Solr. The Solr JVM arguments look like this:
-DSTOP.KEY=solrrocks
-DSTOP.PORT=7983
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983
-Dcom.sun.management.jmxremote.ssl=false
-Djetty.home=/cul/app/solr/solr/server
-Djetty.port=8983
-Dlog4j.configurationFile=file:/cul/data/solr/log4j2.xml
-Dsolr.data.home=
-Dsolr.default.confdir=/cul/app/solr/solr/server/solr/configsets/_default/conf
-Dsolr.install.dir=/cul/app/solr/solr
-Dsolr.jetty.https.port=8983
-Dsolr.log.dir=/cul/data/solr/logs
-Dsolr.log.muteconsole
-Dsolr.solr.home=/cul/data/solr/data
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=zk-host1:2181, zk-host2:2181, zk-host3:2181
-XX:+AlwaysPreTouch
-XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseG1GC
-XX:+UseGCLogFileRotation
-XX:+UseLargePages
-XX:GCLogFileSize=20M
-XX:MaxGCPauseMillis=250
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/cul/app/solr/solr/bin/oom_solr.sh 8983 /cul/data/solr/logs
-Xloggc:/cul/data/solr/logs/solr_gc.log
-Xms8g
-Xmx8g
-Xss256k
-verbose:gc

Any ideas on what could cause this, or what we should keep an eye on, would be greatly appreciated.

Thanks,
Robbie

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
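Both traces end in KeeperException$SessionExpiredException, which means ZooKeeper had already expired this node's session before Solr tried to write. One common cause worth ruling out is a JVM pause on the Solr node that lasts longer than the session timeout. The Java sketch below is a hypothetical helper (ZkSessionBudget and its methods are illustrative names, not part of Solr or ZooKeeper). It assumes the ZooKeeper servers use tickTime=2000, as in ZooKeeper's sample zoo.cfg, with the default session bounds of 2x and 20x tickTime, so the requested zkClientTimeout of 15000 ms would be accepted unchanged. It also assumes the 3.4.x client behavior of treating a connection as lost after about two-thirds of the negotiated timeout with no response. It then scans a JDK 8 GC log written with -XX:+PrintGCApplicationStoppedTime for stop-the-world pauses of one second or more.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or ZooKeeper.
// Assumes: ZooKeeper tickTime=2000 with default min/max session timeouts,
// and a JDK 8 GC log written with -XX:+PrintGCApplicationStoppedTime.
class ZkSessionBudget {

    // Default ZooKeeper bounds: min = 2 * tickTime, max = 20 * tickTime.
    // The server clamps the client's requested timeout into that range.
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    // Assumed 3.4.x client behavior: a connection that has been silent for
    // about 2/3 of the session timeout is treated as lost.
    static int clientReadTimeoutMs(int negotiatedMs) {
        return negotiatedMs * 2 / 3;
    }

    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    // Returns every stop-the-world pause (in ms) at or above thresholdMs.
    static List<Double> longPausesMs(List<String> lines, double thresholdMs) {
        List<Double> pauses = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                double ms = Double.parseDouble(m.group(1)) * 1000.0;
                if (ms >= thresholdMs) {
                    pauses.add(ms);
                }
            }
        }
        return pauses;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZkSessionBudget <gc-log-file>");
            System.exit(2);
        }
        int negotiated = negotiatedTimeoutMs(15000, 2000); // -DzkClientTimeout=15000
        int readTimeout = clientReadTimeoutMs(negotiated);
        System.out.println("negotiated session timeout: " + negotiated + " ms");
        System.out.println("client read timeout: " + readTimeout + " ms");

        for (double ms : longPausesMs(Files.readAllLines(Paths.get(args[0])), 1000.0)) {
            String note = ms >= negotiated ? " (exceeds session timeout)"
                    : ms >= readTimeout ? " (exceeds client read timeout)" : "";
            System.out.printf("pause of %.0f ms%s%n", ms, note);
        }
    }
}
```

Compile it with javac and run java ZkSessionBudget <gc-log-file>. With -XX:+UseGCLogFileRotation, JDK 8 adds a numeric suffix to each log file, so point it at those rotated files rather than at solr_gc.log itself. A clean GC log does not rule out stalls on the ZooKeeper side, since ZooKeeper shares these hosts with Solr.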