Hi Folks,
We are seeing the following in the logs on our Solr nodes, after which the
nodes go into multiple full GCs and eventually run out of heap. We saw this
ticket - https://issues.apache.org/jira/browse/SOLR-7338 - and are wondering
whether that’s the one causing it. We are currently on Solr 4.10.0.
INFO - 2015-06-17 08:06:28.163;
org.apache.solr.common.cloud.ConnectionManager; Watcher
org.apache.solr.common.cloud.ConnectionManager@422f41e9
name:ZooKeeperConnection Watcher:got event WatchedEvent state:Expired type:None
path:null path:null type:None
INFO - 2015-06-17 08:06:28.163;
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper session
was expired. Attempting to reconnect to recover relationship with ZooKeeper...
INFO - 2015-06-17 08:06:28.166;
org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired -
starting a new one...
INFO - 2015-06-17 08:06:28.171;
org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect
to ZooKeeper
INFO - 2015-06-17 08:06:28.177;
org.apache.solr.common.cloud.ConnectionManager; Watcher
org.apache.solr.common.cloud.ConnectionManager@422f41e9
name:ZooKeeperConnection Watcher: got event WatchedEvent state:SyncConnected
type:None path:null path:null type:None
INFO - 2015-06-17 08:06:28.177;
org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO - 2015-06-17 08:06:28.178;
org.apache.solr.common.cloud.ConnectionManager$1; Connection with ZooKeeper
reestablished.
INFO - 2015-06-17 08:06:28.178;
org.apache.solr.common.cloud.DefaultConnectionStrategy; Reconnected to ZooKeeper
INFO - 2015-06-17 08:06:28.179;
org.apache.solr.common.cloud.ConnectionManager; Connected:true
WARN - 2015-06-17 08:06:28.179; org.apache.solr.cloud.RecoveryStrategy;
Stopping recovery for core=category coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy;
Stopping recovery for core=category_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy;
Stopping recovery for core=rules_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy;
Stopping recovery for core=rules coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy;
Stopping recovery for core=catalog_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy;
Stopping recovery for core=catalog coreNodeName=core_node2
INFO - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; publishing
core=category state=down collection=category
INFO - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; numShards
not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; publishing
core=category_shadow state=down collection=category_shadow
INFO - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; numShards
not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; publishing
core=rules_shadow state=down collection=rules_shadow
INFO - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; numShards
not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; publishing
core=rules state=down collection=rules
INFO - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; numShards
not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; publishing
core=catalog_shadow state=down collection=catalog_shadow
INFO - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; numShards
not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; publishing
core=catalog state=down collection=catalog
INFO - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; numShards
not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.198; org.apache.solr.cloud.ZkController; Replica
core_node2 NOT in leader-initiated recovery, need to wait for leader to see
down state.
WARN - 2015-06-17 08:07:51.188; org.apache.solr.cloud.ZkController;
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /collections/rules_shadow/leader_elect/shard1/election
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:290)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:287)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
at
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:287)
at
org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:363)
at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
at
org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
ERROR - 2015-06-17 08:07:51.190; org.apache.solr.common.SolrException; There
was a problem finding the leader in zk:java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:307)
at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:304)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
at
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:304)
at
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:928)
at
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:914)
at
org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1514)
at
org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:386)
at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
at
org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
INFO - 2015-06-17 08:07:51.220; org.apache.solr.cloud.ZkController; Replica
core_node2 NOT in leader-initiated recovery, need to wait for leader to see
down state.
INFO - 2015-06-17 08:07:51.240; org.apache.solr.cloud.ZkController; Replica
core_node2 NOT in leader-initiated recovery, need to wait for leader to see
down state.
INFO - 2015-06-17 08:07:51.258; org.apache.solr.cloud.ZkController; Replica
core_node2 NOT in leader-initiated recovery, need to wait for leader to see
down state.
INFO - 2015-06-17 08:07:51.274; org.apache.solr.cloud.ZkController; Replica
core_node2 NOT in leader-initiated recovery, need to wait for leader to see
down state.
INFO - 2015-06-17 08:07:51.284; org.apache.solr.cloud.ElectionContext;
canceling election
/overseer_elect/election/93424944611198761-<<<>>>>:8080_solr-n_0000000286
Any pointers here?
Thanks,
Sunil