Hi, all,
Here is some more information on what happened before the OOM on the
rebooted node in a 2-node test cluster:
1. The schema version appears to have changed on the rebooted node after
the reboot, i.e.
Before reboot,
Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java
(line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java
(line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
After rebooting node 2,
Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328)
Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
2. After the reboot, both nodes repeatedly send MigrationTasks to each other.
We suspect this is related to the schema version (digest) mismatch after
Node 2 rebooted.
Node 2 keeps submitting migration tasks to the other node, more than 100
times:
INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node
/192.168.88.33 has restarted, now UP
INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414)
Updating topology for /192.168.88.33
INFO [GossipStage:1] 2016-04-19 11:18:18,263 StorageService.java (line 1544)
Node /192.168.88.33 state jump to normal
INFO [GossipStage:1] 2016-04-19 11:18:18,264 TokenMetadata.java (line 414)
Updating topology for /192.168.88.33
DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102)
Submitting migration task for /192.168.88.33
DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102)
Submitting migration task for /192.168.88.33
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62)
Can't send schema pull request: node /192.168.88.33 is down.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62)
Can't send schema pull request: node /192.168.88.33 is down.
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 977)
removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 978)
InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 MigrationManager.java
(line 102) Submitting migration task for /192.168.88.33
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 977)
removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 978)
InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 MigrationManager.java
(line 102) Submitting migration task for /192.168.88.33
DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 977)
removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 978)
InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,356 MigrationManager.java
(line 102) Submitting migration task for /192.168.88.33
.
On the other hand, Node 1 keeps updating its gossip information, and then
receives and submits migration tasks:
DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,332 Gossiper.java (line 977)
removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978)
InetAddress /192.168.88.34 is now UP
DEBUG [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 977)
removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 978)
InetAddress /192.168.88.34 is now UP
DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 977)
removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 978)
InetAddress /192.168.88.34 is now UP
..
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496
MigrationRequestVerbHandler.java (line 41) Received migration request from
/192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,595
MigrationRequestVerbHandler.java (line 41) Received migration request from
/192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,843
MigrationRequestVerbHandler.java (line 41) Received migration request from
/192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,878
MigrationRequestVerbHandler.java (line 41) Received migration request from
/192.168.88.34.
..
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line
127) submitting migration task for /192.168.88.34
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line
127) submitting migration task for /192.168.88.34
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line
127) submitting migration task for /192.168.88.34
.
Has anyone experienced this scenario? Thanks in ad