We are in conversations with AWS, hopefully with an IPv6 expert, to examine what happened.
On Thu, Jun 13, 2019 at 11:19 AM Carl Mueller <carl.muel...@smartthings.com> wrote:

Our cassandra.ring_delay_ms is currently around 300000 to get nodes to bootstrap.

On Wed, Jun 12, 2019 at 5:56 PM Carl Mueller <carl.muel...@smartthings.com> wrote:

We're seeing nodes bootstrapping but not streaming and joining a cluster in 2.2.13.

I have been looking through the MigrationManager code and the StorageService code that seems relevant based on the bootstrap status messages that are coming through. I'll be referencing line numbers from the 2.2.13 source code.

Major unit of work #1: prechecks

We start our bootstrap process with StorageService:759 checking whether we are auto-bootstrapping (yes), whether a bootstrap is already in progress (no), whether bootstrap is already complete (no), and whether the seed list contains the current broadcast address (no).

We proceed through the shouldBootstrap() call at line 771 to the "JOINING: waiting for ring information" state.

Prior to this point, MigrationManager has already gossip-checked the current (non-matching) schema version UUID. I believe in this state we only have the system keyspace set up, based on log messages that have already occurred, like:

Schema.java:421 - Adding org.apache.cassandra.config.CFMetaData@35645047 [cfId=5a1ff267-ace0-3f12-8563-cfae6103c65e,ksName=system,cfName=sstable_activity

Those only have system tables listed, nothing else.

Major unit of work #2: ring information

Here we seem to poll all the nodes, getting their token responsibilities. Each time, for almost every node, we also get:

DEBUG [GossipStage:1] 2019-06-12 15:20:07,621 MigrationManager.java:96 - Not pulling schema because versions match or shouldPullSchemaFrom returned false

However, our schema versions DON'T match, so shouldPullSchemaFrom must be short-circuiting this. shouldPullSchemaFrom() is a one-liner in MigrationManager at line 151; it seems to check whether we know the IP we are talking to, whether that endpoint's schema version matches (which it definitely should not at this point)... and whether it isGossipOnlyMember, which I may have seen referred to as a "fat client" (a rough sketch of the check is below). Perhaps we are defaulting to this state somehow for nodes/endpoints that are far away or not communicating? Or our view hasn't fully initialized yet?

Anyway, I think these all occur in the for loop at StorageService line 779. We do not see any of the "got schema" messages that would break out of this for loop, so we must be reaching the delay point. Is that delay parameter the same as the -Dcassandra.ring_delay_ms setting? Because perhaps the successes we DID see were due to the increase in ring delay, which allowed this for loop to actually complete.

Anyway, we seem to make it out of that for loop without initializing ring information, which also seems to leave the schema unsynchronized.

From there we seem to sail through line 791, the MigrationManager.isReadyForBootstrap() check, for some reason. This was also mentioned in https://issues.apache.org/jira/browse/CASSANDRA-6648 as leading to bootstrap race conditions similar to ours, and was allegedly fixed back in the 1.2.x/2.0.x releases.
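For reference, here is roughly the shape of that shouldPullSchemaFrom check as I read it in the 2.2 branch. This is a paraphrased sketch, not a verbatim copy of the 2.2.13 source, and the class wrapper is only there to make it self-contained:

    import java.net.InetAddress;

    import org.apache.cassandra.gms.Gossiper;
    import org.apache.cassandra.net.MessagingService;

    public final class SchemaPullSketch
    {
        // The gossip handler only submits a schema pull when the remote schema version
        // differs AND a check like this passes; otherwise all we get is the DEBUG line
        // "Not pulling schema because versions match or shouldPullSchemaFrom returned false".
        static boolean shouldPullSchemaFrom(InetAddress endpoint)
        {
            return MessagingService.instance().knowsVersion(endpoint)        // we have a messaging version recorded for the node
                && MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version  // and it is compatible
                && !Gossiper.instance.isGossipOnlyMember(endpoint);          // and the node is not a gossip-only member ("fat client")
        }
    }

If any leg of that conjunction fails for an endpoint, in particular the isGossipOnlyMember test, the pull is skipped even though the schema versions disagree, which would match what we see in the logs.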
Major unit of work #3: perform bootstrap tasks

Lines 796+ of StorageService output the "ready to bootstrap" / range calculation messages, and confirm that ring + schema info has been obtained, even though the logs show we didn't seem to sync schema.

The strict consistency check is performed at line 803; we are using strict consistency, by the way.

The log outputs the "JOINING: getting bootstrap token" status message from line 821.

From here there are hundreds of log errors for UnknownColumnFamilyException:

19-06-12 15:21:47,644 IncomingTcpConnection.java:100 - UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=dd5d7fa0-1e50-11e9-a62d-0d41c97b2404

At the end of these exceptions are three log statements:

INFO [main] 2019-06-12 15:23:25,515 StorageService.java:1142 - JOINING: Starting to bootstrap...

INFO [main] 2019-06-12 15:23:25,525 StreamResultFuture.java:87 - [Stream #05af9ee0-8d26-11e9-85c1-bd5476090c54] Executing streaming plan for Bootstrap

INFO [main] 2019-06-12 15:23:25,526 StorageService.java:1199 - Bootstrap completed! for the tokens [-7314981925085449175, <254 more tokens>, 5499447097629838103]

And that is where we think our node has joined the cluster as a data-less UN node. Maybe like a "fat client"? Who knows.

The "Bootstrap completed!" message comes from a Future in StorageService:1199, which was produced on line 1192 by bootstrapper.bootstrap(). It completes within tens of milliseconds, so it clearly isn't doing anything, or hasn't been given anything to do.

--------------------------

AFTER the bootstrap completion message, we have messages for setting up system_traces and system_distributed.

These come tens of milliseconds after the completed future, so they must be lines 905 and 906 executing.

THEN we have more messages for gossiping the schema version (three or so), where FINALLY the UUID of the schema matches all the other schema UUIDs as shown in gossipinfo. I speculate this is occurring in the finishJoiningRing() call on line 912.

---------------------------

So it appears the "guard" for loop is not sufficiently guarding, and the schema is not being synchronized in our case in sufficient time. Perhaps all the other seeds or nodes are having communication issues and are acting as fat clients, so the bootstrap process assumes this is a schemaless node.

Is that about correct, or do I not understand some aspect of what is going on (outside of the various attempts to handle not-necessarily-up nodes, seeds, and streaming sources, which I definitely do not fully understand yet)?
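For what it's worth, schema agreement can also be sanity-checked from outside the joining node before retrying a bootstrap. A minimal sketch, assuming the DataStax Java driver 3.x and a placeholder contact point (10.0.0.1 stands in for one of our seeds):

    import java.util.HashSet;
    import java.util.Set;
    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class SchemaAgreementCheck
    {
        public static void main(String[] args)
        {
            try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
                 Session session = cluster.connect())
            {
                Set<UUID> versions = new HashSet<>();

                // The coordinator's own schema version...
                versions.add(session.execute("SELECT schema_version FROM system.local")
                                    .one().getUUID("schema_version"));

                // ...and the versions it has recorded for every peer.
                for (Row row : session.execute("SELECT peer, schema_version FROM system.peers"))
                {
                    System.out.println(row.getInet("peer") + " -> " + row.getUUID("schema_version"));
                    versions.add(row.getUUID("schema_version"));
                }

                System.out.println(versions.size() == 1
                        ? "schema agreement: " + versions
                        : "SCHEMA DISAGREEMENT, hold off on bootstrapping: " + versions);
            }
        }
    }

I believe the driver also exposes cluster.getMetadata().checkSchemaAgreement() for the same purpose, but the explicit queries make it easier to see which peer is lagging.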