Our cassandra.ring_delay_ms is current around 300000 to get nodes to bootstrap.
On Wed, Jun 12, 2019 at 5:56 PM Carl Mueller <carl.muel...@smartthings.com> wrote: > We're seeing nodes bootstrapping but not streaming and joining a cluster > in 2.2.13. > > I have been looking through the MigrationManager code and the > StorageService code that seems relevant based on the Bootstrap status > messages that are coming through. I'll be referencing line numbers from the > 2.2.13 source code. > > Major unit of work #1: prechecks > > We start our bootstrap process with StorageService:759 checking if we are > autobootstrap (yes), bootstrap already in progress (no), bootstrap complete > (no), and a check if seeds contains the current broadcast address (no) > > We proceed through the 771 shouldBootstrap() call to the "JOINING: waiting > for ring information" state. > > Prior to this point, MigrationManager has already Gossip checked the > current (nonmatching) schema version uuid. I believe in this state we only > have system keyspace setup based on log messages that have already occurred > like: > > Schema.java:421 - Adding org.apache.cassandra.config.CFMetaData@35645047 > [cfId=5a1ff267-ace0-3f12-8563-cfae6103c65e,ksName=system,cfName=sstable_activity > > Those only have system tables listed, nothing else. > > Major unit of work #2: Ring Information > > Here we seem to poll all the nodes getting their token responsibilities. > Each time for almost every node we also get: > > DEBUG [GossipStage:1] 2019-06-12 15:20:07,621 MigrationManager.java:96 - > Not pulling schema because versions match or shouldPullSchemaFrom returned > false > > However our schema versions DON'T match, so the shouldPullSchemaFrom must > be short-circuiting this. shouldPullSChemaFrom() is a one-liner in > MigrationManager on line 151, it seems to check if it knows the ip it is > talking to, If that version's schema matches (which it definitely should > not at this point)..... and if it is isGossipOnlyMember which I may have > seen referred to as a "fat client". Perhaps we are defaulting to this state > somehow for nodes/endpoints that are far away or not communicating? Or our > view hasn't fully initialized yet? > > Anyway, I think these are all occurring in StorageService line 779, a for > loop. We do not see any of the "got schema" messages that would break this > for loop. So we must reach the dealy point? Is the delay parameter equal > to -Dcassandra.ring_delay_ms setting? Because perhaps the successes we DID > see were due to the increase to ring delay and that allowed this for loop > to actually complete? > > Anyway, we seem to make it out of that for loop without initializing ring > information, which also seems to not synchronize the schema. > > From there we seem to sail through line 791, the > MigrationManager.isReadyForBootstrap() for some reason. This was also > mentioned in https://issues.apache.org/jira/browse/CASSANDRA-6648 as > leading to bootstrap race conditions similar to ours and was allegedly > fixed back in 1.2.X/2.0.X releases. > > Major Unit of work #3: perform bootstrap tasks > > lines 796+ of StorageService output ready to bootstrap / range calculation > messages, and confirm ring + schema info has been performed, even though > the logs are showing we didn't seem to sync schema. > > the strict consistency check is performed at line 803, which we are using > by the way > > The log outputs the getting bootstrap token status message of JOINING from > line 821 > > From here there is a ton of log errors for UnknwonColumnFamilyExceptions. > Hundreds of them > > 19-06-12 15:21:47,644 IncomingTcpConnection.java:100 - > UnknownColumnFamilyException reading from socket; closing > org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find > cfId=dd5d7fa0-1e50-11e9-a62d-0d41c97b2404 > > At the end of these exceptions is Three log statements: > > INFO [main] 2019-06-12 15:23:25,515 StorageService.java:1142 - JOINING: > Starting to bootstrap... > > INFO [main] 2019-06-12 15:23:25,525 StreamResultFuture.java:87 - [Stream > #05af9ee0-8d26-11e9-85c1-bd5476090c54] Executing streaming plan for > Bootstrap > > INFO [main] 2019-06-12 15:23:25,526 StorageService.java:1199 - Bootstrap > completed! for the tokens [-7314981925085449175, <254 more tokens>, > 5499447097629838103] > > And that is where we think our node has joined the cluster as a data-less > UN node. Maybe like a "fat client"? Who knows. > > Bootstrap completed! message comes from a Future in StorageService:1199, > which was produced on 1192 by bootstrapper.bootstrap(). It times out in 10s > of millis or less, so it clearly isn't doing anything or hasn't been given > anything to do. > > -------------------------- > > AFTER the bootstrp completeion message, we have messages for setting up > > system_traces and system_distributed > > These come 10s of millis after the completed future, so they must be lines > 905 and 906 executing. > > THEN we have more messages for gossiping schema version (three or so) > where FINALLY the UUID of the schema matches all the other schema uuids as > shown in gossipinfo. I speculate this is occuring i the finishJoiningRing() > call on line 912. > > > --------------------------- > > So it appears the "guard" for loop is not sufficiently guarding, and > schema is not being synchronized in our case in sufficient time. Perhaps > all the other seeds or nodes are having communication issues and are acting > as fat clients, so the bootstrap process assumes this is a schemaless node. > > Is that about correct or do I not understand some aspect of what is going > on, outside of the various attempts to handle not-necessarily-up nodes, > seeds, and streaming sources which I definitely do not fully understand yet. > > >