[ https://issues.apache.org/jira/browse/CASSANDRA-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070395#comment-18070395 ]
Ariel Weisberg commented on CASSANDRA-20955:
--------------------------------------------
Stale shards cause {{IllegalStateException}} during round-trip migration,
breaking schema agreement in tests
When a keyspace goes tracked → untracked → tracked, the second transition
back to tracked throws:
{noformat}
java.lang.IllegalStateException: Existing shard found for keyspace, but prev ksn has mutation tracking disabled
    at MutationTrackingService$KeyspaceShards$UpdateDecision.decisionForTopologyChange(MutationTrackingService.java:1096)
    at MutationTrackingService.onNewClusterMetadata(MutationTrackingService.java:849)
    at MutationTrackingService$1.notifyPostCommit(MutationTrackingService.java:206)
    at LocalLog.processPendingInternal(LocalLog.java:548)
{noformat}
The root cause is that shard cleanup isn't implemented yet. When a keyspace
migrates from tracked to untracked, {{MIGRATE_FROM}} falls through to {{NONE}}
(lines 924-928), so the old shards stick around. The TODO on line 925 already
acknowledges this. The shards are still there after the migration completes,
so when we later flip back to tracked, {{decisionForTopologyChange()}} takes the
{{!prev.useMutationTracking() && next.useMutationTracking()}} branch, where the
precondition {{checkState(!hasExisting)}} fails.
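To make the sequence concrete, here is a minimal sketch of the failure. It is a simplified model and not the real {{MutationTrackingService}} code: the class {{StaleShardSketch}}, the method {{onSchemaChange()}}, the {{CREATE}} decision and the exception text are all hypothetical names made up for illustration. Only the shape of the logic (no cleanup on tracked → untracked, then a precondition on untracked → tracked) follows the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical model; not the real MutationTrackingService code.
class StaleShardSketch {
    // CREATE is a placeholder name for "set up new shards".
    enum Decision { NONE, CREATE }

    // Keyspaces that currently have shards registered.
    private final Set<String> shards = new HashSet<>();

    Decision onSchemaChange(String keyspace, boolean prevTracked, boolean nextTracked) {
        boolean hasExisting = shards.contains(keyspace);
        if (!prevTracked && nextTracked) {
            // Precondition: an untracked keyspace is expected to have no shards.
            if (hasExisting)
                throw new IllegalStateException("existing shard found for untracked keyspace " + keyspace);
            shards.add(keyspace);
            return Decision.CREATE;
        }
        // tracked -> untracked: cleanup is not implemented, so shards stay in place.
        return Decision.NONE;
    }

    // Runs tracked -> untracked -> tracked and reports whether the last step throws.
    static boolean roundTripThrows() {
        StaleShardSketch svc = new StaleShardSketch();
        svc.onSchemaChange("ks", false, true);  // becomes tracked, shards created
        svc.onSchemaChange("ks", true, false);  // becomes untracked, shards left behind
        try {
            svc.onSchemaChange("ks", false, true);  // back to tracked
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip throws: " + roundTripThrows());
    }
}
```

In this model the first transition to tracked succeeds and only the second one trips the precondition, which matches the round-trip behaviour described above.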
This also has a nasty secondary effect: the exception fires inside
{{MutationTrackingService$1.notifyPostCommit()}}, which runs as a
{{ChangeListener}} in {{LocalLog.processPendingInternal()}}. That aborts the
rest of the listener chain for that epoch on that node, so
{{SchemaListener.notifyPostCommit()}} never runs,
{{SchemaDiagnostics.versionUpdated()}} never fires, and the dtest framework's
{{SchemaChangeMonitor}} never gets the callback. The result is a 120-second hang
waiting for schema agreement.
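The knock-on effect can be illustrated with a small sketch. It assumes the listeners are invoked in a plain loop with no per-listener exception handling. That is an assumption for illustration, not a copy of what {{LocalLog.processPendingInternal()}} does, and the {{ChangeListener}} interface here is a hypothetical one-method stand-in for the real one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a post-commit listener chain; not the real LocalLog code.
class ListenerChainSketch {
    interface ChangeListener {
        void notifyPostCommit(long epoch);
    }

    // Dispatches one epoch to two listeners and returns a trace of what happened.
    static List<String> dispatch() {
        List<String> trace = new ArrayList<>();
        List<ChangeListener> listeners = new ArrayList<>();
        // First listener models the mutation tracking listener that throws.
        listeners.add(epoch -> {
            trace.add("tracking");
            throw new IllegalStateException("stale shard");
        });
        // Second listener models the schema listener that tests wait on.
        listeners.add(epoch -> trace.add("schema"));

        try {
            // Assumed dispatch shape: no try/catch around individual listeners.
            for (ChangeListener listener : listeners)
                listener.notifyPostCommit(1L);
        } catch (IllegalStateException e) {
            trace.add("aborted");
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(dispatch());
    }
}
```

The trace contains "tracking" and "aborted" but never "schema": the second listener is skipped for that epoch, which is the same shape as {{SchemaListener.notifyPostCommit()}} never running.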
I worked around this in CASSANDRA-21098 by returning {{REPLICA_GROUP}} instead
of throwing when {{hasExisting}} is true, in two places in
{{decisionForTopologyChange()}}:
- Lines 1076-1078: {{prevKsm == null}} case (already had this pattern)
- Lines 1095-1102: untracked→tracked case (new)
Both are marked with {{TODO (CASSANDRA-20955)}}. Once shard cleanup is
properly implemented these should go back to being precondition checks.
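As a rough sketch of the workaround's shape for the untracked→tracked case: the class {{WorkaroundSketch}}, the {{decide()}} signature and the {{CREATE}} placeholder are hypothetical and do not mirror the real {{decisionForTopologyChange()}}. Only the "return {{REPLICA_GROUP}} instead of throwing when {{hasExisting}} is true" behaviour is taken from the description above.

```java
// Hypothetical, simplified decision function; not the real decisionForTopologyChange().
class WorkaroundSketch {
    // CREATE is a placeholder for whatever the real code returns when no shard exists.
    enum Decision { NONE, CREATE, REPLICA_GROUP }

    static Decision decide(boolean prevTracked, boolean nextTracked, boolean hasExisting) {
        if (!prevTracked && nextTracked) {
            // TODO (CASSANDRA-20955): turn this back into a precondition check
            // once shard cleanup is implemented.
            if (hasExisting)
                return Decision.REPLICA_GROUP;  // tolerate stale shards instead of throwing
            return Decision.CREATE;
        }
        return Decision.NONE;
    }

    public static void main(String[] args) {
        // Stale shards present when flipping back to tracked.
        System.out.println(decide(false, true, true));
        // No shards present: the normal untracked -> tracked path.
        System.out.println(decide(false, true, false));
    }
}
```

The trade-off is that a genuinely unexpected stale shard is no longer reported loudly, which is why the TODO calls for restoring the precondition later.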
> CEP-45: Add support for dropping tables & keyspaces
> ---------------------------------------------------
>
> Key: CASSANDRA-20955
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20955
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Consistency/Coordination
> Reporter: Blake Eggleston
> Priority: Normal
>