Yuqi Yan created CASSANDRA-21260:
------------------------------------

             Summary: SSTable header contains unknown columns from other tables
                 Key: CASSANDRA-21260
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21260
             Project: Apache Cassandra
          Issue Type: Bug
            Reporter: Yuqi Yan


Slack: [https://the-asf.slack.com/archives/CJZLTM05A/p1762461191355119]

There have been multiple reports of this issue. Reported by [~gilg]:
{code:java}
Hey, I have an issue with a 4.1.5 cluster. For some reason a few tables appear 
to have bad metadata - sstablemetadata reports certain columns which are not 
really there (sstabledump does not show said columns on those sstables), on a 
lot of sstables, most of the table actually. Not sure exactly how it got there 
(we suspect someone loaded sstables of other tables since wrong column names 
match ones in other tables), but right now situation is some servers work fine 
and serve clients, while other servers, who have gone through restarts, are 
virtually without data for those tables, since on startup there were exceptions 
during sstables loading - "Unknown column xxx during deserialization" {code}
Reported by [~tolbertam]:
{code:java}
it's something I saw once recently, but I haven't been able to reproduce it.  I 
suspect it has something to do with the refactoring around making Schema 
pluggable (https://issues.apache.org/jira/browse/CASSANDRA-17044).   When I saw 
Gil's post it reminded me a lot of what I saw
My suspicion is it's some kind of timing issue with schema changes being pushed 
out, while also being pulled from other instances.
Like, if you have a bunch of concurrent schema changes on the same keyspaces, 
all submitted to different coordinators at roughly the same time, 
without schema disagreement, that has traditionally caused a lot of issues with 
Cassandra. {code}
Reported by [~curlylrt] / [~yukei]:
{code:java}
We were running 4.0.6 when the column was added to a table in another keyspace. 
And now, we are running 4.1.3 and we recently noticed this issue because we are 
seeing error logs when loading sstables during node restart. {code}
Known facts:
 * On restart, SSTables whose header contains unknown columns are skipped, so 
once an SSTable header is contaminated, its data is lost on restart (until we 
scrub and fix the headers).
 * Contaminated SSTables are ignored after restart, and any other node that 
receives one cannot parse it and throws UnknownColumnException, so these 
contaminated SSTables should not spread to other nodes.
 * Compaction performs no schema check: once an SSTable has unexpected columns 
in its header, the new SSTable produced by compaction inherits them, because 
the input headers are merged blindly.
 ** Confirmed in a local test by injecting an SSTable with unexpected columns 
and running compaction.
 * In our case, all victim and offender tables share the same primary key.
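
The compaction and restart behavior listed above can be illustrated with a deliberately simplified model. This is a minimal sketch, assuming an SSTable header can be reduced to the set of regular column names it declares (the real header holds more than this). ModelHeader, cols, mergeBlindly, unknownColumns and loadable are hypothetical names for illustration only, not Cassandra classes or APIs.

{code:java}
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model: an SSTable header reduced to the set of
// regular column names it declares. This is NOT Cassandra's SerializationHeader.
final class ModelHeader {
    final Set<String> columns;

    ModelHeader(Set<String> columns) {
        this.columns = new TreeSet<>(columns);
    }

    // Convenience helper for building sorted column sets.
    static Set<String> cols(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    // Compaction as described in the report: the output header is the union
    // of the input headers, with no check against the current table schema.
    static ModelHeader mergeBlindly(ModelHeader... inputs) {
        Set<String> merged = new TreeSet<>();
        for (ModelHeader h : inputs) {
            merged.addAll(h.columns);
        }
        return new ModelHeader(merged);
    }

    // Header columns that the current schema does not define.
    Set<String> unknownColumns(Set<String> schemaColumns) {
        Set<String> unknown = new TreeSet<>(columns);
        unknown.removeAll(schemaColumns);
        return unknown;
    }

    // Load-time behavior as observed on restart: any unknown column makes the
    // whole SSTable unreadable ("Unknown column ... during deserialization"),
    // so the SSTable is skipped.
    boolean loadable(Set<String> schemaColumns) {
        return unknownColumns(schemaColumns).isEmpty();
    }
}
{code}

In this model, with a schema of v1 and v2, compacting a clean header (v1, v2) together with a contaminated header (v1, v2, other_col) produces (other_col, v1, v2), which fails the load check. The rows that came from the clean input therefore also become unreadable on restart, which is consistent with the observation that contamination survives compaction.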



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
