[ https://issues.apache.org/jira/browse/CASSANDRA-21260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuqi Yan updated CASSANDRA-21260:
---------------------------------
Description:
Slack: [https://the-asf.slack.com/archives/CJZLTM05A/p1762461191355119]
There are multiple reports of this issue.
reported by [~gilg]:
{code:java}
Hey, I have an issue with a 4.1.5 cluster. For some reason a few tables appear
to have bad metadata - sstablemetadata reports certain columns which are not
really there (sstabledump does not show said columns on those sstables), on a
lot of sstables, most of the table actually. Not sure exactly how it got there
(we suspect someone loaded sstables of other tables since wrong column names
match ones in other tables), but right now situation is some servers work fine
and serve clients, while other servers, who have gone through restarts, are
virtually without data for those tables, since on startup there were exceptions
during sstables loading - "Unknown column xxx during deserialization" {code}
reported by [~tolbertam]:
{code:java}
it's something I saw once recently, but I haven't been able to reproduce it. I
suspect it has something to do with the refactoring around making Schema
pluggable (https://issues.apache.org/jira/browse/CASSANDRA-17044). When I saw
Gil's post it reminded me a lot of what I saw.
My suspicion is it's some kind of timing issue with schema changes being pushed
out, while also being pulled from other instances.
Like, if you have a bunch of concurrent schema changes on the same keyspaces,
all submitted to different coordinators at relatively the same time,
without schema disagreement, that has traditionally caused a lot of issues with
Cassandra. {code}
reported by [~curlylrt] / [~yukei]:
{code:java}
We were running 4.0.6 when the column was added to a table in another keyspace.
And now, we are running 4.1.3 and we recently noticed this issue because we are
seeing error logs when loading sstables during node restart. {code}
Known facts
* On restart, SSTables whose header lists columns unknown to the table schema are ignored, so once an SSTable header is contaminated, its data becomes unavailable after a restart (effectively data loss until we scrub and fix the headers)
* Because contaminated SSTables are ignored after a restart, and any other node receiving them fails to parse them and throws UnknownColumnException, these contaminated SSTables should not spread to other nodes
* Compaction performs no schema check: once an SSTable has unexpected columns in its header, the new SSTable produced by compaction inherits them, because the input headers are merged blindly
** Confirmed in a local test by injecting an SSTable with unexpected columns and running compaction
* In our case, all victim tables (whose SSTable headers contain foreign columns) and offender tables (where those columns come from) share the same primary key
* Seems related to schema changes
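A minimal Java sketch of the compaction and restart behaviour listed in the known facts, under simplifying assumptions: this is not Cassandra code, an SSTable header is reduced to the set of column names it declares (real headers, e.g. SerializationHeader, also carry column types and other metadata), and the names HeaderContamination, mergeBlindly, unknownColumns and loadable are hypothetical. Compaction is modelled as a blind union of the input headers, and startup loading as skipping any SSTable whose header declares a column the schema does not have.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model (not Cassandra code): a header is just the
// set of column names it declares.
class HeaderContamination {

    // Models compaction as described in the report: the output header is the
    // union of all input headers, with no check against the table schema.
    static Set<String> mergeBlindly(List<Set<String>> inputHeaders) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> header : inputHeaders) {
            merged.addAll(header);
        }
        return merged;
    }

    // Columns declared in a header that the current schema does not contain.
    static Set<String> unknownColumns(Set<String> header, Set<String> schemaColumns) {
        Set<String> unknown = new LinkedHashSet<>(header);
        unknown.removeAll(schemaColumns);
        return unknown;
    }

    // Models the reported startup behaviour: an SSTable whose header declares
    // any unknown column is skipped instead of loaded.
    static boolean loadable(Set<String> header, Set<String> schemaColumns) {
        return unknownColumns(header, schemaColumns).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("v1", "v2");
        Set<String> clean = Set.of("v1", "v2");
        // "other_col" stands in for a column that belongs to a different table.
        Set<String> contaminated = Set.of("v1", "v2", "other_col");

        Set<String> compacted = mergeBlindly(List.of(clean, contaminated));
        System.out.println("unknown after compaction: " + unknownColumns(compacted, schema));
        System.out.println("loadable on restart: " + loadable(compacted, schema));
    }
}
```

Running main shows that other_col survives the merge and that the compacted SSTable would be skipped on restart, matching the reported symptom: a single contaminated input taints every compaction output it takes part in.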
was:
Same as the current description, but without the final bullet about schema changes.
> SSTable header contains unknown columns from other tables
> ---------------------------------------------------------
>
> Key: CASSANDRA-21260
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21260
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Yuqi Yan
> Priority: Normal
--
This message was sent by Atlassian Jira
(v8.20.10#820010)