[ 
https://issues.apache.org/jira/browse/CASSANDRA-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065403#comment-18065403
 ] 

Matt Byrd commented on CASSANDRA-21173:
---------------------------------------

I suppose we have upgraded already to 5.0 at this point, we and others are 
unlikely to be able to doing something using the 2.0 version of the software, 
which probably most haven't deployed for 5+ years probably.
E.g the cluster was 2.0 say 10 years ago or so, a bunch of schema was created 
(without tableId in the folder name).
The clusters have been upgraded through 2.1 -> 3.0 -> 4.0 -> 5.0 effectively 
and now are running on 5.0, but any snapshot created on these pre-existing 
tables cannot be cleared, this includes snapshots you create today on 5.0. 
In trunk this just applies to a snapshot taken prior to a jvm restart, so is a 
slightly better situation, but still creates snapshots which cannot be cleared.
I agree that modifying the manifest is maybe too invasive, for what seems like 
limited current benefit.
I think it should be fine with the current code if tableId in TableSnapshot is 
null for these tables specifically, as it's only really used as mentioned in 
buildSnapshotId to disambiguate between different tables (of which there can 
only be one pre-2.1 of a given name on disk)

I've updated both branches with the simpler approach.

I think the only remaining question I have: is whether to just go simpler still 
and not bother attempting to retrieve the tableId and just set it to null for 
pre-2.1 tables.

It seems like for this to work in trunk we would need to move the calling 
SnapshotManager::start in CassandraDaemon further along to when we have TCM 
machinery available, maybe just after 
Startup.initialize(DatabaseDescriptor.getSeeds());.

but then this seems to be in contention with the comment around the positioning:
        // clearing of snapshots above here will in fact clear all ephemeral 
snapshots
        // which were cleared as part of startup checks before CASSANDRA-18111

Maybe that is more just maintaining consistent behavior rather than an 
underlying motivation
some commentary here:
https://issues.apache.org/jira/browse/CASSANDRA-18111?focusedCommentId=17856425&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17856425

I don't have particularly strong opinions and defer to you both.
Happy to either :
1.  Leave patches as they are (modulo review feedback)
2. Just simplify further and always set tableId to null when we're dealing with 
pre-2.1 created tables.
3. Attempt a re-order of some of the startup machinery so we can get a tableId 
where the table is still present in meta-data.

> Snapshots from tables without table-id embedded in their folder name are not 
> loaded by SnapshotLoader
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21173
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Local/Snapshots, Local/Startup and Shutdown
>            Reporter: Matt Byrd
>            Assignee: Matt Byrd
>            Priority: Normal
>             Fix For: 5.0.x, 5.1, 6.x
>
>         Attachments: ci_summary_trunk_mbyrd_CASSANDRA-21173.html
>
>
> Tables created prior to 2.1 do not have a table-id embedded in their table 
> folder name.
> This is handled correctly in Directories.java (see constructor) unfortunately 
> in SnapshotLoader, we use a regex which attempts to extract the table-id and 
> hence skips over any tables created prior to 2.1.
> The end result is that these tables are not visible in list snapshot and more 
> importantly cannot be cleared via nodetool clearsnapshot. This was noticed 
> upon major upgrade to 5.0.
> I've observed this on 5.0, from reading the code it appears likely improved 
> in 5.1, in that it now requires a restart in addition to trigger.
> Some related tickets:
> Introduction of table-id and backwards compatible handling of old folders 
> originally here:
> https://issues.apache.org/jira/browse/CASSANDRA-5202
> Machinery to list snapshots which doesn’t handle old format was added here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843
> https://github.com/apache/cassandra/commit/31aa17a2a3b18bdda723123cad811f075287807d
> There was some discussion at the time of not handling pre 2.1 tables here:
> https://issues.apache.org/jira/browse/CASSANDRA-16843?focusedCommentId=17440088&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17440088
> Then nodetool clearsnapshot stopped working here with:
> https://issues.apache.org/jira/browse/CASSANDRA-17757
> Things improve a bit in 5.1 with 
> https://issues.apache.org/jira/browse/CASSANDRA-18111
> Now we no longer try and load the snapshots via SnapshotLoader in entirety 
> before deciding if we can clear them, but instead make use of 
> SnapshotManager. Whilst snapshots taken while the jvm is running are now 
> visible and clearable, from reading upon restart we lose that information and 
> cannot view/clear snapshots created before the restart.
> One solution to handle these pre 2.1 tables, is to include the table-id in 
> the manifest.json, then we'll be able to read this information if not 
> available from folder name upon restart.
> Another possibility which doesn't fix as many problems, is just to expose via 
> jmx/nodetool
> something to allow operators to bypass the snapshot loading mechanism and 
> directly clear the old pre-2.1 snapshots.
> A more involved and risky change would be to somehow think about how we 
> migrate all this existing data in different folder structures to new 
> consistent folder structure, but this seems quite involved and would likely 
> deserve it's own JIRA at least.
> I have a patch locally against trunk for the first approach, just storing the 
> tableId in the manifest.json, which does this and will run it against CI.
> I'll have a further think about if there are any other approaches, if anyone 
> has any ideas let me know.
> Another thing to consider is where we should apply this change.
> Probably at a minimum 5.0, since that's where one can no longer nodetool 
> clearsnapshot on certain tables and the effect is a bit worse there than in 
> 5.1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to