Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-02-27 Thread Štefan Miklošovič
Sorry for going silent on this, I was thinking about this more and what Blake suggested, to have incremental backups somehow integrated, resonated with me. I was trying to figure out how this would all work though. For the discussion of scripts vs. no scripts, I just do not see how it would be hel

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-02-04 Thread Jon Haddad
Fwiw, I don't have a problem with using a shell script. In the email I sent, I was trying to illustrate how exploiting a shell vulnerability essentially requires a system that's already been completely compromised, either through JMX or through CQL (assuming we can update configs via

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-23 Thread Štefan Miklošovič
I feel uneasy about executing scripts from Cassandra. Jon was talking about this here (1) as well. I would not base this on executing any shell scripts or commands. I think nothing beats pure Java copying files to a directory ... (1) https://lists.apache.org/thread/jcr3mln2tohbckvr8fjrr0sq0syof080

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-23 Thread Jeremiah Jordan
For commit log archiving we already have the concept of “commands” to be executed. Maybe a similar concept would be useful for snapshots? Maybe a new “user snapshot with command” nodetool action could be added. The server would make its usual hard links inside a snapshot folder and then it could

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-23 Thread Štefan Miklošovič
Interesting, I will need to think about it more. Thanks for chiming in. On Wed, Jan 22, 2025 at 8:10 PM Blake Eggleston wrote: > Somewhat tangential, but I’d like to see Cassandra provide a backup story > that doesn’t involve making copies of sstables. They’re constantly > rewritten by compactio

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-22 Thread Blake Eggleston
Somewhat tangential, but I’d like to see Cassandra provide a backup story that doesn’t involve making copies of sstables. They’re constantly rewritten by compaction, and intelligent backup systems often need to be able to read sstable metadata to optimize storage usage. An interface purpose bui

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-22 Thread Štefan Miklošovič
On Wed, Jan 22, 2025 at 2:21 AM James Berragan wrote: > I think this is an idea worth exploring, my guess is that even if the > scope is confined to just "copy if not exists" it would still largely be > used as a cloud-agnostic backup/restore solution, and so will be shaped > accordingly. > > Som

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-21 Thread James Berragan
I think this is an idea worth exploring, my guess is that even if the scope is confined to just "copy if not exists" it would still largely be used as a cloud-agnostic backup/restore solution, and so will be shaped accordingly. Some thoughts: - I think it would be worth exploring more what the di
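
A minimal sketch of the "copy if not exists" behaviour mentioned above, in Python rather than the Java the feature would actually be written in. The function name `copy_if_not_exists`, the `.tmp` suffix and the file name in the demo are assumptions made for illustration; nothing here is taken from Cassandra's code base.

```python
import shutil
import tempfile
from pathlib import Path


def copy_if_not_exists(src: Path, dest_dir: Path) -> bool:
    """Copy src into dest_dir unless a file of the same name is already there.

    Returns True when a copy was made and False when it was skipped.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    if target.exists():
        return False
    # Copy under a temporary name and rename afterwards, so that a
    # half-written file never appears under the final name.
    tmp = target.with_name(target.name + ".tmp")
    shutil.copy2(src, tmp)
    tmp.replace(target)
    return True


# Demo: the second call is a no-op because the file is already present.
with tempfile.TemporaryDirectory() as work:
    source = Path(work) / "nb-1-big-Data.db"  # hypothetical SSTable file name
    source.write_bytes(b"sstable bytes")
    backup_dir = Path(work) / "backup"
    first = copy_if_not_exists(source, backup_dir)
    second = copy_if_not_exists(source, backup_dir)
    copied_bytes = (backup_dir / source.name).read_bytes()
    print(first, second)
```

Skipping files that already exist is what makes repeated runs cheap: only files not yet present in the destination are transferred.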

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-21 Thread Štefan Miklošovič
If you ask specifically about how TTL snapshots are handled, there is a thread running with a task scheduled every n seconds (not sure what the default is) and it just checks the "expired_at" field in the manifest to see whether the snapshot has expired. If it has, it will proceed to delete it as any other snap
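
The expiry check described above could be sketched roughly as follows. This is an illustration in Python, not Cassandra's implementation: the `manifest.json` file name, the ISO-8601 timestamp format and the function names `is_expired` and `sweep` are assumptions, and the field name `expired_at` is simply the one used in the message. A real task would call `sweep` on a fixed interval; the scheduling itself is left out.

```python
import json
import shutil
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List


def is_expired(manifest: dict, now: datetime) -> bool:
    """A snapshot without an expiry field never expires."""
    expires = manifest.get("expired_at")
    if expires is None:
        return False
    return datetime.fromisoformat(expires) <= now


def sweep(snapshots_root: Path, now: datetime) -> List[str]:
    """Delete every snapshot directory whose manifest says it has expired."""
    removed = []
    for snapshot_dir in sorted(p for p in snapshots_root.iterdir() if p.is_dir()):
        manifest_path = snapshot_dir / "manifest.json"
        if not manifest_path.is_file():
            continue  # no manifest, nothing to decide on
        manifest = json.loads(manifest_path.read_text())
        if is_expired(manifest, now):
            shutil.rmtree(snapshot_dir)
            removed.append(snapshot_dir.name)
    return removed


# Demo: one snapshot that expired an hour ago, one without a TTL.
now = datetime.now(timezone.utc)
with tempfile.TemporaryDirectory() as work:
    root = Path(work)
    (root / "old").mkdir()
    (root / "old" / "manifest.json").write_text(
        json.dumps({"expired_at": (now - timedelta(hours=1)).isoformat()})
    )
    (root / "keep").mkdir()
    (root / "keep" / "manifest.json").write_text(json.dumps({}))
    removed = sweep(root, now)
    keep_still_there = (root / "keep").is_dir()
    old_still_there = (root / "old").exists()
    print(removed)
```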

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-21 Thread Štefan Miklošovič
On Tue, Jan 21, 2025 at 5:30 AM Francisco Guerrero wrote: > I think we should evaluate the benefits of the feature you are proposing > independently of how it might be used by Sidecar or other tools. As it > is, it already sounds like a useful functionality to have in the core of > the > Cassandr

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-20 Thread Francisco Guerrero
I think we should evaluate the benefits of the feature you are proposing independently of how it might be used by Sidecar or other tools. As it is, it already sounds like a useful functionality to have in the core of the Cassandra process. Tooling around Cassandra, including Sidecar, can then leve

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-12 Thread Štefan Miklošovič
C) Let's just enable backing up to a local filesystem. To make things simpler and more user-friendly, it would be stored the same way (in the target destination) Sidecar would upload it, so when people decide to start to use Sidecar and incorporate it into their deployments / workflows, these backu

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-12 Thread Jon Haddad
Are you proposing that we manage backups in the DB instead of Sidecar, or that we have the same functionality in both C* proper and the sidecar? Or that we ship C* with backups to a local filesystem only? Where should the line be on what goes into sidecar and what goes into C* proper? Jon On

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-12 Thread Štefan Miklošovič
Oh yeah I knew Sidecar would be mentioned, let's dive into that. Sidecar has a lot of endpoints / functionality, backup / restore is just part of that. What I proposed also has these advantages: 1) Every time you go to upload to some cloud storage provider, you need to add all the dependencies to

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-12 Thread Jon Haddad
Sounds like part of a backup strategy. Probably worth chiming in on the sidecar issue: https://issues.apache.org/jira/browse/CASSSIDECAR-148. IIRC, Medusa and Tablesnap both uploaded a manifest and don't upload multiple copies of the same SSTables. I think this should definitely be part of our

[DISCUSS] Snapshots outside of Cassandra data directory

2025-01-12 Thread Štefan Miklošovič
Hi, I would like to run this through ML to gather feedback as we are contemplating making this happen. Currently, snapshots are just hard links, located in a snapshot directory, to files in the live data directory. That is super handy as it occupies virtually zero disk space etc (as long as underlying SSTa
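
To illustrate why a hard-link snapshot is nearly free, here is a small Python sketch. It is not Cassandra code: the function name `snapshot_by_hardlink`, the directory layout and the file name are assumptions chosen for the demo, and it requires a filesystem that supports hard links.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def snapshot_by_hardlink(data_dir: Path, snapshot_dir: Path) -> List[str]:
    """Hard-link every regular file in data_dir into snapshot_dir."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for f in sorted(data_dir.iterdir()):
        if f.is_file():  # skips the snapshots directory itself
            os.link(f, snapshot_dir / f.name)
            linked.append(f.name)
    return linked


with tempfile.TemporaryDirectory() as work:
    data_dir = Path(work) / "table"
    data_dir.mkdir()
    live = data_dir / "nb-1-big-Data.db"  # hypothetical SSTable file name
    live.write_bytes(b"sstable bytes")

    snap_dir = data_dir / "snapshots" / "my_snapshot"
    linked = snapshot_by_hardlink(data_dir, snap_dir)
    snap = snap_dir / live.name

    # Both names point at the same inode, so no data was copied.
    same_inode = os.stat(live).st_ino == os.stat(snap).st_ino
    link_count = os.stat(live).st_nlink

    # Removing the live file leaves the data reachable through the snapshot.
    live.unlink()
    survives = snap.read_bytes() == b"sstable bytes"
    print(linked, same_inode, link_count, survives)
```

Because a hard link only works within a single filesystem, a snapshot made this way cannot live on a different volume, which is the constraint a snapshot outside the data directory would have to work around by copying.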