I think we might reach a compromise: the metadata would be cached, but
when asked whether a snapshot exists, it would go to disk and check
whether manifest.json exists. If it does not, the snapshot is
effectively deleted.
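A minimal sketch of that compromise, assuming a simple in-memory map from snapshot tag to snapshot directory. The class and method names here are hypothetical and are not Cassandra's actual snapshot code; only the manifest.json check comes from the proposal above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotCacheSketch {
    // Hypothetical cache: snapshot tag -> snapshot directory on disk.
    private final Map<String, Path> snapshotDirs = new ConcurrentHashMap<>();

    public void register(String tag, Path snapshotDir) {
        snapshotDirs.put(tag, snapshotDir);
    }

    // Answers from the cache, but confirms on disk that manifest.json
    // is still there. A missing manifest is treated as a deleted snapshot
    // and the stale cache entry is evicted.
    public boolean exists(String tag) {
        Path dir = snapshotDirs.get(tag);
        if (dir == null) {
            return false;
        }
        if (Files.exists(dir.resolve("manifest.json"))) {
            return true;
        }
        snapshotDirs.remove(tag, dir);
        return false;
    }

    public int cachedCount() {
        return snapshotDirs.size();
    }
}
```

The existence check costs one filesystem stat per snapshot rather than a full directory listing, which is the trade-off the compromise relies on.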
Paulo measured the times in his comment there (1)
listsnapshots_
> I lean towards the documentation approach vs complicating the implementation.
+1.
My question around the need for the optimization was purely driven by the
motivation of exploring whether the complexity of caching this data was
warranted. From talking with Jeremiah a bit offline, sounds like
I lean towards the documentation approach vs complicating the
implementation.
For me personally: I regularly use shell commands to operate on snapshots.
That includes listing them. I probably should use nodetool for it all
instead though.
Jordan
On Fri, Aug 9, 2024 at 08:09 Štefan Miklošovič
wr
I understand and agree. It is just that it would be cool if we avoided the
situation when there is a figurative ABC company which has these "bash
scripts removing snapshots from cron by rm -rf every second Sunday at 3:00
am" because "that was their workflow for ages".
I am particularly sensitive t
If we have the documentation in place, we can then consider the cache to
be the master copy of metadata, and rely on it to be always accurate and
up to date. If someone deletes the snapshot files from filesystem, they
can't complain that Cassandra stopped working correctly - which is the
same
It is also worth saying that there are plans to include more snapshot
metadata in the future. For example, it would be nice if we had a way to
verify the consistency of a snapshot. That leads to a manifest.json file
which would contain checksums for each file of a snapshot and similar. The
result of
We could indeed do that. Does your suggestion mean that there should not be
a problem with caching it all once explicitly stated like that?
On Fri, Aug 9, 2024 at 12:01 PM Bowen Song via dev
wrote:
> Has anyone considered simply updating the documentation saying this?
>
> "Removing the snapshot
Has anyone considered simply updating the documentation saying this?
"Removing the snapshot files directly from the filesystem may break
things. Always use the `nodetool` command or JMX to remove snapshots."
On 09/08/2024 09:18, Štefan Miklošovič wrote:
If we consider caching it all to be too
If we consider caching it all to be too much, we could make caching
an option an admin would need to opt into? There might be a flag
in cassandra.yaml: once enabled, the metadata would be kept in memory;
otherwise it would just be loaded as before, so people can decide if
caching is enough for them
or th
> If you have a lot of snapshots and have for example a metric monitoring them
> and their sizes, if you don’t cache it, creating the metric can cause
> performance degradation. We added the cache because we saw this happen to
> databases more than once.
I mean, I believe you, I'm just surprised
On Wed, Aug 7, 2024 at 6:39 PM Yifan Cai wrote:
> With WatcherService, when events are missed (which is to be expected), you
> will still need to list the files. It seems to me that WatcherService
> doesn't offer significant benefits in this case.
>
Yeah, I think we will leave it out eventually.
Rega
With WatcherService, when events are missed (which is to be expected), you
will still need to list the files. It seems to me that WatcherService
doesn't offer significant benefits in this case.
Regarding listing directory with a refresh flag, my concern is the
potential for abuse. End-users might/
Yes, for example as reported here
https://issues.apache.org/jira/browse/CASSANDRA-13338
People who are charting this in monitoring dashboards might also hit this.
On Wed, Aug 7, 2024 at 2:59 PM J. D. Jordan
wrote:
> If you have a lot of snapshots and have for example a metric monitoring
> them
If you have a lot of snapshots and have for example a metric monitoring them
and their sizes, if you don’t cache it, creating the metric can cause
performance degradation. We added the cache because we saw this happen to
databases more than once.
> On Aug 7, 2024, at 7:54 AM, Josh McKenzie wro
I would go with the mbean change option. I would add a new “list” function with
a new parameter to the mbean that allows specifying if it should refresh the
cache before returning the list.
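A rough sketch of what such a "list" operation with a refresh parameter could look like. The interface name, method name, and directory-scanning logic below are assumptions made for illustration; they are not the existing Cassandra MBean.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SnapshotListingSketch {
    // Hypothetical MBean-style interface, not Cassandra's actual API.
    public interface SnapshotListingMBean {
        List<String> listSnapshots(boolean refresh);
    }

    public static class SnapshotListing implements SnapshotListingMBean {
        private final Path snapshotsRoot;
        private volatile List<String> cached;

        public SnapshotListing(Path snapshotsRoot) {
            this.snapshotsRoot = snapshotsRoot;
            this.cached = scan();
        }

        // refresh=false returns the cached names without touching disk;
        // refresh=true rescans the directory before returning.
        @Override
        public List<String> listSnapshots(boolean refresh) {
            if (refresh) {
                cached = scan();
            }
            return cached;
        }

        // Lists the immediate subdirectories of the root, sorted by name.
        private List<String> scan() {
            try (Stream<Path> dirs = Files.list(snapshotsRoot)) {
                return dirs.filter(Files::isDirectory)
                           .map(p -> p.getFileName().toString())
                           .sorted()
                           .collect(Collectors.toList());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

With this shape, a monitoring caller can keep passing refresh=false and stay off the disk, while an operator who has removed files by hand can ask for a refreshed listing.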
No need to do the inotify stuff just for nodetool listsnapshot to be always
correct. Then add a new parame
> Snapshot metadata are currently stored in memory / they are cached so we do
> not need to go to disk every single time we want to list them, the more
> snapshots we have, the worse it is.
Are we enumerating our snapshots somewhere on the hot path, or is this
performance concern misplaced?
On
Snapshot metadata are currently stored in memory / they are cached so we do
not need to go to disk every single time we want to list them, the more
snapshots we have, the worse it is.
When a snapshot is _manually_ removed from disk, not from nodetool
clearsnapshot, just by rm -rf on a respective s