[DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread Štefan Miklošovič
Snapshot metadata are currently stored in memory / cached so we do not need to
go to disk every single time we want to list them; the more snapshots we have,
the more expensive that disk access becomes.

When a snapshot is _manually_ removed from disk, not via nodetool
clearsnapshot but just by rm -rf on the respective snapshot directory, then
such a snapshot will still be visible in nodetool listsnapshots. Manual removal
of a snapshot might happen e.g. by accident or by some "impatient" operator
who just goes to disk and removes it there instead of using nodetool or the
respective JMX method.

To improve UX here, what I came up with is that we might use Java's
WatchService, with each snapshot dir registered with it. WatchService is part
of the JDK; on Linux it is backed by the kernel's inotify subsystem. The
result is that once a snapshot dir is registered to be watched and it is then
removed, we are notified about that via inotify / WatchService, so we can
react to it and remove the in-memory representation of that snapshot so it is
no longer visible in the output.
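As a rough illustration of the mechanism (this is a simplification, not the
actual patch; in particular it registers the parent "snapshots" directory and
reacts to deletions of its children rather than registering each snapshot dir
individually):

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

public final class SnapshotDirWatcher implements Runnable
{
    private final WatchService watchService;

    public SnapshotDirWatcher(Path snapshotsDir) throws IOException
    {
        watchService = FileSystems.getDefault().newWatchService();
        // ENTRY_DELETE fires when a child entry (a snapshot directory) disappears
        snapshotsDir.register(watchService, ENTRY_DELETE);
    }

    @Override
    public void run()
    {
        try
        {
            while (true)
            {
                WatchKey key = watchService.take(); // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents())
                {
                    if (event.kind() == OVERFLOW)
                        continue; // kernel dropped events; only a full rescan can resync
                    Path removed = (Path) event.context();
                    // hypothetical hook: drop the cached representation of "removed" here
                    System.out.println("snapshot directory removed: " + removed);
                }
                if (!key.reset())
                    break; // the watched directory itself is gone
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }
}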

While this works, there are some questions / concerns:

1) What do people think about inotify in general? I tested this on 10k
snapshots and it seems to work just fine; nevertheless, there is in general no
strong guarantee that every single event will come through, and there is also
a family of kernel parameters around this where more tuning can be done, etc.
It is also questionable how this will behave on systems other than Linux
(macOS etc.). While the JRE implements WatchService on other platforms too, I
am not completely sure those implementations are quality-wise on par with the
Linux one. There is a history of not-so-great implementations on other systems
(events not coming through on Macs etc.), and while I think we are safe on
Linux, I am not sure we want to go with this elsewhere.
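For reference, the Linux knobs I have in mind live under /proc/sys/fs/inotify;
a quick, Linux-only and purely illustrative way to check them from Java:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class InotifyLimits
{
    public static void main(String[] args) throws IOException
    {
        // these sysctls bound how many watches / instances / queued events inotify allows
        for (String p : new String[]{ "max_user_watches", "max_user_instances", "max_queued_events" })
            System.out.println(p + " = " + Files.readString(Path.of("/proc/sys/fs/inotify/" + p)).trim());
    }
}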

2) inotify brings more entropy into the codebase; it is another thing we need
to take care of, etc. (however, it is all concentrated in one class and pretty
much "isolated" from everything else).

I made this feature optional and it is turned off by default, so people need
to explicitly opt in; we are not forcing it on anybody.

If we do not want to go with inotify, another option would be a background
thread which would periodically check whether a snapshot's manifest still
exists on disk; if it does not, then the snapshot does not either. While this
works, what I do not like about it is that the primary reason we moved this to
memory was to bypass IO as much as possible, yet here we would introduce
another check which goes to disk, and it would run periodically, which defeats
the whole purpose. If an operator lists snapshots once a week and there is a
background check running every 10 minutes (for example), then the cumulative
number of IO operations might be bigger than us just doing nothing at all. For
this reason, if we do not want to go with inotify, I would also not implement
any automatic background check. Instead, there would be a
SnapshotManagerMbean#refresh() method which would just forcibly reload all
snapshots from scratch. I think that manual deletion of snapshots is not
something a user would do regularly; snapshots are meant to be removed via
nodetool or JMX. If manual removal ever happens, then in order to get back in
sync, refreshing these snapshots would be required. There might be an
additional flag for nodetool listsnapshots; once specified, a refresh would be
done first, otherwise not.
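A rough sketch of what that MBean surface could look like; refresh() is the
method mentioned above, while the list method and its parameter are just
placeholders for the nodetool flag idea, not a final API:

import java.util.List;

public interface SnapshotManagerMbean
{
    // forcibly re-scan the data directories and rebuild the in-memory snapshot cache
    void refresh();

    // list snapshots, optionally refreshing the cache first; nodetool listsnapshots
    // could pass the flag through when the user asks for it
    List<String> listSnapshots(boolean refreshFirst);
}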

How does this all sound to people?

Regards


Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread Josh McKenzie
> Snapshot metadata are currently stored in memory / they are cached so we do 
> not need to go to disk every single time we want to list them, the more 
> snapshots we have, the worse it is.
Are we enumerating our snapshots somewhere on the hot path, or is this 
performance concern misplaced?



Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread J. D. Jordan
I would go with the MBean change option. I would add a new "list" function to
the MBean with a new parameter that allows specifying whether it should
refresh the cache before returning the list.
No need to do the inotify stuff just for nodetool listsnapshots to always be
correct. Then add a new parameter to nodetool listsnapshots to set refresh to
true.

-Jeremiah



Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread J. D. Jordan
If you have a lot of snapshots and have, for example, a metric monitoring them
and their sizes, then without caching, computing that metric can cause
performance degradation. We added the cache because we saw this happen to
databases more than once.
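Just to illustrate where the cost comes from (this is not the actual metric
code, only the shape of the work an uncached size metric would redo on every
scrape):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class SnapshotSizeExample
{
    // summing a snapshot's size means stat-ing every file under every snapshot
    // directory; done per metrics scrape with many snapshots, that is a lot of IO
    static long sizeOnDisk(Path snapshotsRoot) throws IOException
    {
        try (Stream<Path> files = Files.walk(snapshotsRoot))
        {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }
}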



Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread Štefan Miklošovič
Yes, for example as reported here

https://issues.apache.org/jira/browse/CASSANDRA-13338

People who are charting this in monitoring dashboards might also hit this.



Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread Yifan Cai
With WatchService, when events are missed (which is to be expected), you
will still need to list the files. It seems to me that WatchService
doesn't offer significant benefits in this case.

Regarding listing directory with a refresh flag, my concern is the
potential for abuse. End-users might/could always refresh before listing,
which could undermine the purpose of caching. Perhaps Jeremiah can provide
more insight on this.

IMO, caching is best handled internally. I have a few UX-related questions:
- Is it valid or acceptable to return stale data? If so, end-users have to
do some form of validation before consuming each snapshot to account for
potential deletions.
- Even if listsnapshots returns the most recent data, is it possible that
some of the directories get deleted while end-users are accessing them? I
think it is. That, then, forces end-users to do some validation first,
similar to handling stale data (see the sketch below).
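Something along these lines, as a minimal sketch of the validation I mean
(class and method names are made up for illustration):

import java.nio.file.Files;
import java.nio.file.Path;

final class SnapshotConsumerExample
{
    // a consumer of listsnapshots output should tolerate entries that were deleted
    // between listing and access, whether the listing came from the cache or not
    static boolean tryConsume(Path snapshotDir)
    {
        if (!Files.isDirectory(snapshotDir))
            return false; // snapshot disappeared in the meantime; skip, or refresh and retry
        // ... read the manifest / copy files ...
        return true;
    }
}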

Just my 2 cents.

- Yifan


Re: [DISCUSS] state of lz4-java

2024-08-07 Thread Mick Semb Wever
reply below.


> - since the project is Apache licensed we could fork / adopt for our needs?
> I’m not familiar enough with the hurdles we’d have to surmount around the
> licensing but at least it’s a compatible license. Maybe Mick knows more
> about what would be required for this route?
>


It's "Copyright 2020 Adrien Grand and the lz4-java contributors", one of
those "get approval from everyone" headaches, if we wanted to do it the
ideal way.  But there are other ways to make it work, so I wouldn't not
consider this a blocker nor any significant factor to what we decide is the
right approach for us.


Re: [VOTE] Backport CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0

2024-08-07 Thread Dinesh Joshi
+1

I think this is a low-risk change which is going to avoid duplication of
effort, so it is worth it.

On Sat, Aug 3, 2024 at 11:19 PM Yifan Cai  wrote:

> Hi,
>
> I am proposing backporting CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0.
>
> There is a discussion thread
>  on the
> topic. In summary, the backport would benefit Cassandra Analytics by
> providing a unified solution, and the patch is considered low-risk. While
> there are concerns about adding features to 4.0 and 4.1, there is generally
> support for 5.0.
>
> The vote will be open for 72 hours (longer if needed). Votes by PMC
> members are considered binding. A vote passes if there are at least three
> binding +1s and no -1's.
>
> Kind regards,
> Yifan Cai
>


Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread Štefan Miklošovič
On Wed, Aug 7, 2024 at 6:39 PM Yifan Cai  wrote:

> With WatcherService, when events are missed (which is to be expected), you
> will still need to list the files. It seems to me that WatcherService
> doesn't offer significant benefits in this case.
>

Yeah, I think we will leave it out eventually.

> Regarding listing directory with a refresh flag, my concern is the
> potential for abuse. End-users might/could always refresh before listing,
> which could undermine the purpose of caching. Perhaps Jeremiah can provide
> more insight on this.
>

Well, by default, it would not be refreshed every single time; you would
need to opt in to that. If there is a shop which has users with direct
access to the disks of Cassandra nodes and they are removing data manually,
I do not know what to say; what are nodetool clearsnapshot and the JMX
methods good for then? I do not think we can prevent people from shooting
themselves in the foot if they are absolutely willing to do that.

If they want to refresh every time, that would be equal to the current
behavior. It would be at most as "bad" as it is now.


> IMO, caching is best handled internally. I have a few UX-related questions:
> - Is it valid or acceptable to return stale data? If so, end-users have to
> do some form of validation before consuming each snapshot to account for
> potential deletions.
>

answer below

> - Even if listsnapshot returns the most recent data, is it possible that
> some of the directories get deleted when end-users are accessing them? I
> think it is true. It, then, enforces end-users to do some validation first,
> similar to handling stale data.
>

I think what you are trying to say is that when somebody lists snapshots at
time T0 and somebody removes a snapshot manually at T1, then the list of
snapshots is no longer accurate? Sure. That is a thing. This is how it
currently works.

Now, we want to cache them, so if you clear a snapshot which is not
physically there because somebody removed it manually, that should be a
no-op; it will just be removed from the internal tracker. So, if it is on
disk and in the cache and you clear it, then all is fine. It is also fine if
it is no longer on disk and you clear it; then it is just removed
internally. It would fail only in the case where you want to remove a
snapshot which is not cached, regardless of whether it is on disk or not.
The deletion of a non-existing snapshot ends up with a failure; nothing
changes in that regard, this is the current behavior too.
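In code, the semantics I am describing would be roughly this (all names are
made up for illustration, this is not the actual SnapshotManager code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SnapshotTrackerSketch
{
    interface Snapshot
    {
        boolean existsOnDisk();
        void deleteFromDisk();
    }

    private final Map<String, Snapshot> cache = new ConcurrentHashMap<>();

    void clearSnapshot(String tag)
    {
        Snapshot cached = cache.remove(tag);
        if (cached == null)
            // matches the current behavior: clearing an unknown snapshot fails
            throw new IllegalArgumentException("Snapshot " + tag + " is not known");

        // if the directory was already removed manually this is a no-op on disk,
        // otherwise the files are deleted as usual
        if (cached.existsOnDisk())
            cached.deleteFromDisk();
    }
}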

I want to say that I did not state it completely correctly at the very
beginning of this thread. Currently, we are caching only _expiring_
snapshots, because we need to know their time of removal so we can act on it
later; _normal_ snapshots are _not_ cached _yet_. I spent so much time with
18111 that I live in a reality where it is already in; I forgot this is not
actually in place yet, but we are very close to that.

OK thank you all for your insights, I will NOT use inotify.



Re: [VOTE] Backport CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0

2024-08-07 Thread Josh McKenzie
+1.

I'm ultimately in favor of setting the precedent of allowing low-risk ecosystem 
interop changes in non-trunk.



Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-07 Thread Josh McKenzie
> If you have a lot of snapshots and have for example a metric monitoring them 
> and their sizes, if you don’t cache it, creating the metric can cause 
> performance degradation. We added the cache because we saw this happen to 
> databases more than once.
I mean, I believe you, I'm just surprised that querying file metadata and doing 
basic computation leads to hundreds of ms of pause time even on systems with a 
lot of files. Aren't most / all of these values cached at the filesystem layer, 
so that we're basically just choosing tomato / tomahto between caching systems, 
either one we maintain or one the OS maintains?

Or is there really just a count of files well outside what I'm thinking here?

Anyway, not trying to cause a ruckus and make needless noise, trying to learn. 
;)



Re: [VOTE] Backport CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0

2024-08-07 Thread Brandon Williams
+1 for reasons stated in the discussion.

Kind Regards,
Brandon
