Re: [DISCUSS] inotify for detection of manually removed snapshots

2024-08-12 Thread Štefan Miklošovič
I think we might achieve some compromise - it would be cached but when
asked for the existence of a snapshot, it would go to disk and it would
check if manifest.json exists as if it does not then the snapshot is
effectively deleted.

Paulo measured the times in his comment there (1)

listsnapshots_disk: 37 seconds
listsnapshots_cached: 36 milliseconds
listsnapshots_cached_checkexists: 4 seconds

We would go instead with listsnapshots_disk for
listsnapshots_cached_checkexists. Sure, we have the greatest speedup with
listsnapshots_cached but the cost of that is an unsynchronized view of that
when not refreshed. On the other hand, listsnapshots_cached_checkexists,
which holds it in memory but just checks the existence of a manifest, is
still one-magnitude better than what we have now. That is still a great
improvement and I think we could just go with it while everything would
behave the same.

I think the greatest slowdown is caused by repeatedly traversing the whole
directory tree, this is not necessary after we load it at the beginning,
the mere check of manifest existence is just enough. Also, the biggest
performance killer seems to be the computation of true disk size* which
needs to get the size of each file and sum it up. My idea was to also cache
the sizes but I am not sure of that, seems like an overkill to hold it all
in memory just for this purpose.

* true disk size is the size of files which are in the snapshot but they
are not in live data dir. If we take a snapshot, its true disk size is 0
because at the beginning every snapshot file is a hardlink to live data
dir. Over time, when SSTables in live data dir get compacted, the files in
snapshot dir will not have its counterpart in live data dir and true disk
size starts to be non 0 value. We can not "cache" this as this is variable.


(1)
https://issues.apache.org/jira/browse/CASSANDRA-18111?focusedCommentId=17870085&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17870085

On Mon, Aug 12, 2024 at 1:11 AM Josh McKenzie  wrote:

> I lean towards the documentation approach vs complicating the
> implementation.
>
> +1.
>
> My question around the need for the optimization was purely driven by the
> motivation of exploring whether the complexity of caching this data was
> warranted. From talking with Jeremiah a bit offline, sounds like there's
> been some really egregious degradation cases and the minimal memory
> footprint and status quo complexity of implementation is a good enough
> solution.
>
> On Fri, Aug 9, 2024, at 1:38 PM, Jordan West wrote:
>
> I lean towards the documentation approach vs complicating the
> implementation.
>
> For me personally: I regularly use shell commands to operate on snapshots.
> That includes listing them. I probably should use nodetool for it all
> instead though.
>
> Jordan
>
> On Fri, Aug 9, 2024 at 08:09 Štefan Miklošovič 
> wrote:
>
> I understand and agree. It is just that it would be cool if we avoided the
> situation when there is a figurative ABC company which has these "bash
> scripts removing snapshots from cron by rm -rf every second Sunday at 3:00
> am" because "that was their workflow for ages".
>
> I am particularly sensitive to this as Cassandra is very cautious when it
> comes to not disrupting the workflows already out there.
>
> I do not know how frequent this would be and if somebody started to
> complain. I mean ... they could still remove it by hand, right? It is just
> listsnapshots would not be relevant anymore without refreshing it. I think
> that might be acceptable. It would be something else if we flat out made
> manual deletion forbidden, which it is not.
>
> On Fri, Aug 9, 2024 at 4:50 PM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
>
> If we have the documentation in place, we can then consider the cache to
> be the master copy of metadata, and rely on it to be always accurate and up
> to date. If someone deletes the snapshot files from filesystem, they can't
> complain about Cassandra stopped working correctly - which is the same if
> they had manually deleted some SSTable files (they shouldn't).
> On 09/08/2024 11:16, Štefan Miklošovič wrote:
>
> We could indeed do that. Does your suggestion mean that there should not
> be a problem with caching it all once explicitly stated like that?
>
> On Fri, Aug 9, 2024 at 12:01 PM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
> Has anyone considered simply updating the documentation saying this?
>
> "Removing the snapshot files directly from the filesystem may break
> things. Always use the `nodetool` command or JMX to remove snapshots."
> On 09/08/2024 09:18, Štefan Miklošovič wrote:
>
> If we consider caching it all to be too much, we might probably make
> caching an option an admin would need to opt-in into? There might be a flag
> in cassandra.yaml, once enabled, it would be in memory, otherwise it would
> just load it as it was so people can decide if caching is enough for them
> or t

Re: [VOTE] Release Apache Cassandra 4.1.6

2024-08-12 Thread Jon Haddad
My preference is a NEWS.txt update and we reroll.

On Fri, Aug 2, 2024 at 3:11 PM Brandon Williams  wrote:

> So where does this leave us with this release?  Get Alex's patch
> committed and reroll, reroll with a NEWS entry, or... ?
>
> Kind Regards,
> Brandon
>
> On Thu, Aug 1, 2024 at 9:04 AM Tommy Stendahl via dev
>  wrote:
> >
> > Hi,
> >
> > First, thanks everyone for considering this. I did not expect such a big
> discussion form this, for me it was not such a big thing and a think
> CASSANDRA-19534 is a very good improvement.
> >
> > If I have to recompile I would also update the code so I'm not sure I
> see much benefit with compile time compatibility. What made me react and
> raise the issue was the complete surprise that this would fail when I
> upgraded my test cluster to 4.1.6. My expectation was that a change like
> this would have been discussed or at least mentioned on the ML or Slack but
> I can't remember seeing anything. A note in the NEWS-file would also have
> made aware, I wouldn't have been super happy but I would have know what to
> expect and what I had to do.
> >
> > ecaudit is one thing we do that use the QueryHandler interface but for
> our internal we also use have a few implementations for query
> tracing/logging and prioritize requests. I would say that the QueryHandler
> interface together with the custom payload feature in the native protocol
> is a powerful combination and I would not be surprised if this is used more
> then you might expect.
> >
> > -Original Message-
> > From: J. D. Jordan 
> > Reply-To: dev@cassandra.apache.org
> > To: dev@cassandra.apache.org
> > Subject: Re: [VOTE] Release Apache Cassandra 4.1.6
> > Date: Thu, 01 Aug 2024 08:26:42 -0500
> >
> > @Tommy do you think?  You brought the issue up, I am assuming because
> you found the issue while trying to test ecaudit against the proposed
> release and it broke the integration?
> > As an active consumer of the interface what are your thoughts?
> >
> > On Aug 1, 2024, at 8:17 AM, Alex Petrov  wrote:
> >
> > 
> > > If we have a path that resolves the issue and also maintains full
> compatibility for this (semi- / reluctantly-accessible) interface, that
> would seem ideal. Interested to learn more about the drawbacks to that
> approach.
> >
> > My thinking here was that people who might have a binary dependency on
> this interface have to recompile their code, they may as well change 2
> lines by adding a call to from the new method with
> `requestTime.startedAtNanos()`. I am not in a strong opposition to merging
> it though. If there is general agreement that this is the best way, let's
> do this: I do not see any drawbacks in terms of performance or otherwise.
> >
> > If we decide to move forward with, it, the patch is up [1].
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-19811
> >
> > On Wed, Jul 31, 2024, at 11:24 PM, C. Scott Andreas wrote:
> >
> > Sorry to veer off from a vote in a vote thread.
> >
> > @Alex, can you say more about this statement:
> >
> > > "I think I would prefer to not introduce the change I have proposed
> (the one that would bring back non-binary compatibility)."
> >
> > If we have a path that resolves the issue and also maintains full
> compatibility for this (semi- / reluctantly-accessible) interface, that
> would seem ideal. Interested to learn more about the drawbacks to that
> approach.
> >
> > Regarding the value of C-19534 I'm happy to attest to the fact that it
> addresses severe metastable failure modes in clusters under heavy traffic
> on the verge of tipping. Jon Haddad's independent testing validated this as
> discussed on the ticket as well:
> https://issues.apache.org/jira/browse/CASSANDRA-19534
> >
> > Last, @Tommy this is a great catch and I'm glad you raised it. Thanks
> for watching so closely and appreciate you bringing it to everyone's
> attention.
> >
> > – Scott
> >
> > On Jul 31, 2024, at 1:05 PM, Caleb Rackliffe 
> wrote:
> >
> >
> > +1 to proceeding with a simple upgrade note in NEWS
> >
> > On Wed, Jul 31, 2024 at 12:50 PM Josh McKenzie 
> wrote:
> >
> >
> > Unfortunately, I can not immediately see a good way to provide the
> critical bugfix of CASSANDRA-19534, affecting all Cassandra users, without
> making at least some change in this API.
> >
> > I personally think that this method is very tightly coupled to the
> implementation to expose it via -D. If anyone using it could provide some
> context about why it is an important part of API, it would be give some
> useful context.
> >
> > Nobody stepping up to engage on the technical piece of this? Unless /
> until somebody does, Alex' argument holds the most weight as the expert
> with what's going on IMO.
> >
> > The question we're facing is - when we find a defect that requires a
> change in a public facing API, which of the following 2 is more important:
> >
> > Keeping the API stable
> > Having the defect resolved
> >
> > Obviously this will be case-by-case. What CASSANDRA-19534 ad