Re: [DISCUSS] CEP-26: Unified Compaction Strategy

2022-12-21 Thread Benedict
I’m personally very excited by this work. Compaction could do with a spring clean and this feels to formalise things much more cleanly, but density tiering in particular is something I’ve wanted to incorporate for years now, as it should significantly improve STCS behaviour (most importantly reducing read amplification and the amount of disk space required, narrowing the performance delta to LCS in these important dimensions), and simplifies re-levelling of LCS, making large streams much less painful.On 21 Dec 2022, at 07:19, Henrik Ingo  wrote:I noticed the CEP doesn't link to this, so it should be worth mentioning that the UCS documentation is available here: https://github.com/datastax/cassandra/blob/ds-trunk/doc/unified_compaction.mdBoth of the above seem to do a poor job referencing the literature we've been inspired by. I will link to Mark Callaghan's blog on the subject:http://smalldatum.blogspot.com/2018/07/tiered-or-leveled-compaction-why-not.html?m=1...and lazily will also borrow from Mark a post that references a bunch of LSM (not just UCS related) academic papers: http://smalldatum.blogspot.com/2018/08/name-that-compaction-algorithm.html?m=1Finally, it's perhaps worth mentioning that UCS has been in production in our Astra Serverless cloud service since it was launched in March 2021. The version described by the CEP therefore already incorporates some improvements based on observed production behaviour.Henrik On Mon, 19 Dec 2022, 15:41 Branimir Lambov,  wrote:Hello everyone,I would like to open the discussion on our proposal for a unified compaction strategy that aims to solve well-known problems with compaction and improve parallelism to permit higher levels of sustained write throughput.The proposal is here: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+StrategyThe strategy is based on two main observations:- that tiered and levelled compaction can be generalized as the same thing if one observes that both form exponentially-growing levels based on the size of sstables (or non-overlapping sstable runs) and trigger a compaction when more than a given number of sstables are present on one level;- that instead of "size" in the description above we can use "density", i.e. the size of an sstable divided by the width of the token range it covers, which permits sstables to be split at arbitrary points when the output of a compaction is written and still produce a levelled hierarchy.The latter allows us to shard the compaction space into progressively higher numbers of shards as data moves to the higher levels of the hierarchy, improving parallelism, space requirements and the duration of compactions, and the former allows us to cover the existing strategies, as well as hybrid mixtures that can prove more efficient for some workloads.Thank you,Branimir



Re: Issue when creating a stream session

2022-12-21 Thread Aleksey Yeshchenko
Indeed, it is safe to clean up now.

Normally we prefer to do cleanups like that on trunk only, but I’d say feel 
free to remove this dead code in 4.0+ if that’s the branch you are targeting in 
CASSANDRA-17199.

> On 19 Dec 2022, at 03:53, Yuki Morishita  wrote:
> 
> I dug the git history and it looks like CASSANDRA-3569 
>  changed to not 
> register to Gossip.
> https://github.com/apache/cassandra/commit/0f2d7d0b9540efa3ea3dfe4f8270c3635afdc63c
> (It removed the `register` part, not the `unregister` part though.)
> 
> Since StreamSession is no longer registered to nor unregistered from 
> Gosipper, I say the current "implements" code
> is a dead code and we can remove it safely.
> 
> 
> On Sat, Dec 17, 2022 at 4:57 AM David Capwell  > wrote:
>> This sounds like a bug to me, but would be good to get feedback from others 
>> who have touched Streaming…
>> 
>> Repair will fail if membership notifies that a participate node was removed, 
>> so I think it makes sense for Streaming to also follow this behavior.
>> 
>>> On Dec 14, 2022, at 1:22 PM, Natnael Adere >> > wrote:
>>> 
>>> To give more context, StreamSession is not listening to Gossip for 
>>> membership changes. Although it implements an interface for listening to 
>>> membership changes, we do not register with Gossip and, therefore, never 
>>> get these changes. This results in the IEndpointStateChangeSubscriber 
>>> interface being dead code. My question is wether or not this is a bug to be 
>>> fixed or code to delete. Currently, streaming does not fail on membership 
>>> changes because of this problem. If Gossip says that a node was removed or 
>>> restarted, should we fail the stream or not?
>>> 
>>> Thanks,
>>> Natnael
 On Dec 14, 2022, at 1:39 PM, Natnael Adere >>> > wrote:
 
 Hello,
 
 I am working on CASSANDRA-17199 
  and testing for 
 this ticket has uncovered some issues with streaming. When creating a 
 StreamSession we have an intent for to listen but that never happens. My 
 concern is wether or not we should we listen and make sure the interface 
 it implements is not dead code or delete the interface and all of its 
 methods. Our style guide requires no dead code so this might be 
 intentional, but when testing we see that StreamSession not listening. If 
 it is intentional then I suppose we make sure that we are listening. Any 
 opinions on what to do in this scenario?
 
 
 (PROBLEM) The project compiles in both scenarios:
 public class StreamSession implements IEndpointStateChangeSubscriber
 
 public class StreamSession //implements IEndpointStateChangeSubscriber
 
 
 Thanks,
 
 Natnael
>>> 
>>