Are there still some concerns with the CEP or should we start the vote?
On Fri, 23 Jul 2021 at 15:37, Branimir Lambov <branimir.lam...@datastax.com> wrote:

> > CEP indicates the flushing behavior is suddenly more tied to the Memtable
> > implementation level rather than being configurable at the table level
>
> The specific things that change with the proposal are:
> - Flushes are supplied with a reason (e.g. memory full, schema change,
>   prepare to stream).
> - The memtable can reject a flush request.
> - The logic to initiate "memory full" and "period expired" flushes moves to
>   the memtable where it conceptually belongs.
>
> Is the latter what worries you? For reusability, the current logic is
> extracted in a base class that the skiplist/trie/7282 implementations
> derive from.
>
> > I'm not sure if the "isDurable" + "shouldSkip" is interesting instead
> > of "shouldWrite" (etc). But I also wonder in cases where point-in-time
> > restore is required how one could achieve it without a commit log (can
> > persistent memory memtable be rolled back?).
>
> That's exactly the reason why the two flags are separate. To use PITR, you
> use the commit log but make sure that it does not treat the segments
> covered by the persistent memtable as dirty (i.e. writesAreDurable but not
> writesShouldSkipCommitLog); commit log segments are written only to be
> archived, and PITR restores a memtable snapshot and applies the mutations
> after it.
>
> Am I misunderstanding the question?
>
> > Although I do feel like persistent memory exceptions make stuff more
> > complex.
>
> The persistent memtables were the reason that drove this functionality, but
> think about it also as an easy way to do pluggable storage engines. I may
> not be up to date with the consensus in the community on this, but I don't
> see us investing the effort to have fully-fledged pluggable storage engines
> of the CASSANDRA-13475 type any time soon.
>
> To make the memtable a storage engine you need two things:
> - an opt out of flushing, so that the memtable is the only component that
>   serves reads,
> - an opt out of the commit log, so that the memtable is the only component
>   that serves writes,
>
> plus some solutions for the secondary uses of sstables (streaming) and
> commit log (PITR, CDC).
>
> The proposal gives it that, with a little more control than just opt-out.
> It can work for the pmem (opt out of both) and rocksdb (opt out of flushing
> only) use cases, but for me it will also be useful to experiment with a
> memtable that includes its own version of a commit log (opt out of commit
> log only).
>
> On Thu, Jul 22, 2021 at 4:00 PM Michael Burman <y...@iki.fi> wrote:
>
> > On Wed, 21 Jul 2021 at 17:24, Branimir Lambov
> > <branimir.lam...@datastax.com> wrote:
> >
> > > > Why is flushing control bad to do in CFS and better in the
> > > > memtable?
> > >
> > > I wonder why you would understand this as something that takes away
> > > control instead of giving it. The CFS is not configurable. With the
> > > CEP, memtables are configurable at the table level. It is entirely
> > > possible to implement a memtable wrapper that provides any of the
> > > examples of functionalities you mention -- and that would be fully
> > > configurable (just as example, one could very well select a
> > > time-series-optimized-flush wrapper over skip-list memtable).
> >
> > I think this was a bit of miscommunication. I'm not in favor of keeping
> > it in the CFS, but at least to me (as a reader) CEP indicates the
> > flushing behavior is suddenly more tied to the Memtable implementation
> > level rather than being configurable at the table level. Thus that would
> > not reduce coupling of different flush strategies, but instead just move
> > it from CFS to Memtable-implementation. And especially with multiple
> > Memtable implementations that would mean the reusable parts of flushing
> > could end up being difficult to reuse. If not the intention, then good.
> >
> > > This is another question that the proposal leaves to the memtable
> > > implementation (or wrapper), but it does make sense to make sure the
> > > interfaces provide the necessary support for sharding
> >
> > +1 to this, that's a good limitation of scope to get forward. I think
> > this was originally touched in 7282 (where I had it in the memtable
> > impl), but then got pushed one step outside.
> >
> > > writesShouldSkipCommitLog is a result of scope reduction (call it
> > > laziness on my part). I could not find a way to tell if commit log
> > > data may be required for point-in-time-restore or any other feature,
> > > and the existing method of turning the commit log off does not have
> > > the right granularity. I am very open to suggestions here.
> >
> > Could this be limited to a single parameter? I'm not sure if the
> > "isDurable" + "shouldSkip" is interesting instead of "shouldWrite" (etc).
> > But I also wonder in cases where point-in-time restore is required how
> > one could achieve it without a commit log (can persistent memory memtable
> > be rolled back?). That does have an effect on backups. I have to read
> > your impl how you intended to rewrite the process from Keyspace (where
> > the requirement for "isDurable" starts from).
> >
> > Although I do feel like persistent memory exceptions make stuff more
> > complex.
> >
> > > > Why is streaming in the memtable? [...] the wanted behavior is just
> > > > disabling automated flushing
> > >
> > > Yes, if zero-copy-streaming is not enabled. And that's exactly what
> > > this method is there for -- to make sure sstables are not copied
> > > whole, and that a flush is not done at the end.
> > >
> > > Regards,
> > > Branimir
> > >
> > > On Wed, Jul 21, 2021 at 4:33 PM bened...@apache.org
> > > <bened...@apache.org> wrote:
> > >
> > > > I would love to help out with this in any way that I can, FYI.
> > > > Definitely one of the more impactful performance improvements to the
> > > > codebase, given the benefits to compaction and memory behaviour.
> > > >
> > > > From: bened...@apache.org <bened...@apache.org>
> > > > Date: Wednesday, 21 July 2021 at 14:32
> > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > > > memtable-as-a-commitlog-index
> > > >
> > > > Heh, based on 7282? Yeah, I’ve had this idea for a while now
> > > > (actually there was a paper that did this a long time ago), and it
> > > > could be very nice (if for no other benefit than reducing heap
> > > > utilisation). I don’t think this requires that they be modelled as
> > > > the same concept, however, only that the Memtable must be able to
> > > > receive an address into a commit log entry and to adopt partial
> > > > ownership over the entry’s lifecycle.
> > > >
> > > > From: Branimir Lambov <branimir.lam...@datastax.com>
> > > > Date: Wednesday, 21 July 2021 at 14:28
> > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > > > In general, I think we need to make up our mind as to whether we
> > > > > consider the Memtable and CommitLog one logical entity [...], or
> > > > > whether we want to further untangle those two components from an
> > > > > architectural perspective which we started down that road on with
> > > > > the pluggable storage engine work.
> > > >
> > > > This CEP is intentionally not attempting to answer this question.
> > > > FWIW I do not see them as separable (there's evidence to this fact
> > > > in the codebase), but there are valid secondary uses of the commit
> > > > log that are served well enough by the current architecture.
> > > >
> > > > It is important, however, to let the memtable implementation opt out,
> > > > to permit it to provide its own solution for data persistence.
> > > >
> > > > We should revisit this in the future, especially if Benedict's shared
> > > > log facility and my plans for a memtable-as-a-commitlog-index
> > > > evolve.
> > > >
> > > > Regards,
> > > > Branimir
> > > >
> > > > On Wed, Jul 21, 2021 at 1:34 PM Michael Burman <y...@iki.fi> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > It is nice to see these going forward (and a great use of CEP) so
> > > > > thanks for the proposal. I have my reservations regarding the
> > > > > linking of memtable to CommitLog and flushing and should not leak
> > > > > abstraction from one to another. And I don't see the reasoning why
> > > > > they should be, it doesn't seem to add anything else than tight
> > > > > coupling of components, reducing reuse and making things
> > > > > unnecessarily complicated. Also, the streaming notions seem weird
> > > > > to me - how are they related to memtable? Why should memtable care
> > > > > about the behavior outside memtable's responsibility?
> > > > >
> > > > > Some misc (with some thoughts split / duplicated to different
> > > > > parts) quotes and comments:
> > > > >
> > > > > > Tight coupling between CFS and memtable will be reduced: flushing
> > > > > > functionality is to be extracted, controlling memtable memory and
> > > > > > period expiration will be handled by the memtable.
> > > > >
> > > > > Why is flushing control bad to do in CFS and better in the
> > > > > memtable? Doing it outside memtable would allow to control the
> > > > > flushing regardless of how the actual memtable is implemented. For
> > > > > example, let's say someone would want to implement the HBase's
> > > > > accordion to Cassandra. It shouldn't matter what the implementation
> > > > > of memtable is as the compaction of different memtables could be
> > > > > beneficial to all implementations. Or the flushing would push the
> > > > > memtable to a proper caching instead of only to disk.
> > > > >
> > > > > Or if we had per table caching structure, we could control the
> > > > > flushing of memtables and the cache structure separately. Some data
> > > > > benefits from LRU and some from MRW (most-recently-written) caching
> > > > > strategies. But both could benefit from the same memtable
> > > > > implementation, it's the data and how it's used that could control
> > > > > how the flushing should work. For example time series data behaves
> > > > > quite differently in terms of data accesses to something more
> > > > > "random".
> > > > >
> > > > > Or even "total memory control" which would check which tables need
> > > > > more memory to do their writes and which do not. Or that the memory
> > > > > doesn't grow over a boundary and needs to manually maintain how
> > > > > much is dedicated to caching and how much to memtables waiting to
> > > > > be flushed. Or delay flushing because the disks can't keep up etc.
> > > > > Not to be implemented in this CEP, but pushing this strategy to
> > > > > memtable would prevent many features.
> > > > >
> > > > > > Beyond thread-safety, the concurrency constraints of the memtable
> > > > > > are intentionally left unspecified.
> > > > >
> > > > > I like this. I could see use-cases where a single-thread
> > > > > implementation could actually outperform some concurrent data
> > > > > structures. But it also provides me with a question, is this
> > > > > proposal going to take an angle towards per-range memtables? There
> > > > > are certainly benefits to splitting the memtables as it would
> > > > > reduce the "n" in the operations, thus providing less overhead in
> > > > > lookups and writes. Although, taking it one step backwards I could
> > > > > see the benefit of having a commitlog per range also, which would
> > > > > allow higher utilization of NVME drives with larger queue depths.
> > > > > And why not per-range-sstables for faster scale-outs and .. a bit
> > > > > outside the scope of CEP, but just to ensure that the
> > > > > implementation does not block such improvement.
> > > > >
> > > > > Interfaces:
> > > > >
> > > > > > boolean writesAreDurable()
> > > > > > boolean writesShouldSkipCommitLog()
> > > > >
> > > > > The placement inside memtable implementation for these methods just
> > > > > feels incredibly wrong to me. The writing pipeline should have
> > > > > these configured and they could differ for each table even with the
> > > > > same memtable implementation. Let's take the example of an
> > > > > in-memory memtable use case that's never written to a SSTable. We
> > > > > could have one table with just simply in-memory cached storage and
> > > > > another one with a Redis style persistence of AOF, where writes
> > > > > would be written to the commitlog for fast recovery, but the data
> > > > > is otherwise always only kept in the memtable instead of writing to
> > > > > the SSTable (for performance reasons). Same implementation of
> > > > > memtable still.
> > > > >
> > > > > Why would the write process of the table not ask the table what
> > > > > settings it has and instead asks the memtable what settings the
> > > > > table has? This seems counterintuitive to me.
> > > > > Even the persistent memory case is a bit questionable, why not
> > > > > simply disable commitlog in the writing process? Why ask the
> > > > > memtable?
> > > > >
> > > > > This feels like memtable is going to be the write pipeline, but to
> > > > > me that doesn't feel like the correct architectural decision. I'd
> > > > > rather see these decisions done outside the memtable. Even a
> > > > > persistent memory memtable user might want to have a commitlog
> > > > > enabled for data capture / shipping logs, or layers of persistence
> > > > > speed. The whole persistent memory without any commercially known
> > > > > future is a bit weird at the moment (even Optane has no known
> > > > > manufacturing anymore with last factory being dismantled based on
> > > > > public information).
> > > > >
> > > > > > boolean streamToMemtable()
> > > > >
> > > > > And that one I don't understand. Why is streaming in the memtable?
> > > > > This smells like a scope creep from something else. The explanation
> > > > > would indicate to me that the wanted behavior is just disabling
> > > > > automated flushing.
> > > > >
> > > > > But these are just some questions that came to my mind while
> > > > > reading this. And I don't want to sound too negative (most of the
> > > > > features are really something I'd like to see), perhaps I just
> > > > > misunderstood some of the motivations why stuff should be brought
> > > > > to memtable instead of being implemented outside memtable. Perhaps
> > > > > there's something else in the write pipeline arch that needs fixing
> > > > > but is now masqueraded inside this CEP.
> > > > >
> > > > > I'm definitely interested to hear more.
> > > > >
> > > > > - Micke
> > > > >
> > > > > On Wed, 21 Jul 2021 at 08:24, Berenguer Blasi
> > > > > <berenguerbl...@gmail.com> wrote:
> > > > >
> > > > > > +1. De-tangling, going more modular and clean interfaces sgtm.
> > > > > >
> > > > > > On 20/7/21 21:45, Nate McCall wrote:
> > > > > > > Yay for pluggable memtables!! I haven't gone over this in
> > > > > > > detail yet, but personally I've always thought integrating
> > > > > > > something like Arrow would be cool for sharing data (that's as
> > > > > > > far as I've gotten, but anything that makes that kind of
> > > > > > > experimentation easier would also help with mocking test
> > > > > > > plumbing, so +1 from me).
> > > > > > >
> > > > > > > Thanks for putting this together!
> > > > > > >
> > > > > > > -Nate
> > > > > > >
> > > > > > > On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov
> > > > > > > <branimir.lam...@datastax.com> wrote:
> > > > > > >
> > > > > > >> Proposal for a mechanism for plugging in memtable
> > > > > > >> implementations:
> > > > > > >>
> > > > > > >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
> > > > > > >>
> > > > > > >> The proposal supports using custom memtable implementations to
> > > > > > >> support development and testing of improved alternatives, but
> > > > > > >> also enables a broader definition of "memtable" to better
> > > > > > >> support more advanced use cases like persistent memory. To
> > > > > > >> this end, memtable implementations are given control over
> > > > > > >> flushing and storing data in the commit log, enabling
> > > > > > >> solutions that implement their own durability mechanisms and
> > > > > > >> live much longer than their classical counterparts. Taken to
> > > > > > >> the extreme, this also enables memtables that never flush (in
> > > > > > >> other words, alternative storage engines) in a
> > > > > > >> minimally-invasive manner.
> > > > > > >>
> > > > > > >> I am curious to hear your thoughts on the proposal.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Branimir
> > > >
> > > > --
> > > > Branimir Lambov
> > > > e. branimir.lam...@datastax.com
> > > > w. www.datastax.com
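
Editorial illustration (not part of any message in the thread): the opt-out combinations discussed above can be sketched roughly as follows. Only `writesAreDurable()` and `writesShouldSkipCommitLog()` are names taken from the CEP text quoted in the thread; the `Memtable` interface shape, `acceptsFlush`, both enums, `FlagMemtable`, and the flag values chosen for each use case are hypothetical assumptions made for this sketch and do not reflect the actual Cassandra API.

```java
public class MemtableOptOutSketch {
    /** Flush reasons mentioned in the thread; the enum itself is hypothetical. */
    enum FlushReason { MEMORY_FULL, SCHEMA_CHANGE, PREPARE_TO_STREAM }

    /** Hypothetical summary of what the write path does with the commit log. */
    enum CommitLogAction { SKIP, WRITE_FOR_ARCHIVE_ONLY, WRITE_AND_MARK_DIRTY }

    /** Hypothetical interface; only the two writes* method names come from the CEP. */
    interface Memtable {
        boolean writesAreDurable();
        boolean writesShouldSkipCommitLog();
        boolean acceptsFlush(FlushReason reason); // assumed name for "can reject a flush"
    }

    /** Plain flag holder used to model the use cases named in the thread. */
    static final class FlagMemtable implements Memtable {
        private final boolean durable;
        private final boolean skipCommitLog;
        private final boolean flushes;

        FlagMemtable(boolean durable, boolean skipCommitLog, boolean flushes) {
            this.durable = durable;
            this.skipCommitLog = skipCommitLog;
            this.flushes = flushes;
        }

        @Override public boolean writesAreDurable() { return durable; }
        @Override public boolean writesShouldSkipCommitLog() { return skipCommitLog; }
        // This simple model ignores the reason; a real memtable could decide per reason.
        @Override public boolean acceptsFlush(FlushReason reason) { return flushes; }
    }

    /** Maps the two flags to a commit log action, following the PITR explanation above. */
    static CommitLogAction commitLogAction(Memtable m) {
        if (m.writesShouldSkipCommitLog()) return CommitLogAction.SKIP;
        if (m.writesAreDurable()) return CommitLogAction.WRITE_FOR_ARCHIVE_ONLY;
        return CommitLogAction.WRITE_AND_MARK_DIRTY;
    }

    // Flag values below are assumptions derived from the wording in the thread.
    static final Memtable CLASSIC = new FlagMemtable(false, false, true);
    static final Memtable PMEM = new FlagMemtable(true, true, false);            // opt out of both
    static final Memtable PMEM_WITH_PITR = new FlagMemtable(true, false, false); // durable, log kept for archive
    static final Memtable FLUSHLESS_ENGINE = new FlagMemtable(false, false, false); // opt out of flushing only
    static final Memtable OWN_LOG = new FlagMemtable(true, true, true);          // opt out of commit log only

    public static void main(String[] args) {
        Memtable[] all = { CLASSIC, PMEM, PMEM_WITH_PITR, FLUSHLESS_ENGINE, OWN_LOG };
        String[] names = { "classic", "pmem", "pmem+PITR", "flushless engine", "own log" };
        for (int i = 0; i < all.length; i++) {
            System.out.println(names[i] + ": commit log " + commitLogAction(all[i])
                + ", accepts MEMORY_FULL flush: " + all[i].acceptsFlush(FlushReason.MEMORY_FULL));
        }
    }
}
```

Keeping two separate booleans, rather than a single "shouldWrite" flag, is what lets the PITR case be expressed: writes are durable in the memtable, yet the commit log is still written so its segments can be archived.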