Re: [DISCUSS] CEP-11: Pluggable memtable implementations

Benjamin Lerer Tue, 17 Aug 2021 01:52:53 -0700

Are there still some concerns with the CEP or should we start the vote?


On Fri, 23 Jul 2021 at 15:37, Branimir Lambov <branimir.lam...@datastax.com>
wrote:

> > CEP indicates the flushing behavior is suddenly more tied to the Memtable
>   implementation level rather than being configurable at the table level
>
> The specific things that change with the proposal are:
> - Flushes are supplied with a reason (e.g. memory full, schema change,
>   prepare to stream).
> - The memtable can reject a flush request.
> - The logic to initiate "memory full" and "period expired" flushes moves to
>   the memtable where it conceptually belongs.
>
> Is the latter what worries you? For reusability, the current logic is
> extracted in a base class that the skiplist/trie/7282 implementations
> derive from.
>
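To make the three changes listed above concrete, here is a minimal sketch. It is
not the CEP's actual interface: every name in it (FlushSketch, FlushReason,
FlushInitiator, Memtable, ThresholdMemtable, acceptsFlush and so on) is
hypothetical, and the size accounting is deliberately naive. It only shows a
flush carrying a reason, a memtable that may reject a flush request, and a
shared base class that owns the "memory full" trigger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch; none of these names come from the CEP or the Cassandra codebase.
class FlushSketch {
    /** Why a flush is requested; the values mirror the examples given in the message. */
    enum FlushReason { MEMORY_FULL, PERIOD_EXPIRED, SCHEMA_CHANGE, PREPARE_TO_STREAM }

    /** Callback through which a memtable initiates a flush of itself. */
    interface FlushInitiator {
        void requestFlush(FlushReason reason);
    }

    interface Memtable {
        void put(String key, String value);

        /** Returns false when the memtable rejects a flush requested for this reason. */
        boolean acceptsFlush(FlushReason reason);
    }

    /** Reusable base class: it owns the "memory full" trigger so subclasses need not repeat it. */
    static abstract class ThresholdMemtable implements Memtable {
        private final long limit;
        private final FlushInitiator initiator;
        private long used; // naive, non-thread-safe size estimate (assumption for brevity)

        ThresholdMemtable(long limit, FlushInitiator initiator) {
            this.limit = limit;
            this.initiator = initiator;
        }

        @Override
        public final void put(String key, String value) {
            store(key, value);
            used += key.length() + value.length();
            if (used >= limit)
                initiator.requestFlush(FlushReason.MEMORY_FULL);
        }

        protected abstract void store(String key, String value);

        @Override
        public boolean acceptsFlush(FlushReason reason) {
            return true; // classic behaviour: never reject
        }
    }

    static final class SkipListMemtable extends ThresholdMemtable {
        private final ConcurrentSkipListMap<String, String> data = new ConcurrentSkipListMap<>();

        SkipListMemtable(long limit, FlushInitiator initiator) {
            super(limit, initiator);
        }

        @Override
        protected void store(String key, String value) {
            data.put(key, value);
        }
    }

    /** A memtable that keeps its data itself and rejects every flush request. */
    static final class NonFlushingMemtable implements Memtable {
        private final ConcurrentSkipListMap<String, String> data = new ConcurrentSkipListMap<>();

        @Override
        public void put(String key, String value) {
            data.put(key, value);
        }

        @Override
        public boolean acceptsFlush(FlushReason reason) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<FlushReason> initiated = new ArrayList<>();
        Memtable memtable = new SkipListMemtable(8, initiated::add);
        memtable.put("ab", "cd");
        memtable.put("ef", "gh");
        System.out.println("initiated by memtable: " + initiated);
        System.out.println("non-flushing accepts SCHEMA_CHANGE: "
                           + new NonFlushingMemtable().acceptsFlush(FlushReason.SCHEMA_CHANGE));
    }
}
```

Keeping the trigger in the base class is one way to address the reuse concern
raised in the thread: an implementation that wants a different policy can
implement the interface directly instead of extending the base class.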
>
> > I'm not sure if the "isDurable" + "shouldSkip" is interesting instead
>   of "shouldWrite" (etc). But I also wonder in cases where point-in-time
>   restore is required how one could achieve it without a commit log (can
>   persistent memory memtable be rolled back?).
>
> That's exactly the reason why the two flags are separate. To use PITR, you
> use the commit log but make sure that it does not treat the segments covered
> by the persistent memtable as dirty (i.e. writesAreDurable but not
> writesShouldSkipCommitLog); commit log segments are written only to be
> archived, and PITR restores a memtable snapshot and applies the mutations
> after it.
>
> Am I misunderstanding the question?
>
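One way to read the answer above is as a small decision table over the two
flags. The sketch below is an interpretation, not code from the proposal: only
the names writesAreDurable and writesShouldSkipCommitLog come from the
discussion, while CommitLogFlagsSketch, CommitLogHandling and handlingFor are
hypothetical. Letting the skip flag win when both are set is also an
assumption.

```java
// Hypothetical sketch; only the two flag names are taken from the discussion.
class CommitLogFlagsSketch {
    enum CommitLogHandling {
        /** Classic path: write the mutation and keep the segment dirty until flushed. */
        WRITE_AND_TRACK_DIRTY,
        /** Durable memtable with PITR: write the mutation, but do not treat the segment as dirty. */
        WRITE_FOR_ARCHIVE_ONLY,
        /** The memtable handles persistence itself: do not write to the commit log at all. */
        SKIP
    }

    static CommitLogHandling handlingFor(boolean writesAreDurable, boolean writesShouldSkipCommitLog) {
        if (writesShouldSkipCommitLog)
            return CommitLogHandling.SKIP; // assumption: skip takes precedence
        if (writesAreDurable)
            return CommitLogHandling.WRITE_FOR_ARCHIVE_ONLY;
        return CommitLogHandling.WRITE_AND_TRACK_DIRTY;
    }

    public static void main(String[] args) {
        for (boolean durable : new boolean[] { false, true })
            for (boolean skip : new boolean[] { false, true })
                System.out.printf("writesAreDurable=%b writesShouldSkipCommitLog=%b -> %s%n",
                                  durable, skip, handlingFor(durable, skip));
    }
}
```

A single "shouldWrite" flag could not express the middle case, where the log is
still written but only for archiving.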
>
> > Although I do feel like persistent memory exceptions make stuff more
>   complex.
>
> The persistent memtables were the reason that drove this functionality, but
> think about it also as an easy way to do pluggable storage engines. I may
> not be up to date with the consensus in the community on this, but I don't
> see us investing the effort to have fully-fledged pluggable storage engines
> of the CASSANDRA-13475 type any time soon.
>
> To make the memtable a storage engine you need two things:
> - an opt out of flushing, so that the memtable is the only component that
>   serves reads,
> - an opt out of the commit log, so that the memtable is the only component
>   that serves writes,
>
> plus some solutions for the secondary uses of sstables (streaming) and
> commit log (PITR, CDC).
>
> The proposal gives it that, with a little more control than just opt-out.
> It can work for the pmem (opt out of both) and rocksdb (opt out of flushing
> only) use cases, but for me it will also be useful to experiment with a
> memtable that includes its own version of a commit log (opt out of commit
> log only).
>
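The combinations named above can be tabulated as follows. This is only an
illustration of the message's wording: OptOutSketch, UseCase and the accessor
names are hypothetical, and the CLASSIC row (no opt-out) is added as an assumed
baseline that the message does not mention.

```java
// Hypothetical sketch of the opt-out combinations named in the message.
class OptOutSketch {
    enum UseCase {
        CLASSIC(false, false),       // assumed baseline, not named in the message
        PMEM(true, true),            // opt out of both
        ROCKSDB(true, false),        // opt out of flushing only
        OWN_COMMIT_LOG(false, true); // opt out of commit log only

        final boolean optsOutOfFlushing;
        final boolean optsOutOfCommitLog;

        UseCase(boolean optsOutOfFlushing, boolean optsOutOfCommitLog) {
            this.optsOutOfFlushing = optsOutOfFlushing;
            this.optsOutOfCommitLog = optsOutOfCommitLog;
        }

        /** Without flushing there are no sstables, so the memtable is the only component serving reads. */
        boolean memtableServesAllReads() {
            return optsOutOfFlushing;
        }

        /** Without the commit log, the memtable is the only component serving writes. */
        boolean memtableServesAllWrites() {
            return optsOutOfCommitLog;
        }
    }

    public static void main(String[] args) {
        for (UseCase useCase : UseCase.values())
            System.out.printf("%-14s reads only from memtable=%b, writes only to memtable=%b%n",
                              useCase, useCase.memtableServesAllReads(), useCase.memtableServesAllWrites());
    }
}
```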
>
> On Thu, Jul 22, 2021 at 4:00 PM Michael Burman <y...@iki.fi> wrote:
>
> > On Wed, 21 Jul 2021 at 17:24, Branimir Lambov
> > <branimir.lam...@datastax.com> wrote:
> >
> > > > Why is flushing control bad to do in CFS and better in the
> > >   memtable?
> > >
> > > I wonder why you would understand this as something that takes away
> > > control instead of giving it. The CFS is not configurable. With the
> > > CEP, memtables are configurable at the table level. It is entirely
> > > possible to implement a memtable wrapper that provides any of the
> > > examples of functionalities you mention -- and that would be fully
> > > configurable (just as example, one could very well select a
> > > time-series-optimized-flush wrapper over skip-list memtable).
> > >
> > >
> > I think this was a bit of miscommunication. I'm not in favor of keeping it
> > in the CFS, but at least to me (as a reader) CEP indicates the flushing
> > behavior is suddenly more tied to the Memtable implementation level rather
> > than being configurable at the table level. Thus that would not reduce
> > coupling of different flush strategies, but instead just move it from CFS
> > to Memtable-implementation. And especially with multiple Memtable
> > implementations that would mean the reusable parts of flushing could end up
> > being difficult to reuse. If that is not the intention, then good.
> >
> >
> > >
> > > This is another question that the proposal leaves to the memtable
> > > implementation (or wrapper), but it does make sense to make sure the
> > > interfaces provide the necessary support for sharding
> > >
> >
> > +1 to this, that's a good limitation of scope to get forward. I think this
> > was originally touched in 7282 (where I had it in the memtable impl), but
> > then got pushed one step outside.
> >
> > > > writesShouldSkipCommitLog is a result of scope reduction (call it
> > > laziness on my part). I could not find a way to tell if commit log
> > > data may be required for point-in-time-restore or any other feature,
> > > and the existing method of turning the commit log off does not have
> > > the right granularity. I am very open to suggestions here.
> > >
> >
> > Could this be limited to a single parameter? I'm not sure if the
> > "isDurable" + "shouldSkip" is interesting instead of "shouldWrite" (etc).
> > But I also wonder in cases where point-in-time restore is required how one
> > could achieve it without a commit log (can persistent memory memtable be
> > rolled back?). That does have an effect on backups. I have to read your
> > impl how you intended to rewrite the process from Keyspace (where the
> > requirement for "isDurable" starts from).
> >
> > Although I do feel like persistent memory exceptions make stuff more
> > complex.
> >
> >
> >
> > >
> > >
> > >
> > > > Why is streaming in the memtable? [...] the wanted behavior is just
> > >   disabling automated flushing
> > >
> > > Yes, if zero-copy-streaming is not enabled. And that's exactly what
> > > this method is there for -- to make sure sstables are not copied
> > > whole, and that a flush is not done at the end.
> > >
> > > Regards,
> > > Branimir
> > >
> > > > On Wed, Jul 21, 2021 at 4:33 PM bened...@apache.org
> > > > <bened...@apache.org> wrote:
> > >
> > > > > I would love to help out with this in any way that I can, FYI.
> > > > > Definitely one of the more impactful performance improvements to the
> > > > > codebase, given the benefits to compaction and memory behaviour.
> > > >
> > > > From: bened...@apache.org <bened...@apache.org>
> > > > Date: Wednesday, 21 July 2021 at 14:32
> > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > > > memtable-as-a-commitlog-index
> > > >
> > > > > Heh, based on 7282? Yeah, I’ve had this idea for a while now
> > > > > (actually there was a paper that did this a long time ago), and it
> > > > > could be very nice (if for no other benefit than reducing heap
> > > > > utilisation). I don’t think this requires that they be modelled as
> > > > > the same concept, however, only that the Memtable must be able to
> > > > > receive an address into a commit log entry and to adopt partial
> > > > > ownership over the entry’s lifecycle.
> > > >
> > > >
> > > > From: Branimir Lambov <branimir.lam...@datastax.com>
> > > > Date: Wednesday, 21 July 2021 at 14:28
> > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > > > In general, I think we need to make up our mind as to whether we
> > > >   consider the Memtable and CommitLog one logical entity [...], or
> > > >   whether we want to further untangle those two components from an
> > > >   architectural perspective which we started down that road on with
> > > >   the pluggable storage engine work.
> > > >
> > > > > This CEP is intentionally not attempting to answer this question.
> > > > > FWIW I do not see them as separable (there's evidence to this fact in
> > > > > the codebase), but there are valid secondary uses of the commit log
> > > > > that are served well enough by the current architecture.
> > > >
> > > > It is important, however, to let the memtable implementation opt out,
> > > > to permit it to provide its own solution for data persistence.
> > > >
> > > > We should revisit this in the future, especially if Benedict's shared
> > > > log facility and my plans for a memtable-as-a-commitlog-index
> > > > evolve.
> > > >
> > > > Regards,
> > > > Branimir
> > > >
> > > > On Wed, Jul 21, 2021 at 1:34 PM Michael Burman <y...@iki.fi> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > It is nice to see these going forward (and a great use of CEP) so
> > > > > > thanks for the proposal. I have my reservations regarding the
> > > > > > linking of memtable to CommitLog and flushing; they should not leak
> > > > > > abstractions from one to another. And I don't see the reasoning why
> > > > > > they should be linked; it doesn't seem to add anything other than
> > > > > > tight coupling of components, reducing reuse and making things
> > > > > > unnecessarily complicated. Also, the streaming notions seem weird to
> > > > > > me - how are they related to memtable? Why should memtable care
> > > > > > about the behavior outside memtable's responsibility?
> > > > >
> > > > > > Some misc (with some thoughts split / duplicated to different parts)
> > > > > > quotes and comments:
> > > > >
> > > > > > > Tight coupling between CFS and memtable will be reduced: flushing
> > > > > >   functionality is to be extracted, controlling memtable memory and
> > > > > >   period expiration will be handled by the memtable.
> > > > >
> > > > > > Why is flushing control bad to do in CFS and better in the memtable?
> > > > > > Doing it outside memtable would allow controlling the flushing
> > > > > > regardless of how the actual memtable is implemented. For example,
> > > > > > let's say someone would want to implement HBase's Accordion in
> > > > > > Cassandra. It shouldn't matter what the implementation of memtable
> > > > > > is as the compaction of different memtables could be beneficial to
> > > > > > all implementations. Or the flushing would push the memtable to a
> > > > > > proper cache instead of only to disk.
> > > > >
> > > > > > Or if we had a per-table caching structure, we could control the
> > > > > > flushing of memtables and the cache structure separately. Some data
> > > > > > benefits from LRU and some from MRW (most-recently-written) caching
> > > > > > strategies. But both could benefit from the same memtable
> > > > > > implementation, it's the data and how it's used that could control
> > > > > > how the flushing should work. For example time series data behaves
> > > > > > quite differently in terms of data accesses to something more
> > > > > > "random".
> > > > >
> > > > > > Or even "total memory control" which would check which tables need
> > > > > > more memory to do their writes and which do not. Or that the memory
> > > > > > doesn't grow over a boundary and needs to manually maintain how much
> > > > > > is dedicated to caching and how much to memtables waiting to be
> > > > > > flushed. Or delay flushing because the disks can't keep up etc. Not
> > > > > > to be implemented in this CEP, but pushing this strategy to memtable
> > > > > > would prevent many features.
> > > > >
> > > > > > > Beyond thread-safety, the concurrency constraints of the memtable
> > > > > >   are intentionally left unspecified.
> > > > >
> > > > > > I like this. I could see use-cases where a single-thread
> > > > > > implementation could actually outperform some concurrent data
> > > > > > structures. But it also provides me with a question, is this
> > > > > > proposal going to take an angle towards per-range memtables? There
> > > > > > are certainly benefits to splitting the memtables as it would reduce
> > > > > > the "n" in the operations, thus providing less overhead in lookups
> > > > > > and writes. Although, taking it one step backwards I could see the
> > > > > > benefit of having a commitlog per range also, which would allow
> > > > > > higher utilization of NVMe drives with larger queue depths. And why
> > > > > > not per-range-sstables for faster scale-outs and .. a bit outside
> > > > > > the scope of CEP, but just to ensure that the implementation does
> > > > > > not block such improvement.
> > > > >
> > > > > Interfaces:
> > > > >
> > > > > > boolean writesAreDurable()
> > > > > > boolean writesShouldSkipCommitLog()
> > > > >
> > > > > > The placement inside memtable implementation for these methods just
> > > > > > feels incredibly wrong to me. The writing pipeline should have these
> > > > > > configured and they could differ for each table even with the same
> > > > > > memtable implementation. Let's take the example of an in-memory
> > > > > > memtable use case that's never written to an SSTable. We could have
> > > > > > one table with just simply in-memory cached storage and another one
> > > > > > with a Redis style persistence of AOF, where writes would be written
> > > > > > to the commitlog for fast recovery, but the data is otherwise always
> > > > > > only kept in the memtable instead of writing to the SSTable (for
> > > > > > performance reasons). Same implementation of memtable still.
> > > > >
> > > > > > Why would the write process of the table not ask the table what
> > > > > > settings it has and instead asks the memtable what settings the
> > > > > > table has? This seems counterintuitive to me. Even the persistent
> > > > > > memory case is a bit questionable, why not simply disable commitlog
> > > > > > in the writing process? Why ask the memtable?
> > > > >
> > > > > > This feels like memtable is going to be the write pipeline, but to
> > > > > > me that doesn't feel like the correct architectural decision. I'd
> > > > > > rather see these decisions done outside the memtable. Even a
> > > > > > persistent memory memtable user might want to have a commitlog
> > > > > > enabled for data capture / shipping logs, or layers of persistence
> > > > > > speed. The whole persistent memory without any commercially known
> > > > > > future is a bit weird at the moment (even Optane has no known
> > > > > > manufacturing anymore with last factory being dismantled based on
> > > > > > public information).
> > > > >
> > > > > > boolean streamToMemtable()
> > > > >
> > > > > > And that one I don't understand. Why is streaming in the memtable?
> > > > > > This smells like a scope creep from something else. The explanation
> > > > > > would indicate to me that the wanted behavior is just disabling
> > > > > > automated flushing.
> > > > >
> > > > > > But these are just some questions that came to my mind while reading
> > > > > > this. And I don't want to sound too negative (most of the features
> > > > > > are really something I'd like to see), perhaps I just misunderstood
> > > > > > some of the motivations why stuff should be brought to memtable
> > > > > > instead of being implemented outside memtable. Perhaps there's
> > > > > > something else in the write pipeline arch that needs fixing but is
> > > > > > now masqueraded inside this CEP.
> > > > >
> > > > > I'm definitely interested to hear more.
> > > > >
> > > > >   - Micke
> > > > >
> > > > > > On Wed, 21 Jul 2021 at 08:24, Berenguer Blasi
> > > > > > <berenguerbl...@gmail.com> wrote:
> > > > >
> > > > > > +1. De-tangling, going more modular and clean interfaces sgtm.
> > > > > >
> > > > > > On 20/7/21 21:45, Nate McCall wrote:
> > > > > > > > Yay for pluggable memtables!! I haven't gone over this in detail
> > > > > > > > yet, but personally I've always thought integrating something
> > > > > > > > like Arrow would be cool for sharing data (that's as far as I've
> > > > > > > > gotten, but anything that makes that kind of experimentation
> > > > > > > > easier would also help with mocking test plumbing, so +1 from
> > > > > > > > me).
> > > > > > >
> > > > > > > Thanks for putting this together!
> > > > > > >
> > > > > > > -Nate
> > > > > > >
> > > > > > > > On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov
> > > > > > > > <branimir.lam...@datastax.com> wrote:
> > > > > > >
> > > > > > > >> Proposal for a mechanism for plugging in memtable implementations:
> > > > > > > >>
> > > > > > > >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
> > > > > > > >>
> > > > > > > >> The proposal supports using custom memtable implementations to
> > > > > > > >> support development and testing of improved alternatives, but
> > > > > > > >> also enables a broader definition of "memtable" to better
> > > > > > > >> support more advanced use cases like persistent memory. To this
> > > > > > > >> end, memtable implementations are given control over flushing
> > > > > > > >> and storing data in the commit log, enabling solutions that
> > > > > > > >> implement their own durability mechanisms and live much longer
> > > > > > > >> than their classical counterparts. Taken to the extreme, this
> > > > > > > >> also enables memtables that never flush (in other words,
> > > > > > > >> alternative storage engines) in a minimally-invasive manner.
> > > > > > >>
> > > > > > >> I am curious to hear your thoughts on the proposal.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Branimir
> > > > > > >>
> > > > > >
> > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Branimir Lambov
> > > > e. branimir.lam...@datastax.com
> > > > > w. www.datastax.com
> > > >
> > >
> > >
> > > --
> > > Branimir Lambov
> > > e. branimir.lam...@datastax.com
> > > w. www.datastax.com
> > >
> >
>
>
> --
> Branimir Lambov
> e. branimir.lam...@datastax.com
> w. www.datastax.com
>
